Data Analytics for Accounting, 3rd Edition

Data Analytics for Accounting, Third Edition, authored by Vernon J. Richardson, Ryan A. Teeter, and Katie L. Terrell, emphasizes the importance of data analytics skills for accountants in a data-driven business environment. The book introduces the IMPACT cycle, a structured approach to data analytics, and provides hands-on practice with real-world datasets using tools like Microsoft Excel and Tableau. Key features include multi-track labs, auto-graded problems, and a focus on practical application to prepare students for future accounting roles.

Data Analytics for Accounting
THIRD EDITION

Vernon J. Richardson
University of Arkansas, Baruch College

Ryan A. Teeter
University of Pittsburgh

Katie L. Terrell
University of Arkansas


DATA ANALYTICS FOR ACCOUNTING

Published by McGraw Hill LLC, 1325 Avenue of the Americas, New York, NY 10019. Copyright ©2023 by
McGraw Hill LLC. All rights reserved. Printed in the United States of America. No part of this publication may
be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without
the prior written consent of McGraw Hill LLC, including, but not limited to, in any network or other electronic
storage or transmission, or broadcast for distance learning.

Some ancillaries, including electronic and print components, may not be available to customers outside the
United States.

This book is printed on acid-free paper.

1 2 3 4 5 6 7 8 9 LWI 27 26 25 24 23 22

ISBN 978-1-265-09445-4
MHID 1-265-09445-4

Cover Image: sasirin pamai/Shutterstock

All credits appearing on page or at the end of the book are considered to be an extension of the copyright page.

The Internet addresses listed in the text were accurate at the time of publication. The inclusion of a website does
not indicate an endorsement by the authors or McGraw Hill LLC, and McGraw Hill LLC does not guarantee the
accuracy of the information presented at these sites.

mheducation.com/highered



Dedications
My wonderful daughter, Rachel, for your constant love,
encouragement, and support. You always make me laugh
and smile!
—Vern Richardson

To my three wonderful little Teeter tots, who keep me on
my toes.
—Ryan Teeter

To the Mustache Running Club. Over many miles you all
have learned more about accounting data analytics than
you ever hoped for! Thanks for all of your support—on and
off the trail.
—Katie Terrell

Preface
Data Analytics is changing the business world—data simply surround us! So many data are
available to businesses about each of us—how we shop, what we read, what we buy, what
music we listen to, where we travel, whom we trust, where we invest our time and money,
and so on. Accountants create value by addressing fundamental business and accounting
questions using Data Analytics.
All accountants must develop data analytic skills to address the needs of the profession
in the future—these skills are increasingly required of new hires and old hands. Data Analytics for
Accounting, 3e recognizes that accountants don’t need to become data scientists—they may
never need to build a data repository or do the real hardcore Data Analytics or learn how to
program a computer to do machine learning. However, there are seven skills that analytic-
minded accountants must have to be prepared for a data-filled world, including:
1. Developed analytics mindset—know when and how Data Analytics can address
business questions.
2. Data scrubbing and data preparation—comprehend the process needed to clean and
prepare the data before analysis.
3. Data quality—recognize what is meant by data quality, be it completeness, reliability,
or validity.
4. Descriptive data analysis—perform basic analysis to understand the quality of the
underlying data and their ability to address the business question.
5. Data analysis through data manipulation—demonstrate ability to sort, rearrange,
merge, and reconfigure data in a manner that allows enhanced analysis. This may
include diagnostic, predictive, or prescriptive analytics to appropriately analyze the
data.
6. Statistical data analysis competency—identify and implement an approach that will
use statistical data analysis to draw conclusions and make recommendations on a
timely basis.
7. Data visualization and data reporting—report results of analysis in an accessible way
to each varied decision maker and his or her specific needs.
Consistent with these skills, it’s important to recognize that Data Analytics is an iterative
process. The process begins by identifying business questions that can be addressed with
data, extracting and testing the data, refining our testing, and finally, communicating those
findings to management. Data Analytics for Accounting, 3e describes this process by relying
on an established Data Analytics model called the IMPACT cycle:1
1. Identify the questions.
2. Master the data.
3. Perform test plan.
4. Address and refine results.
5. Communicate insights.
6. Track outcomes.

1. Jean Paul Isson and Jesse S. Harriott, Win with Advanced Business Analytics: Creating Business Value
from Your Data (Hoboken, NJ: Wiley, 2013).


[Exhibit: The IMPACT cycle. Adapted from Win with Advanced Business Analytics: Creating Business Value from Your Data, by Jean Paul Isson and Jesse S. Harriott.]

The IMPACT cycle is described in the first four chapters, and then the process is
illustrated in auditing, managerial accounting, financial accounting, and tax in Chapters
5 through 9. In response to instructor feedback, Data Analytics for Accounting, 3e now also
includes two new project chapters, giving students a chance to practice the full IMPACT
model with multiple labs that build on one another.
Data Analytics for Accounting, 3e emphasizes hands-on practice with real-world
data. Students are provided with hands-on instruction (e.g., click-by-click instructions,
screenshots, etc.) on datasets within the chapter; within the end-of-chapter materials; and in
the labs at the end of each chapter. Throughout the text, students identify questions, extract
and download data, perform testing, and then communicate the results of that testing.
The use of real-world data is highlighted with data from Avalara, LendingClub,
College Scorecard, Dillard's, and the State of Oklahoma, as well as other data from our
labs. In particular, we emphasize the rich Dillard's sales transaction data, which we use in
more than 15 of the labs throughout the text (including Chapter 11).
Data Analytics for Accounting, 3e also emphasizes the data analysis tools students will
use throughout their careers, organized around two tracks—the Microsoft track (Excel,
Power BI) and the Tableau track (Tableau Prep and Tableau Desktop—available with a free
student license). Using multiple tools allows students to learn which tool is best suited for
the necessary data analysis, data visualization, and communication of the insights gained—
for example, which tool is easiest for internal controls testing, which is best for analyzing
or querying (using SQL) big datasets, which is best for data visualizations, and so on.
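
As a preview of that kind of work, the Chapter 2 comprehensive cases have students pull a
subset of the Dillard's data with a SQL query rather than loading the full tables. A minimal
sketch of such a query follows; the table name TRANSACT and the column TRAN_DATE are
illustrative assumptions, while TRAN_TYPE and TRAN_AMT are fields referenced in Lab 2-7,
so the actual schema should be confirmed against the course data dictionary.

    -- Minimal sketch: preview one month of sales transactions instead of
    -- loading the entire table into Excel or Tableau.
    -- TRANSACT and TRAN_DATE are assumed names for illustration only;
    -- TRAN_TYPE and TRAN_AMT are fields referenced in Lab 2-7.
    SELECT TRAN_DATE, TRAN_TYPE, TRAN_AMT
    FROM TRANSACT
    WHERE TRAN_DATE BETWEEN '2016-01-01' AND '2016-01-31';

Appendices D and E (SQL Part 1 and Part 2) provide a fuller introduction to writing queries
for students who have not used SQL before.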

About the Authors
Vernon J. Richardson is a Distinguished Professor of Accounting and the G. William
Glezen Chair in the Sam M. Walton College of Business at the University of Arkansas and a
Visiting Professor at Baruch College. He received his BS, Master of Accountancy, and MBA
from Brigham Young University and a PhD in accounting from the University of Illinois at
Urbana–Champaign. He has taught students at the University of Arkansas, Baruch College,
University of Illinois, Brigham Young University, Aarhus University, and University of
Kansas, and internationally at the China Europe International Business School (Shanghai),
Xi’an Jiaotong Liverpool University, Chinese University of Hong Kong–Shenzhen, and the
University of Technology Sydney.
Dr. Richardson is a member of the American Accounting Association. He has served as
president of the American Accounting Association Information Systems section. He previously
served as an editor of The Accounting Review and is currently an editor at Accounting Horizons.
He has published articles in The Accounting Review, Journal of Information Systems, Journal of
Accounting and Economics, Contemporary Accounting Research, MIS Quarterly, International
Journal of Accounting Information Systems, Journal of Management Information Systems, Journal of
Operations Management, and Journal of Marketing. Dr. Richardson is also an author of McGraw
Hill’s Accounting Information Systems and Introduction to Data Analytics for Accounting textbooks.

Ryan A. Teeter is a Clinical Associate Professor of Accounting in the Katz Graduate


School of Business at the University of Pittsburgh. He teaches accounting information
systems, auditing, and accounting data analytics. Prior to receiving his PhD in accounting
information systems from Rutgers University, he worked at Google in Mountain View,
California. He has since worked with internal audit organizations at Siemens, Procter
& Gamble, Alcoa/Arconic, and FedEx, helping to develop robotic process automation
programs and Data Analytic solutions.
Dr. Teeter is a member of the American Accounting Association and has published
articles in the Journal of Strategic Technologies in Accounting and Issues in Accounting
Education. He has received grant funding for Data Analytics research from PwC. Dr. Teeter
is also an author of McGraw Hill’s Introduction to Data Analytics for Accounting textbook.

Katie L. Terrell is an instructor in the Sam M. Walton College of Business at the University
of Arkansas. She received her BA degrees in English literature and in the Spanish language
from the University of Central Arkansas and her MBA from the University of Arkansas. She
expects a doctoral degree by 2021. She has taught students at the University of Arkansas;
Soochow University (Suzhou, China); University College Dublin (Ireland); and Duoc
UC, a branch of the Catholic University of Chile (Viña del Mar, Chile).
She is a member of the American Accounting Association and has published a Statement
on Management Accounting for the Institute of Management Accountants on managing
organizational change in operational change initiatives. Terrell was named the 2019
Business Professional of the Year (Education) by the national Beta Alpha Psi organization.
She has recently been recognized for her innovative teaching as the recipient of
the Mark Chain/FSA Teaching Award for innovative graduate-level accounting teaching
practices in 2016. She has worked with Tyson Foods, where she held various information
system roles, focusing on business analysis, project management for ERP implementations
and upgrades, and organizational change management. Terrell is also an author of McGraw
Hill’s Introduction to Data Analytics for Accounting textbook.

Acknowledgments
Our sincere thanks to all who helped us on this project.
Our biggest thanks to the awesome team at McGraw Hill, including Steve Schuetz, Tim
Vertovec, Rebecca Olson, Claire McLemore, Michael McCormick, Christine Vaughan,
Kevin Moran, Angela Norris, and Lori Hancock.
Our thanks also to each of the following:
The Walton College Enterprise Team (Paul Cronan, Ron Freeze, Michael Gibbs,
Michael Martz, Tanya Russell) for their work helping us get access to the Dillard’s data.
Shane Lunceford from LendingClub for helping us gain access to LendingClub data.
Joy Caracciolo, Will Cocker, and Tommy Morgan from Avalara for their help in granting
permission to use the Avalara data.
Bonnie Klamm, North Dakota State University, and Ryan Baxter, Boise State University,
for their accuracy check review of the manuscript and Connect content.
In addition, we thank the following reviewers and classroom testers who provided ideas
and insights for this edition. We appreciate their contributions.

Amelia Annette Baldwin, University of South Alabama
Dereck Barr-Pulliam, University of Wisconsin–Madison
Ryan Baxter, Boise State University
Cory Campbell, Indiana State University
Heather Carrasco, Texas Tech University
Curtis Clements, Abilene Christian University
Elizabeth Felski, State University of New York at Geneseo
Amber Hatten, The University of Southern Mississippi
Jamie Hoelscher, Southern Illinois University, Edwardsville
Chris C. Hsu, York College, City University of New York
Venkataraman Iyer, University of North Carolina at Greensboro
Andrea S. Kelton, Middle Tennessee State University
Bonnie Klamm, North Dakota State University
Gregory Kogan, Long Island University, Brooklyn
Hagit Levy, Baruch College, CUNY
Brandon Lock, Baruch College, CUNY
Sharon M. Lightner, National University
Kalana Malimage, University of Wisconsin–Whitewater
Partha Mohapatra, California State University, Sacramento
Bonnie Morris, Duquesne University
Uday Murthy, University of South Florida
Kathy Nesper, University at Buffalo
Kamala Raghavan, Texas Southern University
Marie Rice, West Virginia University
Ali Saeedi, University of Minnesota Crookston
Karen Schuele, John Carroll University
Drew Sellers, Kent State University
Joe Shangguan, Robert Morris University
Vincent J. Shea, St. John's University
Jacob Shortt, Virginia Tech
Marcia Watson, University of North Carolina at Charlotte
Liu Yang, Southeast Missouri State University
Zhongxia Ye, University of Texas, San Antonio
Qiongyao (Yao) Zhang, Robert Morris University
Vernon Richardson
Ryan Teeter
Katie Terrell

Key Features
• NEW! Color Coded Multi-Track Labs: Instructors have the flexibility to guide
students through labs using the Green Track: Microsoft tools (including Excel, Power
Query, and Power BI); Blue Track: Tableau tools (including Tableau Prep Builder
and Tableau Desktop); or both. Each track is clearly identified and supported with
additional resources.
• NEW! Lab Example Outputs: Each lab begins with an example of what students are
expected to create. This provides a clear reference and guide for student deliverables.
• NEW! Auto-Graded Problems: The quantity and variety of auto-graded problems
that are assignable in McGraw Hill Connect have been expanded.
• NEW! Discussion and Analysis: Now available as manually graded assignments in
McGraw Hill Connect.
• Emphasis on Skills: Working through the IMPACT cycle framework, students
will learn problem assessment, data preparation, data analysis, data visualization,
controls testing, and more.
• Emphasis on Hands-On Practice: Students will be provided hands-on learning (click-
by-click instructions with screenshots) on datasets within each chapter, within the
end-of-chapter materials, and in the labs and comprehensive cases.
• Emphasis on Datasets: To illustrate data analysis techniques and skills, multiple
practice datasets (audit, financial, and managerial data) will be used in every chapter.
Students gain real-world experience working with data from Avalara, LendingClub,
Dillard’s, College Scorecard, the State of Oklahoma, as well as financial statement
data (via XBRL) from S&P100 companies.
• Emphasis on Tools: Students will learn how to conduct data analysis using Microsoft
and Tableau tools. Students will compare and contrast the different tools to
determine which are best suited for basic data analysis and data visualization, which
are easiest for internal controls testing, which are best for SQL queries, and so on.

Main Text Features

Chapter Maps
These maps provide a guide of what we're going to cover in the chapter as well as a guide
of what we've just learned and what's coming next.

Chapter-Opening Vignettes
Because companies are facing new and exciting opportunities with their use of Data
Analytics to help with accounting and business decisions, we detail what they're doing
and why in our chapter-opening vignettes.
[Sample spread: the Chapter 2 opener, "Mastering the Data," showing the chapter map
("A Look at This Chapter," "A Look Back," "A Look Ahead") and the chapter-opening
vignette on data privacy and ethics.]

Learning Objectives
We feature learning objectives at the beginning of each chapter. Having these learning
objectives provides students with an overview of the concepts to be taught in the chapter
and the labs.
[Sample: the Chapter 2 learning objectives, LO 2-1 through LO 2-4, covering data sources,
relational databases, ETL techniques, and the ethical considerations of data collection
and use.]

Progress Checks
Periodic progress check questions are posed to students throughout each chapter. These
checks provoke the student to stop and consider the concepts presented.

End-of-Chapter Materials

Answers to Progress Checks
The answers allow students to evaluate if they are on track with their understanding of the
materials presented in the chapter.

Multiple Choice Questions
The multiple choice questions quickly assess the student's knowledge of chapter content.

Discussion and Analysis—Now in Connect!
This feature provides questions for group discussion and analysis. Now available as
manually graded assignments in McGraw Hill Connect!
6. (LO 2-3) In the ETL process, the first step is extracting the data. When you are obtaining
Problems
The problems challenge the student's ability to see relationships in the learning objectives
with analysis options that employ higher-level thinking and analytical skills. The quantity
of auto-graded problems has been expanded. The manually graded analysis problems are
also now assignable in McGraw Hill Connect.

NEW! Color Coded Multi-Track Labs
The labs give students hands-on experience working with different types of data and the
tools used to analyze them. Students complete labs using the instructor-led track and
answer common questions. Clear step-by-step directions help model the expected output
of each lab exercise.
[Sample spreads from Lab 7-1, Evaluate Job Costs—Sláinte, illustrating each track:]
• The Green Track—Microsoft/Power BI: Example Output
• The Green Track—Microsoft/Power BI: Easy to Follow Step-by-Step Lab Instruction
• The Blue Track—Tableau: Example Output
• The Blue Track—Tableau: Easy to Follow Step-by-Step Lab Instruction

Comprehensive Case Labs
Use a real-life Big Data set based on Dillard's actual company data. This dataset allows
students to build their skills and test their conclusions across concepts covered in each
chapter. The Comprehensive Cases can be followed continuously from the first chapter or
picked up at any later point in the book; enough information is provided to ensure students
can get right to work.
[Sample: Lab 2-8 Comprehensive Case: Preview a Subset of Data in Excel, Tableau Using a
SQL Query—Dillard's.]

Data Analytics for Accounting, 3e
Content Updates
General Updates for the 3rd Edition
• Color coded multi-track labs now emphasize two tracks: The green Microsoft Track
(including Excel, Power Query, and Power BI) and blue Tableau Track (including
Tableau Prep Builder and Tableau Desktop).
• Added additional End-of-Chapter Multiple Choice Questions throughout the text
that are auto-graded in Connect.
• Significantly revised many End-of-Chapter Problems for availability and auto-grading
within Connect. Analysis Problems in Connect are manually graded.
• Linked chapter content to lab content using Lab Connections within each chapter.

Chapter by Chapter Updates


Specific chapter changes for Data Analytics for Accounting, 3e are as follows:

Chapter 1
• Added new opening vignette regarding a recent IMA survey of finance and
accounting professionals and their use of Big Data and Data Analytics.
• Added discussion on how analytics are used in auditing, tax, and management accounting.
• Included introduction to the variety of analytics tools available and explanation of
dual tracks for labs including Microsoft Track and Tableau Track.
• Added “Data Analytics at Work” box feature: What Does an Analyst Do at a Big
Four Accounting Firm.
• Added six new Connect-ready problems.
• Implemented lab changes:
• All-new tool connections in Lab 1-5.
• Revised Labs 1-0 to 1-4.

Chapter 2
• Edited opening vignette to include current examples regarding data privacy and ethics.
• Added a discussion on ethical considerations related to data collection and use.
• Added exhibit with potential external data sources to address accounting questions.
• Expanded the data extraction section to first include data identification, including
the use of unstructured data.
• Added “Data Analytics at Work” box feature: Jump Start Your Accounting Career
with Data Analytics Knowledge.
• Added six new Connect-ready problems.
• Implemented lab changes:
• Revised Labs 2-1 to 2-8.

Chapter 3
• Refined the discussion on diagnostic analytics.
• Improved the discussion on the differences between qualitative and quantitative
data and the discussion of the normal distribution.
• Refined the discussion on the use of regression as an analytics tool.
• Added examples of time series analysis in the predictive analytics section.
• Added “Data Analytics at Work” box feature: Big Four Invest Billions in Tech,
Reshaping Their Identities as Professional Services Firms with a Technology Core.
• Added six new Connect-ready problems.
• Implemented lab changes:
• All-new cluster analysis in Lab 3-2.
• Revised Labs 3-1, 3-3 to 3-6.

Chapter 4
• Added discussion of statistics versus visualizations using Anscombe’s quartet.
• Updated explanations of box plots and Z-scores.
• Added “Data Analytics at Work” box feature: Data Visualization: Why a Picture Can
Be Worth a Thousand Clicks.
• Added six new Connect-ready problems.
• Implemented lab changes:
• All-new dashboard in Lab 4-3.
• Revised Labs 4-1, 4-2, 4-4, 4-5.

Chapter 5
• Improved and clarified content to match the focus on descriptive, diagnostic, predictive,
and prescriptive analytics.
• Added “Data Analytics at Work” box feature: Citi’s $900 Million Internal Control
Mistake: Would Continuous Monitoring Help?
• Added six new Connect-ready problems.
• Implemented lab changes:
• Revised Labs 5-1 to 5-5.

Chapter 6
• Clarified chapter content to match the focus on descriptive, diagnostic, predictive,
and prescriptive analytics.
• Added “Data Analytics at Work” box feature: Do Auditors Need to Be Programmers?
• Added six new Connect-ready problems.
• Implemented lab changes:
• Major revisions to Labs 6-1 to 6-5.

Chapter 7
• Added new exhibit and discussion that maps managerial accounting questions to
data approaches.
• Added “Data Analytics at Work” box feature: Maximizing Profits Using Data Analytics.
• Added five new Connect-ready problems.
• Implemented lab changes:
• All-new job cost, balanced scorecard, and time series dashboards in Labs 7-1, 7-2, and 7-3.
• Revised Labs 7-4 and 7-5.


Chapter 8
• Added new exhibit and discussion that maps financial statement analysis questions
to data approaches.
• Added four new Connect-ready problems.
• Implemented lab changes:
• All-new sentiment analysis in Lab 8-4.
• Revised Labs 8-1 to 8-3.

Chapter 9
• Added new exhibit and discussion that maps tax questions to data approaches.
• Added four new Connect-ready problems.
• Implemented lab changes:
• Revised Labs 9-1 to 9-5.

Chapter 10
• Updated project chapter that evaluates different business processes, including the
order-to-cash and procure-to-pay cycles, from different user perspectives with a
choice to use the Microsoft track, the Tableau track, or both.
• Added extensive, all-new set of objective and analysis questions to assess analysis
and learning.

Chapter 11
• Updated project chapter, estimating sales returns at Dillard’s with three question sets
highlighting descriptive and exploratory analysis, hypothesis testing, and predictive
analytics with a choice to use the Microsoft track, the Tableau track, or both.
• Added extensive, all-new set of objective and analysis questions to assess analysis
and learning.

Connect® for Data Analytics for Accounting

With McGraw Hill Connect for Data Analytics for Accounting, your students receive proven
study tools and hands-on assignment materials, as well as an adaptive eBook. Here are some
of the features and assets available with Connect.

Proctorio: New remote proctoring and browser-locking capabilities, hosted by Proctorio within
Connect, provide control of the assessment environment by enabling security options and
verifying the identity of the student. Seamlessly integrated within Connect, these services allow
instructors to control students’ assessment experience by restricting browser activity, recording
students’ activity, and verifying students are doing their own work. Instant and detailed reporting
gives instructors an at-a-glance view of potential academic integrity concerns, thereby avoiding
personal bias and supporting evidence-based claims.
SmartBook 2.0: A personalized and adaptive learning tool used to maximize the learning
experience by helping students study more efficiently and effectively. SmartBook 2.0 highlights
where in the chapter to focus, asks review questions on the materials covered, and tracks the
most challenging content for later review recharge. SmartBook 2.0 is available both online and offline.

Orientation Videos: Video-based tutorial assignments are designed to train students via
an overview video followed by a quiz for each of the assignment types they will find in
McGraw Hill Connect.
Multiple Choice Questions: The multiple choice questions from the end-of-chapter materials
are assignable and auto-gradable in McGraw Hill Connect, with the option to provide stu-
dents with instant feedback on their answers and performance.
Discussion and Analysis Questions: We have added the Discussion and Analysis questions
into McGraw Hill Connect as manually graded assignments for convenience of assignment
organization. These can be utilized for small group or in-class discussion.

Problems: Select problems from the text are auto-graded in McGraw Hill Connect.
Manually graded analysis problems are also now available to ensure students are building
an analytical skill set.

Color Coded Multi-Track Labs: Labs are assignable in McGraw Hill Connect as the green
Microsoft Track (including Excel, Power Query, and Power BI) and blue Tableau Track
(including Tableau Prep Builder and Tableau Desktop).

Students complete their lab work outside of Connect in the lab track selected by their
professor. Students answer assigned lab questions designed to ensure they understood the
key skills and outcomes from their lab work. Both auto-graded lab objective questions and
manually graded lab analysis questions are assignable in Connect.
Comprehensive Cases: Comprehensive case labs are assignable in McGraw Hill Connect.
Students work outside of Connect to complete the lab using the Dillard’s real-world Big
Data set. Once students complete the comprehensive lab, they will go back into Connect
to answer questions designed to ensure they completed the lab and understood the key
skills and outcomes from their lab work.

Lab Walkthrough Videos: These author-led lab videos in McGraw Hill Connect explain
how to access and use the tools needed to complete the processes essential to the labs. Lab
videos improve student success and minimize student questions!

Author Lecture Videos: Lecture Videos assignable in McGraw Hill Connect teach each
chapter’s core learning objectives and concepts through an author-developed, hands-on
presentation, bringing the text content to life. The videos have the touch and feel of a live
lecture, rather than a canned presentation, so you can learn at your own pace.
Writing Assignment: The Writing Assignment tool delivers a learning experience to help
students improve their written communication skills and conceptual understanding. As an
instructor you can assign, monitor, grade, and provide feedback on writing more efficiently
and effectively in McGraw Hill Connect.
Test Bank: The test bank includes auto-graded multiple choice and true/false assessment
questions. The test bank can be assigned directly within McGraw Hill Connect or exported
from Test Builder.

ISTUDY
Instructors: Student Success Starts with You

Tools to enhance your unique voice
Want to build your own course? No problem. Prefer to use an OLC-aligned, prebuilt
course? Easy. Want to make changes throughout the semester? Sure. And you'll save time
with Connect's auto-grading too. (65% Less Time Grading)

Study made personal
Incorporate adaptive study resources like SmartBook® 2.0 into your course and help your
students be better prepared in less time. Learn more about the powerful personalized
learning experience available in SmartBook 2.0 at
www.mheducation.com/highered/connect/smartbook


Affordable solutions, added value
Make technology work for you with LMS integration for single sign-on access, mobile
access to the digital textbook, and reports to quickly show you how each of your students
is doing. And with our Inclusive Access program you can provide all these tools at a
discount to your students. Ask your McGraw Hill representative for more information.

Solutions for your challenges
A product isn't a solution. Real solutions are affordable, reliable, and come with training
and ongoing support when you need it and how you want it. Visit
www.supportateverystep.com for videos and resources both you and your students can use
throughout the semester.


Students: Get Learning that Fits You
Effective tools for efficient studying
Connect is designed to help you be more productive with simple, flexible, intuitive tools that maximize
your study time and meet your individual learning needs. Get learning that works for you with Connect.

Study anytime, anywhere
Download the free ReadAnywhere app and access your online eBook, SmartBook 2.0, or
Adaptive Learning Assignments when it's convenient, even if you're offline. And since the
app automatically syncs with your Connect account, all of your work is available every time
you open it. Find out more at www.mheducation.com/readanywhere

"I really liked this app—it made it easy to study when you don't have your textbook in front
of you."
—Jordan Cunningham, Eastern Washington University

Everything you need in one place
Your Connect course has everything you need—whether reading on your digital eBook or
completing assignments for class, Connect makes it easy to get your work done.


Learning for everyone
McGraw Hill works directly with Accessibility Services Departments and faculty to meet
the learning needs of all students. Please contact your Accessibility Services Office and
ask them to email [email protected], or visit
www.mheducation.com/about/accessibility for more information.


Brief Table of Contents
Preface iv
About the Authors vi
Acknowledgments vii
Key Features viii
Main Text Features ix
End-of-Chapter Materials x
Data Analytics for Accounting, 3e Content Updates xii
Connect for Data Analytics for Accounting xv
Chapter 1 Data Analytics for Accounting and Identifying the Questions 2
Chapter 2 Mastering the Data 52
Chapter 3 Performing the Test Plan and Analyzing the Results 114
Chapter 4 Communicating Results and Visualizations 180
Chapter 5 The Modern Accounting Environment 244
Chapter 6 Audit Data Analytics 282
Chapter 7 Managerial Analytics 334
Chapter 8 Financial Statement Analytics 404
Chapter 9 Tax Analytics 454
Chapter 10 Project Chapter (Basic) 498
Chapter 11 Project Chapter (Advanced): Analyzing Dillard’s Data to Predict Sales Returns 512
Appendix A Basic Statistics Tutorial 528
Appendix B Excel (Formatting, Sorting, Filtering, and PivotTables) 534
Appendix C Accessing the Excel Data Analysis Toolpak 544
Appendix D SQL Part 1 546
Appendix E SQL Part 2 560
Appendix F Power Query in Excel and Power BI 564
Appendix G Power BI Desktop 572
Appendix H Tableau Prep Builder 578
Appendix I Tableau Desktop 582
Appendix J Data Dictionaries 586

GLOSSARY 588

INDEX 593

Detailed TOC

Chapter 1
Data Analytics for Accounting and Identifying the Questions 2
  Data Analytics 4
  How Data Analytics Affects Business 4
  How Data Analytics Affects Accounting 5
    Auditing 6
    Management Accounting 7
    Financial Reporting and Financial Statement Analysis 7
    Tax 8
  The Data Analytics Process Using the IMPACT Cycle 9
    Step 1: Identify the Questions (Chapter 1) 9
    Step 2: Master the Data (Chapter 2) 10
    Step 3: Perform Test Plan (Chapter 3) 10
    Step 4: Address and Refine Results (Chapter 3) 13
    Steps 5 and 6: Communicate Insights and Track Outcomes (Chapter 4 and each chapter thereafter) 13
    Back to Step 1 13
  Data Analytic Skills and Tools Needed by Analytic-Minded Accountants 13
    Choose the Right Data Analytics Tools 14
  Hands-On Example of the IMPACT Model 17
    Identify the Questions 17
    Master the Data 17
    Perform Test Plan 20
    Address and Refine Results 23
    Communicate Insights 24
    Track Outcomes 24
  Summary 25
  Key Words 26
  Answers to Progress Checks 26
  Multiple Choice Questions 28
  Discussion and Analysis 30
  Problems 30
  Lab 1-0 How to Complete Labs 36
  Lab 1-1 Data Analytics Questions in Financial Accounting 39
  Lab 1-2 Data Analytics Questions in Managerial Accounting 41
  Lab 1-3 Data Analytics Questions in Auditing 42
  Lab 1-4 Comprehensive Case: Questions about Dillard’s Store Data 44
  Lab 1-5 Comprehensive Case: Connect to Dillard’s Store Data 47

Chapter 2
Mastering the Data 52
  How Data Are Used and Stored in the Accounting Cycle 54
    Internal and External Data Sources 54
    Accounting Data and Accounting Information Systems 56
  Data and Relationships in a Relational Database 56
    Columns in a Table: Primary Keys, Foreign Keys, and Descriptive Attributes 57
    Data Dictionaries 59
  Extract, Transform, and Load (ETL) the Data 60
    Extract 61
    Transform 64
    Load 67
  Ethical Considerations of Data Collection and Use 68
  Summary 69
  Key Words 70
  Answers to Progress Checks 70
  Multiple Choice Questions 71
  Discussion and Analysis 73
  Problems 74
  Lab 2-1 Request Data from IT—Sláinte 77
  Lab 2-2 Prepare Data for Analysis—Sláinte 79
  Lab 2-3 Resolve Common Data Problems—LendingClub 84
  Lab 2-4 Generate Summary Statistics—LendingClub 91
  Lab 2-5 Validate and Transform Data—College Scorecard 95
  Lab 2-6 Comprehensive Case: Build Relationships among Database Tables—Dillard’s 98
  Lab 2-7 Comprehensive Case: Preview Data from Tables—Dillard’s 103
  Lab 2-8 Comprehensive Case: Preview a Subset of Data in Excel, Tableau Using a SQL Query—Dillard’s 108

Chapter 3
Performing the Test Plan and Analyzing the Results 114
  Performing the Test Plan 116
  Descriptive Analytics 119
    Summary Statistics 119
    Data Reduction 120
  Diagnostic Analytics 122
    Standardizing Data for Comparison (Z-score) 123
    Profiling 123
    Cluster Analysis 128
    Hypothesis Testing for Differences in Groups 131
  Predictive Analytics 133
    Regression 134
    Classification 137
  Prescriptive Analytics 141
    Decision Support Systems 141
    Machine Learning and Artificial Intelligence 142
  Summary 143
  Key Words 144
  Answers to Progress Checks 145
  Multiple Choice Questions 146
  Discussion and Analysis 148
  Problems 148
  Chapter 3 Appendix: Setting Up a Classification Analysis 151
  Lab 3-1 Descriptive Analytics: Filter and Reduce Data—Sláinte 153
  Lab 3-2 Diagnostic Analytics: Identify Data Clusters—LendingClub 157
  Lab 3-3 Perform a Linear Regression Analysis—College Scorecard 160
  Lab 3-4 Comprehensive Case: Descriptive Analytics: Generate Summary Statistics—Dillard’s 166
  Lab 3-5 Comprehensive Case: Diagnostic Analytics: Compare Distributions—Dillard’s 169
  Lab 3-6 Comprehensive Case: Create a Data Abstract and Perform Regression Analysis—Dillard’s 174

Chapter 4
Communicating Results and Visualizations 180
  Communicating Results 183
    Differentiating between Statistics and Visualizations 183
    Visualizations Increasingly Preferred over Text 184
  Determine the Purpose of Your Data Visualization 185
    Quadrants 1 and 3 versus Quadrants 2 and 4: Qualitative versus Quantitative 186
    A Special Case of Quantitative Data: The Normal Distribution 188
    Quadrants 1 and 2 versus Quadrants 3 and 4: Declarative versus Exploratory 188
  Choosing the Right Chart 192
    Charts Appropriate for Qualitative Data 192
    Charts Appropriate for Quantitative Data 194
    Learning to Create a Good Chart by (Bad) Example 195
  Further Refining Your Chart to Communicate Better 200
    Data Scale and Increments 201
    Color 201
  Communication: More Than Visuals—Using Words to Provide Insights 202
    Content and Organization 202
    Audience and Tone 203
    Revising 204
  Summary 204
  Key Words 205
  Answers to Progress Checks 206
  Multiple Choice Questions 207
  Discussion and Analysis 208
  Problems 208
  Lab 4-1 Visualize Declarative Data—Sláinte 212
  Lab 4-2 Perform Exploratory Analysis and Create Dashboards—Sláinte 218
  Lab 4-3 Create Dashboards—LendingClub 223
  Lab 4-4 Comprehensive Case: Visualize Declarative Data—Dillard’s 229
  Lab 4-5 Comprehensive Case: Visualize Exploratory Data—Dillard’s 236

Chapter 5
The Modern Accounting Environment 244
  The Modern Data Environment 246
  The Increasing Importance of the Internal Audit 247
  Enterprise Data 248
  Common Data Models 249
  Automating Data Analytics 251
    Continuous Monitoring Techniques 253
    Alarms and Exceptions 254
  Working Papers and Audit Workflow 255
    Electronic Working Papers and Remote Audit Work 255
  Summary 256
  Key Words 256
  Answers to Progress Checks 257
  Multiple Choice Questions 258
  Discussion and Analysis 259
  Problems 259
  Lab 5-1 Create a Common Data Model—Oklahoma 263
  Lab 5-2 Create a Dashboard Based on a Common Data Model—Oklahoma 267
  Lab 5-3 Set Up a Cloud Folder and Review Changes—Sláinte 272
  Lab 5-4 Identify Audit Data Requirements—Sláinte 275
  Lab 5-5 Comprehensive Case: Setting Scope—Dillard’s 277

Chapter 6
Audit Data Analytics 282
  When to Use Audit Data Analytics 284
    Identify the Questions 284
    Master the Data 284
    Perform Test Plan 286
    Address and Refine Results 288
    Communicate Insights 288
    Track Outcomes 288
  Descriptive Analytics 288
    Aging of Accounts Receivable 289
    Sorting 289
    Summary Statistics 289
    Sampling 289
  Diagnostic Analytics 290
    Box Plots and Quartiles 290
    Z-Score 290
    t-Tests 290
    Benford’s Law 292
    Drill-Down 293
    Exact and Fuzzy Matching 293
    Sequence Check 294
    Stratification and Clustering 294
  Advanced Predictive and Prescriptive Analytics in Auditing 294
    Regression 295
    Classification 295
    Probability 295
    Sentiment Analysis 295
    Applied Statistics 296
    Artificial Intelligence 296
    Additional Analyses 296
  Summary 297
  Key Words 297
  Answers to Progress Checks 298
  Multiple Choice Questions 298
  Discussion and Analysis 300
  Problems 300
  Lab 6-1 Evaluate Trends and Outliers—Oklahoma 304
  Lab 6-2 Diagnostic Analytics Using Benford’s Law—Oklahoma 311
  Lab 6-3 Finding Duplicate Payments—Sláinte 317
  Lab 6-4 Comprehensive Case: Sampling—Dillard’s 321
  Lab 6-5 Comprehensive Case: Outlier Detection—Dillard’s 325

Chapter 7
Managerial Analytics 334
  Application of the IMPACT Model to Management Accounting Questions 336
    Identify the Questions 336
    Master the Data 337
    Perform Test Plan 337
    Address and Refine Results 338
    Communicate Insights and Track Outcomes 339
  Identifying Management Accounting Questions 339
    Relevant Costs 339
    Key Performance Indicators and Variance Analysis 339
    Cost Behavior 340
  Balanced Scorecard and Key Performance Indicators 341
    Master the Data and Perform the Test Plan 345
    Address and Refine Results 347
  Summary 348
  Key Words 348
  Answers to Progress Checks 349
  Multiple Choice Questions 349
  Discussion and Analysis 351
  Problems 351
  Lab 7-1 Evaluate Job Costs—Sláinte 355
  Lab 7-2 Create a Balanced Scorecard Dashboard—Sláinte 367
  Lab 7-3 Comprehensive Case: Analyze Time Series Data—Dillard’s 377
  Lab 7-4 Comprehensive Case: Comparing Results to a Prior Period—Dillard’s 389
  Lab 7-5 Comprehensive Case: Advanced Performance Models—Dillard’s 398

Chapter 8
Financial Statement Analytics 404
  Financial Statement Analysis 406
  Descriptive Financial Analytics 407
    Vertical and Horizontal Analysis 407
    Ratio Analysis 408
  Diagnostic Financial Analytics 410
  Predictive Financial Analytics 410
  Prescriptive Financial Analytics 412
  Visualizing Financial Data 413
    Showing Trends 413
    Relative Size of Accounts Using Heat Maps 414
    Visualizing Hierarchy 414
  Text Mining and Sentiment Analysis 415
  XBRL and Financial Data Quality 417
    XBRL Data Quality 419
    XBRL, XBRL-GL, and Real-Time Financial Reporting 420
    Examples of Financial Statement Analytics Using XBRL 422
  Summary 422
  Key Words 423
  Answers to Progress Checks 423
  Multiple Choice Questions 424
  Discussion and Analysis 425
  Problems 426
  Lab 8-1 Create a Horizontal and Vertical Analysis Using XBRL Data—S&P100 430
  Lab 8-2 Create Dynamic Common Size Financial Statements—S&P100 437
  Lab 8-3 Analyze Financial Statement Ratios—S&P100 441
  Lab 8-4 Analyze Financial Sentiment—S&P100 444

Chapter 9
Tax Analytics 454
  Tax Analytics 456
    Identify the Questions 456
    Master the Data 456
    Perform Test Plan 456
    Address and Refine Results 458
    Communicate Insights and Track Outcomes 458
  Mastering the Data through Tax Data Management 458
    Tax Data in the Tax Department 458
    Tax Data at Accounting Firms 460
    Tax Data at the IRS 461
  Tax Data Analytics Visualizations 461
    Tax Data Analytics Visualizations and Tax Compliance 461
    Evaluating Sales Tax Liability 462
    Evaluating Income Tax Liability 462
  Tax Data Analytics for Tax Planning 464
    What-If Scenarios 464
    What-If Scenarios for Potential Legislation, Deductions, and Credits 465
  Summary 467
  Key Words 467
  Answers to Progress Checks 467
  Multiple Choice Questions 468
  Discussion and Analysis 469
  Problems 470
  Lab 9-1 Descriptive Analytics: State Sales Tax Rates 472
  Lab 9-2 Comprehensive Case: Calculate Estimated State Sales Tax Owed—Dillard’s 475
  Lab 9-3 Comprehensive Case: Calculate Total Sales Tax Paid—Dillard’s 479
  Lab 9-4 Comprehensive Case: Estimate Sales Tax Owed by Zip Code—Dillard’s and Avalara 486
  Lab 9-5 Comprehensive Case: Online Sales Taxes Analysis—Dillard’s and Avalara 492

Chapter 10
Project Chapter (Basic) 498
  Evaluating Business Processes 500
  Question Set 1: Order-to-Cash 500
    QS1 Part 1 Financial: What Is the Total Revenue and Balance in Accounts Receivable? 500
    QS1 Part 2 Managerial: How Efficiently Is the Company Collecting Cash? 503
    QS1 Part 3 Audit: Is the Delivery Process Following the Expected Procedure? 504
    QS1 Part 4 What Else Can You Determine about the O2C Process? 505
  Question Set 2: Procure-to-Pay 506
    QS2 Part 1 Financial: Is the Company Missing Out on Discounts by Paying Late? 506
    QS2 Part 2 Managerial: How Long Is the Company Taking to Pay Invoices? 509
    QS2 Part 3 Audit: Are There Any Erroneous Payments? 510
    QS2 Part 4 What Else Can You Determine about the P2P Process? 511

Chapter 11
Project Chapter (Advanced): Analyzing Dillard’s Data to Predict Sales Returns 512
  Estimating Sales Returns 514
  Question Set 1: Descriptive and Exploratory Analysis 514
    QS1 Part 1 Compare the Percentage of Returned Sales across Months, States, and Online versus In-Person Transactions 514
    QS1 Part 2 What Else Can You Determine about the Percentage of Returned Sales through Descriptive Analysis? 518
  Question Set 2: Diagnostic Analytics—Hypothesis Testing 519
    QS2 Part 1 Is the Percentage of Sales Returned Significantly Higher in January after the Holiday Season? 519
    QS2 Part 2 How Do the Percentages of Returned Sales for Holiday/Non-Holiday Differ for Online Transactions and across Different States? 521
    QS2 Part 3 What Else Can You Determine about the Percentage of Returned Sales through Diagnostic Analysis? 523
  Question Set 3: Predictive Analytics 524
    QS3 Part 1 By Looking at Line Charts for 2014 and 2015, Does the Average Percentage of Sales Returned in 2014 Seem to Be Predictive of Returns in 2015? 524
    QS3 Part 2 Using Regression, Can We Predict Future Returns as a Percentage of Sales Based on Historical Transactions? 526
    QS3 Part 3 What Else Can You Determine about the Percentage of Returned Sales through Predictive Analysis? 527

Appendix A Basic Statistics Tutorial 528
Appendix B Excel (Formatting, Sorting, Filtering, and PivotTables) 534
Appendix C Accessing the Excel Data Analysis Toolpak 544
Appendix D SQL Part 1 546
Appendix E SQL Part 2 560
Appendix F Power Query in Excel and Power BI 564
Appendix G Power BI Desktop 572
Appendix H Tableau Prep Builder 578
Appendix I Tableau Desktop 582
Appendix J Data Dictionaries 586

GLOSSARY 588
INDEX 593

Chapter 1
Data Analytics for Accounting and
Identifying the Questions

A Look at This Chapter


Data Analytics is changing both business and accounting. In this chapter, we define Data Analytics and explain its
impact on business and the accounting profession, noting that the value of Data Analytics is derived from the insights
it provides. We also describe the need for an analytics mindset in the accounting profession. We next describe the
Data Analytics Process using the IMPACT cycle and explain how this process is used to address both business and
accounting questions. We then emphasize the skills accountants need as well as the tools available for their use. In
this chapter, we specifically emphasize the importance of identifying appropriate accounting questions that Data Ana-
lytics might be able to address.

A Look Ahead
Chapter 2 provides a description of how data are prepared and scrubbed to be ready for analysis to address account-
ing questions. We explain how to extract, transform, and load data and then how to validate and normalize the
data. In addition, we explain how data standards are used to facilitate the exchange of data between data sender and
receiver. We finalize the chapter by emphasizing the need for ethical data collection and data use to maintain data
privacy.

As access to accounting data proliferates and tools and accountant skills advance, accountants are relying more on Big Data to address accounting questions. Whether those questions relate to audit, tax, or other accounting areas, value will increasingly be created by performing Data Analytics. In this chapter, we introduce you to the need for Data Analytics in accounting and show how accounting professionals are increasingly asked to develop an analytics mindset for any and all accounting roles.
Technology such as Data Analytics, artificial intelligence, machine learning, blockchain, and robotic process automation will be playing a greater role in the accounting profession this year, according to a recent report from the Institute of Management Accountants.
Cobalt S-Elinoi/Shutterstock
The report indicates that finance and accounting professionals are increasingly implementing Big Data in
their business processes, and the pattern is likely to continue in the future. The IMA surveyed its members for
the report and received 170 responses from CFOs and other management accountants. Many of the CFOs are
predicting big changes for 2020 in their businesses.

Sources: M. Cohn, “Accountants to Rely More on Big Data in 2020,” Accounting Today, January 4, 2020, https://www.accountingtoday.com/news/accountants-to-rely-more-on-big-data-in-2020 (accessed December 2020).

OBJECTIVES
After reading this chapter, you should be able to:

LO 1-1 Define Data Analytics.


LO 1-2 Understand why Data Analytics matters to business.
LO 1-3 Explain why Data Analytics matters to accountants.
LO 1-4 Describe the Data Analytics Process using the IMPACT cycle.
LO 1-5 Describe the skills needed by accountants.
LO 1-6 Explain how the IMPACT model may be used to address a specific
business question.


DATA ANALYTICS

LO 1-1 Define Data Analytics.

Data surround us! By the year 2024, it is expected that the volume of data created, captured, copied, and consumed worldwide will be 149 zettabytes (compared to 2 zettabytes in 2010 and 59 zettabytes in 2020).1 In fact, more data have been created in the last 2 years than in the entire previous history of the human race.2 With so much data available about each of us (e.g., how we shop, what we read, what we’ve bought, what music we listen to, where we travel, whom we trust, what devices we use, etc.), arguably, there is the potential for analyzing those data in a way that can answer fundamental business questions and create value.
We define Data Analytics as the process of evaluating data with the purpose of drawing
conclusions to address business questions. Indeed, effective Data Analytics provides a way
to search through large structured data (data that adheres to a predefined data model in
a tabular format) and unstructured data (data that does not adhere to a predefined data
format) to discover unknown patterns or relationships.3 In other words, Data Analytics
often involves the technologies, systems, practices, methodologies, databases, statistics,
and applications used to analyze diverse business data to give organizations the informa-
tion they need to make sound and timely business decisions.4 That is, the process of Data Analytics aims to transform raw data into knowledge to create value.
Big Data refers to datasets that are too large and complex for businesses’ existing sys-
tems to handle utilizing their traditional capabilities to capture, store, manage, and analyze
these datasets. Another way to describe Big Data (or frankly any available data source)
is by use of four Vs: its volume (the sheer size of the dataset), velocity (the speed of data
processing), variety (the number of types of data), and veracity (the underlying quality of
the data). While sometimes Data Analytics and Big Data are terms used interchangeably, we
will use the term Data Analytics throughout and focus on the possibility of turning data into
knowledge and that knowledge into insights that create value.

PROGRESS CHECK
1. How does having more data around us translate into value for a company? What
must we do with those data to extract value?
2. Banks know a lot about us, but they have traditionally used externally generated
credit scores to assess creditworthiness when deciding whether to extend a
loan. How would you suggest a bank use Data Analytics to get a more complete
view of its customers’ creditworthiness? Assume the bank has access to a cus-
tomer’s loan history, credit card transactions, deposit history, and direct deposit
registration. How could it assess whether a loan might be repaid?

1 Statista, https://www.statista.com/statistics/871513/worldwide-data-created/ (accessed December 2020).
2 Bernard Marr, “Big Data: 20 Mind-Boggling Facts Everyone Must Read,” Forbes, September 30, 2015, at http://www.forbes.com/sites/bernardmarr/2015/09/30/big-data-20-mind-boggling-facts-everyone-must-read/#2a3289006c1d (accessed March 2019).
3 Roger S. Debreceny and Glen L. Gray, “IT Governance and Process Maturity: A Multinational Field Study,” Journal of Information Systems 27, no. 1 (Spring 2013), pp. 157–88.
4 H. Chen, R. H. L. Chiang, and V. C. Storey, “Business Intelligence Research,” MIS Quarterly 34, no. 1 (2010), pp. 201–3.

HOW DATA ANALYTICS AFFECTS BUSINESS

LO 1-2 Understand why Data Analytics matters to business.

There is little question that the impact of data and Data Analytics on business is overwhelming. In fact, in PwC’s 18th Annual Global CEO Survey, 86 percent of chief executive officers (CEOs) say they find it important to champion digital technologies and emphasize a clear vision of using technology for a competitive advantage, while 85 percent say they put a high value on Data Analytics. In fact, per PwC’s 6th Annual Digital IQ survey of more than 1,400 leaders from digital businesses, the area of investment that tops CEOs’ list of priorities is business analytics.5
A recent study from McKinsey Global Institute estimates that Data Analytics and tech-
nology could generate up to $2 trillion in value per year in just a subset of the total pos-
sible industries affected.6 Data Analytics could very much transform the manner in which
companies run their businesses in the near future because the real value of data comes
from Data Analytics. With a wealth of data on their hands, companies use Data Analytics
to discover the various buying patterns of their customers, investigate anomalies that were
not anticipated, forecast future possibilities, and so on. For example, with insight provided
through Data Analytics, companies could execute more directed marketing campaigns
based on patterns observed in their data, giving them a competitive advantage over compa-
nies that do not use this information to improve their marketing strategies. By pairing struc-
tured data with unstructured data, patterns could be discovered that create new meaning,
creating value and competitive advantage. In addition to producing more value externally,
studies show that Data Analytics affects internal processes, improving productivity, utiliza-
tion, and growth.7
And increasingly, data analytic tools are available as self-service analytics, giving users the capability to aggregate, filter, analyze, enrich, sort, visualize, and dashboard data for data-driven decision making on demand.
PwC notes that while data has always been important, executives are more frequently
being asked to make data-driven decisions in high-stress and high-change environments,
making the reliance on Data Analytics even greater these days!8

PROGRESS CHECK
3. Let’s assume a brand manager at Procter and Gamble identifies that an older
demographic might be concerned with the use of Tide Pods to do their laundry.
How might Procter and Gamble use Data Analytics to assess if this is a problem?
4. How might Data Analytics assess the decision to either grant overtime to current
employees or hire additional employees? Specifically, consider how Data Ana-
lytics might be helpful in reducing a company’s overtime direct labor costs in a
manufacturing setting.

HOW DATA ANALYTICS AFFECTS ACCOUNTING

LO 1-3 Explain why Data Analytics matters to accountants.

Data Analytics is expected to have dramatic effects on auditing and financial reporting as well as tax and managerial accounting. We detail how we think this might happen in each of the following sections.

5 “Data Driven: What Students Need to Succeed in a Rapidly Changing Business World,” PwC, https://www.pwc.com/us/en/faculty-resource/assets/pwc-data-driven-paper-feb2015.pdf, February 2015 (accessed March 20, 2019).
6 “The Trillion-Dollar Opportunity for the Industrial Sector: How to Extract Full Value from Technology,” McKinsey Global Institute, https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/the-trillion-dollar-opportunity-for-the-industrial-sector#, November 2018 (accessed December 2018).
7 Joseph Kennedy, “Big Data’s Economic Impact,” https://www.ced.org/blog/entry/big-datas-economic-impact, December 3, 2014 (accessed January 9, 2016).
8 “What’s Next for Tech for Finance? Data-Driven Decision Making,” PwC, https://www.pwc.com/us/en/cfo-direct/accounting-podcast/data-driven-decision-making.html, October 2020 (accessed December 2020).


Auditing
Data Analytics plays an increasingly critical role in the future of audit. In a recent Forbes
Insights/KPMG report, “Audit 2020: A Focus on Change,” the vast majority of survey
respondents believe both that:
1. Audits must better embrace technology.
2. Technology will enhance the quality, transparency, and accuracy of the audit.
Indeed, “As the business landscape for most organizations becomes increasingly com-
plex and fast-paced, there is a movement toward leveraging advanced business analytic
techniques to refine the focus on risk and derive deeper insights into an organization.”9
Many auditors believe that audit data analytics will, in fact, lead to deeper insights that
will enhance audit quality. This sentiment of the impact of Data Analytics on the audit has
been growing for several years now and has given many public accounting firms incentives
to invest in technology and personnel to capture, organize, and analyze financial statement
data to provide enhanced audits, expanded services, and added value to their clients. As a
result, Data Analytics is the next innovation in the evolution of the audit and professional
accounting industry.
Given the fact that operational data abound and are easier to collect and manage, com-
bined with CEOs’ desires to utilize these data, the accounting firms may now approach
their engagements with a different mindset. No longer will they be simply checking for
errors, material misstatements, fraud, and risk in financial statements or merely be report-
ing their findings at the end of the engagement. Instead, audit professionals will now be
collecting and analyzing the company’s data similar to the way a business analyst would to
help management make better business decisions. This means that, in many cases, external
auditors will stay engaged with clients beyond the audit. This is a significant paradigm shift.
The audit process is changing from a traditional process toward a more automated one,
which will allow audit professionals to focus more on the logic and rationale behind data
queries and less on the gathering of the actual data.10 As a result, audits will not only yield
important findings from a financial perspective, but also information that can help compa-
nies refine processes, improve efficiency, and anticipate future problems.

“It’s a massive leap to go from traditional audit approaches to one that fully integrates
big data and analytics in a seamless manner.”11

Data Analytics also expands auditors’ capabilities in services like testing for fraudulent
transactions and automating compliance-monitoring activities (like filing financial reports
to the U.S. Securities and Exchange Commission [SEC] or to the Internal Revenue Service
[IRS]). This is possible because Data Analytics enables auditors to analyze the complete
dataset, rather than the sampling of the financial data done in a traditional audit. Data Analytics also enables auditors to improve their risk assessment in both their substantive and detailed testing.

9 Deloitte, “Adding Insight to Audit: Transforming Internal Audit through Data Analytics,” http://www2.deloitte.com/content/dam/Deloitte/ca/Documents/audit/ca-en-audit-adding-insight-to-audit.pdf (accessed January 10, 2016).
10 PwC, “Data Driven: What Students Need to Succeed in a Rapidly Changing Business World,” http://www.pwc.com/us/en/faculty-resource/assets/PwC-Data-driven-paper-Feb2015.pdf, February 2015 (accessed January 9, 2016).
11 EY, “How Big Data and Analytics Are Transforming the Audit,” https://eyo-iis-pd.ey.com/ARC/documents/EY-reporting-ssue-9.pdf, posted April 2015 (accessed January 27, 2016).


We address auditing questions and Data Analytics in Chapters 5 and 6.

Lab Connection
Lab 1-3 has you explore questions auditors would answer with Data Analytics.

Management Accounting
Of all the fields of accounting, it would seem that the aims of Data Analytics are most akin
to management accounting. Management accountants (1) are asked questions by manage-
ment, (2) find data to address those questions, (3) analyze the data, and (4) report the
results to management to aid in their decision making. The description of the management
accountant’s task and that of the data analyst appear to be quite similar, if not identical in
many respects.
Whether it be understanding costs via job order costing, understanding the activity-based
costing drivers, forecasting future sales on which to base budgets, or determining whether
to sell or process further or make or outsource its production processes, analyzing data is
critical to management accountants.
As information providers for the firm, it is imperative for management accountants to
understand the capabilities of data and Data Analytics to address management questions.
We address management accounting questions and Data Analytics in Chapter 7.

Lab Connection
Lab 1-2 and Lab 1-4 have you explore questions managers would answer with
Data Analytics.

Financial Reporting and Financial Statement Analysis


Data Analytics also potentially has an impact on financial reporting. With the use of so
many estimates and valuations in financial accounting, some believe that employing Data
Analytics may substantially improve the quality of the estimates and valuations. Data from
within an enterprise system and external to the company and system might be used to
address many of the questions that face financial reporting. Many financial statement
accounts are just estimates, and so accountants often ask themselves questions like this to
evaluate those estimates:
1. How much of the accounts receivable balance will ultimately be collected? What should
the allowance for loan losses look like?
2. Is any of our inventory obsolete? Should our inventory be valued at market or cost
(applying the lower-of-cost-or-market rule)? When will it be out of date? Do we need to
offer a discount on it now to get it sold?
3. Has our goodwill been impaired due to the reduction in profitability from a recent
merger? Will it regain value in the near future?
4. How should we value contingent liabilities like warranty claims or litigation? Do we
have the right amount?
Data Analytics may also allow an accountant or auditor to assess the probability of a
goodwill write-down, warranty claims, or the collectability of bad debts based on what cus-
tomers, investors, and other stakeholders are saying about the company in blogs and in
social media (like Facebook and Twitter). This information might help the firm determine


both its optimal response to the situation and appropriate adjustment to its financial
reporting.
It may be possible to use Data Analytics to scan the environment—that is, scan Google
searches and social media (such as Instagram and Facebook) to identify potential risks to
and opportunities for the firm. In a data analytic sense, it may allow a firm to monitor its competitors and its customers to better understand opportunities and threats around it. For example, are its competitors, customers, or suppliers facing financial difficulty that might affect the company’s interactions with them and/or open up new opportunities that it otherwise wouldn’t have considered?
We address financial reporting and financial statement analysis questions and Data Ana-
lytics in Chapter 8.

Lab Connection
Lab 1-1 has you explore questions financial accountants would answer with
Data Analytics.

Tax
Traditionally, tax work dealt with compliance issues based on data from transactions that
have already taken place. Now, however, tax executives must develop sophisticated tax planning capabilities that assist the company with minimizing its taxes in such a way as to avoid or prepare for a potential audit. This shift in focus makes tax data analytics valuable for its ability to help tax staffs predict what will happen rather than react to what just did happen. Arguably, one of the things that Data Analytics does best is predictive analytics—predicting the future! For example, in one of the tax staff’s most value-adding tasks, tax planning, tax data analytics provides the capability to predict the potential tax consequences of an international transaction, R&D investment, or proposed merger or acquisition.
One of the issues of performing predictive Data Analytics is the efficient organization
and use of data stored across multiple systems on varying platforms that were not originally
designed for use in the tax department. Organizing tax data into a data warehouse to be
able to consistently model and query the data is an important step toward developing the
capability to perform tax data analytics. This issue is exemplified by the 29 percent of tax
departments that find the biggest challenge in executing an analytics strategy is integrating
the strategy with the IT department and gaining access to available technology tools.12
We address tax questions and Data Analytics in Chapter 9.

PROGRESS CHECK
5. Why are management accounting and Data Analytics considered similar in many
respects?
6. How specifically will Data Analytics change the way a tax staff does its taxes?

12 Deloitte, “The Power of Tax Data Analytics,” http://www2.deloitte.com/us/en/pages/tax/articles/top-ten-things-about-tax-data-analytics.html (accessed October 12, 2016).


THE DATA ANALYTICS PROCESS USING THE IMPACT CYCLE

LO 1-4 Describe the Data Analytics Process using the IMPACT cycle.

Data Analytics is a process to identify business questions and problems that can be addressed with data. We start to describe our Data Analytics Process by using an established Data Analytics model called the IMPACT cycle by Isson and Harriott (as shown in Exhibit 1-1).
We explain the full IMPACT cycle briefly here, but in more detail later in Chapters 2, 3,
and 4. We use its approach for thinking about the steps included in Data Analytics through-
out this textbook, all the way from carefully identifying the question to accessing and ana-
lyzing the data to communicating insights and tracking outcomes.13

Step 1: Identify the Questions (Chapter 1)


It all begins with understanding a business problem that needs addressing. Questions can
arise from many sources, including how to better attract customers, how to price a product,
how to reduce costs, or how to find errors or fraud. Having a concrete, specific question
that is potentially answerable by Data Analytics is an important first step.
Indeed, accountants often possess a unique skillset to improve an organization’s Data
Analytics by their ability to ask the right questions, especially since they often understand a
company’s financial data. In other words, “Your Data Won’t Speak Unless You Ask It the
Right Data Analysis Questions.”14 We could ask any question in the world, but if we don’t
ultimately have the right data to address the question, there really isn’t much use for Data
Analytics for those questions.

Additional attributes to consider might include the following:


• Audience: Who is the audience that will use the results of the analysis (internal auditor,
CFO, financial analyst, tax professional, etc.)?
• Scope: Is the question too narrow or too broad?
• Use: How will the results be used? Is it to identify risks? Is it to make data-driven
­business decisions?

EXHIBIT 1-1 The IMPACT Cycle
Source: Isson, J. P., and J. S. Harriott. Win with Advanced Business Analytics: Creating Business Value from Your Data. Hoboken, NJ: Wiley, 2013.

13 We also note our use of the terms IMPACT cycle and IMPACT model interchangeably throughout the book.
14 M. Lebied, “Your Data Won’t Speak Unless You Ask It the Right Data Analysis Questions,” Datapine, June 21, 2017, https://www.datapine.com/blog/data-analysis-questions/ (accessed December 2020).


Here are examples of potential questions accountants might address using Data Analytics:
• Are employees circumventing internal controls over payments?
• What are appropriate cost drivers for activity-based costing purposes?
• To minimize taxes, should we have our company headquarters in Dublin, Ireland, or in
Chicago?
• Are our customers paying us in a timely manner? Are we paying our suppliers in a
timely manner?
• How can we more accurately predict the allowance for loan losses for our bank loans?
• How can we find transactions that are risky in terms of accounting issues?
• Who authorizes checks above $100,000?
• How can errors made in journal entries be identified?
• Should we outsource our products to Indonesia, or produce them ourselves?

Step 2: Master the Data (Chapter 2)


Mastering the data requires one to know what data are available and whether those data
might be able to help address the business problem. We need to know everything about the
data, including how to access them, their availability, reliability (if there are errors or missing data), frequency of updates, what time periods are covered to make sure the data coincide with the timing of our business problem, and so on.
In addition, to give us some idea of the data questions, we may want to consider the
following:
• Review data availability in a firm’s internal systems (including those in the financial
reporting system or enterprise systems that might occur in its accounting processes—
financial, procure-to-pay, production, order-to-cash, human resources).
• Review data availability in a firm’s external network, including those that might already
be housed in an existing data warehouse.
• Examine data dictionaries and other contextual data—to provide details about the data.
• Evaluate and perform the ETL (extraction, transformation, and loading) processes and
assess the time required to complete.
• Assess data validation and completeness—to provide a sense of the reliability of the data (see the sketch following this list).
• Evaluate and perform data normalization—to reduce data redundancy and improve data
integrity.
• Evaluate and perform data preparation and scrubbing—Data Analytics professionals
estimate that they spend between 50 and 90 percent of their time cleaning data so the
data can be analyzed.15
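
To make the validation and completeness checks above concrete, here is a minimal sketch in Python with pandas of what such checks could look like. The file name and column names are hypothetical stand-ins, not a dataset used in this text.

import pandas as pd

# A minimal sketch of data validation and completeness checks; the file
# and column names ("transactions_extract.csv", "invoice_date",
# "transaction_id") are hypothetical illustrations.
df = pd.read_csv("transactions_extract.csv", parse_dates=["invoice_date"])

# Completeness: count missing values in each column
print(df.isna().sum())

# Timing: flag records outside the period our business question covers
in_period = df["invoice_date"].between("2024-01-01", "2024-12-31")
print((~in_period).sum(), "records fall outside the analysis period")

# Reliability: check for duplicate transaction IDs before loading for analysis
print(df["transaction_id"].duplicated().sum(), "duplicate transaction IDs")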

Step 3: Perform Test Plan (Chapter 3)


After mastering the data and after the data are ready (in step 2), we are prepared for analy-
sis. With the data ready for analysis, we need to think of the right approach to the data to
be able to answer the question.
In Data Analytics, we work to extract knowledge from the data to address questions
and problems. Using all available data, we see if we can identify a relationship between
the response (or dependent) variables and those items that affect the response (also called predictor, explanatory, or independent variables). To do so, we’ll generally make a model, or a simplified representation of reality, to address this purpose.

15 “One-Third of BI Pros Spend up to 90% of Time Cleaning Data,” http://www.eweek.com/database/one-third-of-bi-pros-spend-up-to-90-of-time-cleaning-data.html, posted June 2015 (accessed March 15, 2016).
An example might be helpful here. Let’s say we are trying to predict each of your class-
mates’ performance on their next intermediate accounting exam. The response or depend-
ent variable will be the score on the next exam. What helps predict the performance of each
exam will be our predictor, explanatory, or independent variables. Variables such as study
time, score on last exam, IQ, and standardized test scores (ACT, SAT, etc.), as well as stu-
dent enjoyment of accounting, might all be considered. Perhaps given your experience, you
can name other predictor variables to include in our model predicting exam performance.
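
As a sketch of what such a model could look like in code (here in Python with scikit-learn), consider the following. Every data value below is invented purely for illustration.

import pandas as pd
from sklearn.linear_model import LinearRegression

# A minimal sketch of a model predicting exam scores; all values are invented.
data = pd.DataFrame({
    "study_hours": [2, 5, 1, 8, 4, 6],        # predictor (independent) variable
    "last_exam":   [70, 85, 60, 92, 78, 88],  # predictor (independent) variable
    "next_exam":   [68, 88, 58, 95, 80, 90],  # response (dependent) variable
})

model = LinearRegression().fit(data[["study_hours", "last_exam"]],
                               data["next_exam"])

# Predict the next exam score for a student who studied 3 hours and
# scored 75 on the last exam
new_student = pd.DataFrame({"study_hours": [3], "last_exam": [75]})
print(model.predict(new_student))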
The research question, the model, the data availability, and the expected statistical infer-
ence may all suggest the use of different data approaches. Provost and Fawcett16 detail eight
different approaches to Data Analytics depending on the question. We will discuss the most
applicable ones to accounting more formally in Chapter 3 and highlight accounting ques-
tions that they might address. The eight different approaches include the following:
• Classification—An attempt to assign each unit (or individual) in a population into a
few categories. An example of classification might be, of all the loans this bank has
offered, which are most likely to default? Or which loan applications are expected to be
approved? Or which transactions would a credit card company flag as potentially being
fraudulent and deny payment? Which companies are most likely to go bankrupt in the
next two years?
• Regression—A data approach used to predict a specific dependent variable value based
on independent variable inputs using a statistical model. Regression analysis might be
used to assess the relationship between an investment in R&D and subsequent operat-
ing income. Another example would be the use of regression to identify an appropriate
cost driver to allocate overhead as part of activity-based costing.
• Similarity matching—An attempt to identify similar individuals based on data known
about them. A company may use similarity matching to find new customers that may
closely resemble their best customers (in hopes that they find additional profitable
customers).
• Clustering—An attempt to divide individuals (like customers) into groups (or clusters)
in a useful or meaningful way. In other words, identifying groups of similar data ele-
ments and the underlying drivers of those groups. For example, clustering might be
used to segment loyalty card customers into groups based on buying behavior related
to shopping frequency or purchasing volume, for additional analysis and marketing
activities.
• Co-occurrence grouping—An attempt to discover associations between individuals based
on transactions involving them. Amazon might use this to sell another item to you
by knowing what items are “frequently bought together” or “Customers who bought
this item also bought. . . .” Exhibit 1-2 shows that an Amazon search for the Yamaha MG10XU stereo mixer provides several related item suggestions to the customer.
• Profiling—An attempt to characterize the “typical” behavior of an individual, group, or
population by generating summary statistics about the data (including mean, median,
minimum, maximum, and standard deviation). By understanding the typical behavior,
we’ll be able to more easily identify abnormal behavior. When behavior departs from
that typical behavior—which we’ll call an anomaly—then further investigation is war-
ranted. Profiling might be used in accounting to identify fraud or just those transactions
that might warrant some additional investigation (e.g., travel expenses that are three standard deviations above the norm; see the sketch following this list).
16 Foster Provost and Tom Fawcett, Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking (Sebastopol, CA: O’Reilly Media, 2013).


EXHIBIT 1-2 Example of Co-occurrence Grouping on Amazon.com
Amazon Inc.

• Link prediction—An attempt to predict connections between two data items. This might
be used in social media. For example, because an individual might have 22 mutual
Facebook friends with me and we both attended Brigham Young University in the
same year, is there a chance we would like to be Facebook friends as well? Exhibit 1-3
provides an example of this used in Facebook. Link prediction in an accounting setting
might use social media to look for relationships between related parties that are not otherwise disclosed.

EXHIBIT 1-3 Example of Link Prediction on Facebook
Michael DeLeon/Getty Images; Sam Edwards/Glow Images; Daniel Ernst/Getty Images; Exactostock/SuperStock; McGraw Hill

• Data reduction—A data approach that attempts to reduce the amount of information that
needs to be considered to focus on the most critical items (e.g., highest cost, highest
risk, largest impact, etc.). It does this by taking a large set of data (perhaps the popula-
tion) and reducing it with a smaller set that has the vast majority of the critical informa-
tion of the larger set. An example might include the potential to use these techniques in


auditing. While auditing has employed various random and stratified sampling over the
years, Data Analytics suggests new ways to highlight which transactions do not need the
same level of additional vetting (such as substantive testing) as other transactions.
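
As promised in the profiling item above, here is a minimal sketch of that approach in Python with pandas. The expense amounts are invented purely for illustration.

import pandas as pd

# A minimal sketch of profiling: summarize "typical" behavior with summary
# statistics, then flag anomalies more than three standard deviations above
# the mean. All amounts are invented.
expenses = pd.Series(
    [110, 95, 130, 105, 120, 140, 115, 98, 125, 135, 102, 2500],
    name="travel_expense",
)

# Profile the population
print(expenses.describe())  # count, mean, std, min, quartiles, max

# Flag departures from typical behavior (anomalies) for further investigation
threshold = expenses.mean() + 3 * expenses.std()
print(expenses[expenses > threshold])  # here: the 2,500 travel expense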

Step 4: Address and Refine Results (Chapter 3)


After the data have been analyzed (in step 3 of the IMPACT cycle), the fourth step is to
address and refine results. Data analysis is iterative. We slice, dice, and manipulate the data;
find correlations; test hypotheses; ask ourselves further, hopefully better questions; ask col-
leagues what they think; and revise and rerun the analysis potentially multiple times. But
once that is complete, we have the results ready to communicate to interested stakeholders
that hopefully directly addresses their questions.

Steps 5 and 6: Communicate Insights and Track


Outcomes (Chapter 4 and each chapter thereafter)
Once the results have been determined (in step 4 of the IMPACT cycle), insights are formed
by decision makers and are communicated (the “C” in the IMPACT cycle) and some out-
comes will be continuously tracked (the “T” in the IMPACT cycle).
Chapter 4 discusses ways to communicate results, including the use of executive summa-
ries, static reports, digital dashboards, and data visualizations. Data Analytics is especially
interested in reporting results that help decision makers see the data in an all-new way to
develop insights that help answer business questions, recognizing that different users con-
sume deliverables in a potentially different way. Increasingly, digital dashboards and data
visualizations are particularly helpful in communicating insights and tracking outcomes.

Back to Step 1
Since the IMPACT cycle is iterative, once insights are gained and outcomes are tracked,
new, more refined questions emerge that may use the same or different data sources with potentially different analyses, and thus the IMPACT cycle begins anew.

PROGRESS CHECK
7. Let’s say we are trying to predict how much money college students spend on
fast food each week. What would be the response, or dependent, variable? What
would be examples of independent variables?
8. How might a data reduction approach be used in auditing to allow the auditor to
spend more time and effort on the most important (e.g., most risky, largest dollar
volume, etc.) items?

DATA ANALYTIC SKILLS AND TOOLS NEEDED BY ANALYTIC-MINDED ACCOUNTANTS

LO 1-5 Describe the skills and tools needed by accountants.

While we don’t believe that accountants need to become data scientists—they may never need to build a database from scratch or perform the real, hardcore Data Analytics—they must know how to do the following:
• Clearly articulate the business problem the company is facing.
• Communicate with the data scientists about specific data needs and understand the
underlying quality of the data.


• Draw appropriate conclusions to the business problem based on the data and make rec-
ommendations on a timely basis.
• Present their results to individual members of management (CEOs, audit managers,
etc.) in an accessible manner to each member.
Consistent with that, in this text we emphasize skills that analytic-minded accountants
should have in the following seven areas:
1. Developed analytics mindset—know when and how Data Analytics can address business
questions.
2. Data scrubbing and data preparation—comprehend the process needed to clean and
prepare the data before analysis.
3. Data quality—recognize what is meant by data quality, be it completeness, reliability, or
validity.
4. Descriptive data analysis—perform basic analysis to understand the quality of the under-
lying data and its ability to address the business question.
5. Data analysis through data manipulation—demonstrate ability to sort, rearrange, merge,
and reconfigure data in a manner that allows enhanced analysis. This may include diag-
nostic, predictive, or prescriptive analytics to appropriately analyze the data.
6. Statistical data analysis competency—identify and implement an approach that will use
statistical data analysis to draw conclusions and make recommendations on a timely basis.
7. Data visualization and data reporting—report results of analysis in an accessible way to
each varied decision maker and his or her specific needs.
We address these seven skills throughout the first four chapters in the text in hopes that
the analytic-minded accountant will develop and practice these skills to be ready to address
business questions. We then demonstrate these skills in the labs and hands-on analysis
throughout the rest of the book.

Data Analytics at Work

What Does a Data Analyst Do at a Big Four Accounting Firm?


Data Sources: We extract financial data from a number of different ERP
­systems including SAP, Abacus, Sage, and Microsoft Navision (among others).
Data Scrubbing and Data Preparation: A huge part of our time goes into
data cleaning and data transformation.
Tools Used: Excel, Unix commands, SQL, and Python are used to automate
large chunks of our work.
Knowledge Needed: Basic Excel, programming skills (SQL, Python), and audit
knowledge such as understanding journal entries and trial balances are needed.

Source: “Data Analyst at a Big 4—What Is It Like? My Opinion Working as a Data Analyst at a
Big Four,” https://cryptobulls.info/data-analyst-at-a-big-4-what-is-it-like-pros-cons-ernst-young-deloitte-pwc-kpmg, posted February 29, 2020 (accessed January 2, 2021).

Choose the Right Data Analytics Tools


In addition to developing the right skills, it is also important to be familiar with the right
Data Analytics tools for each task. There are many tools available for Data Analytics


preparation, modeling, and visualization. Gartner annually assesses a collection of these


tools and creates the “magic quadrant” for business intelligence, depicted in Exhibit 1-4.
The magic quadrant can provide insight into which tools you should consider using.

EXHIBIT 1-4 Gartner Magic Quadrant for Business Intelligence and Analytics Platforms
Source: Sallam, R. L., C. Howson, C. J. Idoine, T. W. Oestreich, J. L. Richardson, and J. Tapadinhas, “Magic Quadrant for Business Intelligence and Analytics Platforms,” Gartner RAS Core Research Notes, Gartner, Stamford, CT (2020).

Based on Gartner’s magic quadrant, it is easy to see that Tableau and Microsoft provide
innovative solutions. While there are other tools that are popular in different industries,
such as Qlik and TIBCO, Tableau and Microsoft tools are the ones you will most likely
encounter because of their position as leaders in the Data Analytics space. For this reason,
each of the labs throughout this textbook will give you or your instructor the option to
choose either a Microsoft Track or a Tableau Track to help you become proficient in those
tools. The skills you learn as you work through the labs are transferrable to other tools as
well.

The Microsoft Track


Microsoft’s offerings for Data Analytics and business intelligence (BI) include Excel, Power
Query, Power BI, and Power Automate. It is likely that you already have some familiarity
with Excel as it is used for everything from recording transactions to running calculations
and preparing financial reports. These tools are summarized in Exhibit 1-5.
Excel is the most ubiquitous spreadsheet software and most commonly used for basic
data analysis. It allows for the creation of tables, advanced formulas to do quick or com-
plex calculations, and the ability to create PivotTables and basic charts and graphs. One
major issue with using Excel for analysis of large datasets is the 1,048,576 row limit due to
memory constraints. It is available on Windows and Mac as well as through the Microsoft


EXHIBIT 1-5 Microsoft Data Analytics Tools

Tool       Excel                 Power Query           Power BI                 Power Automate
Good for   Small datasets        Large datasets        Large datasets           Collect data from multiple sources
           Data tables           Data joining          Advanced visualization   Robotics process automation
           PivotTables           Data cleaning         Dashboards
           Basic visualization   Data transformation   Presentation
Platform   Windows/Mac/Online    Windows               Windows/Online           Online

365 online service for simple collaboration and sharing, although the most complete set of
features and compatibility with advanced plug-ins is available only on Windows.
Power Query is a tool built into Excel and Power BI Desktop on Windows that lets Excel
connect to a variety of different data sources, such as tables in Excel, databases hosted on
popular platforms like SQL Server, or through open database connectivity (ODBC) con-
nections. Power Query makes it possible to connect, manipulate, clean, and join data so you
can pull them into your Excel sheet or use them in Power BI to create summary reports and
advanced visualizations. Additionally, it tracks each step you perform so you can apply the
same transformations to new data without recreating the work from scratch.
Power BI is an analytic platform that enables generation of simple or advanced Data
Analytics models and visualizations that can be compiled into dashboards for easy sharing
with relevant stakeholders. It builds on data from Excel or other databases and can lever-
age models created with Power Query to quickly summarize key data findings. Microsoft
provides Power BI Desktop for free only on Windows or through a web-based app, though
the online version does not have all of the features of the desktop version and is primarily
used for sharing.
Power Automate is a tool that leverages robotics process automation (RPA) to automate
routine tasks and workflows, such as scraping and collecting data from nonstructured
sources, including emails and other online services. These can pull data from relevant
sources based on events, such as when an invoice is generated. Power Automate is a web-based subscription service with a tool that works only on Windows to automate keystrokes and mouse clicks.

The Tableau Track


In previous years, Tableau was ranked slightly higher than Microsoft on its ability to exe-
cute and it continues to be a popular choice for analytics professionals. Tableau’s primary
offerings include Tableau Prep for data preparation and Tableau Desktop for data visualiza-
tion and storytelling. Tableau has an advantage over Microsoft in that its tools are available
for both Windows and Mac computers. Additionally, Tableau offers online services through
Tableau Server and Tableau Online with the same, complete feature set as their apps. These
are summarized in Exhibit 1-6.

EXHIBIT 1-6 Tableau Data Analytics Tools

Tool       Tableau Prep Builder   Tableau Desktop          Tableau Public
Good for   Large datasets         Large datasets           Analyze and share public datasets
           Data summarization     Advanced visualization
           Data joining           Dashboards
           Data cleaning          Presentation
           Data transformation
Platform   Windows/Mac/Online     Windows/Mac/Online       Windows/Mac/Online

Tableau Prep is primarily used for data combination, cleaning, manipulation, and
insights. It enables users to interact with data and quickly identify data quality issues with


a clear map of steps performed so others can review the cleaning process. It is available on
Windows, Mac, and Tableau Online.
Tableau Desktop can be used to generate basic to advanced Data Analytics models and
visualizations with an easy-to-use drag-and-drop interface.
Tableau Public is a free limited edition of Tableau Desktop that is specifically tailored
to sharing and analyzing public datasets. It has some significant limitations for broader
analysis.

PROGRESS CHECK
9. Given the “magic quadrant” in Exhibit 1-4, why are the software tools repre-
sented by the Microsoft and Tableau tracks considered innovative?
10. Why is having the Tableau software tools fully available on both Windows and
Mac computers an advantage for Tableau over Microsoft?

HANDS-ON EXAMPLE OF THE IMPACT MODEL

LO 1-6 Explain how the IMPACT model may be used to address a specific business question.

Here we provide a complete, hands-on example of the IMPACT model to show how it could be implemented for a specific situation.
Let’s suppose I am trying to get a loan to pay off some credit card debt and my friend has told me about a new source of funds that doesn’t involve a bank. In recent years, facilitated by the Internet, peer-to-peer lenders allow individuals to both borrow and lend money to each other. While there are other peer-to-peer lenders, in this case, we will specifically consider the LendingClub.
My question is whether I will be able to get a loan given my prior loan history (poor),
credit score, and the like. According to our approaches mentioned, this would be an exam-
ple of a classification approach because we are attempting to predict whether a person
applying for a loan will be approved and funded or whether she will be denied a loan.
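
To illustrate what a classification approach can look like in code, here is a minimal sketch in Python with scikit-learn. The features, values, and model are invented for illustration; they are not LendingClub’s actual data or decision model.

import pandas as pd
from sklearn.linear_model import LogisticRegression

# A minimal sketch of classification: predict whether a loan application is
# funded (1) or rejected (0). All values are invented.
history = pd.DataFrame({
    "dti_ratio":    [0.05, 0.35, 0.12, 0.50, 0.08, 0.40],  # debt-to-income
    "credit_score": [720, 580, 690, 560, 740, 600],
    "funded":       [1, 0, 1, 0, 1, 0],                    # class labels
})

clf = LogisticRegression().fit(
    history[["dti_ratio", "credit_score"]], history["funded"]
)

# Classify a new applicant with a 0.45 debt-to-income ratio and a 590 score
applicant = pd.DataFrame({"dti_ratio": [0.45], "credit_score": [590]})
print(clf.predict(applicant))        # predicted class: funded or rejected
print(clf.predict_proba(applicant))  # estimated class probabilities

Chapter 3 and its appendix (Setting Up a Classification Analysis) walk through classification in the tools used throughout this text.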

Identify the Questions


Stated specifically, our question is, “What are some characteristics of rejected loans?”

Master the Data


LendingClub is a U.S.-based, peer-to-peer lending company, headquartered in San Fran-
cisco, California. LendingClub facilitates both borrowing and lending by providing a
platform for unsecured personal loans between $1,000 and $35,000. The loan period
is for either 3 or 5 years. There is information available that allows potential investors
to search and browse the loan listings on the LendingClub website and select loans in
which they would like to invest. The available information includes information supplied
about the borrower, amount of the loan, loan grade (and related loan interest rate), and
loan purpose. Investors invest in the loans and make money from interest. LendingClub
makes money by charging borrowers an origination fee and investors a service fee. Since
2007, hundreds of thousands of borrowers have obtained more than $60 billion in loans
via LendingClub.17
Some basic lending statistics are included on the LendingClub Statistics website
(Exhibit 1-7). Each bar represents the volume of loans each quarter during its respective year.

17 https://www.lendingclub.com/ (accessed September 29, 2016).


EXHIBIT 1-7 LendingClub Statistics
Bar chart titled “Total loans issuance”: total loans issued ($) by quarter, 2013–2020, with $60,188,236,052 in loans issued as of 09/30/20.
Source: Accessed December 2020, https://www.lendingclub.com/info/statistics.action

Borrowers borrow money for a variety of reasons, including refinancing other debt and
paying off credit cards, as well as borrowing for other purposes (Exhibit 1-8).

EXHIBIT 1-8 LendingClub Statistics by Reported Loan Purpose
Pie chart of reported loan purposes: Refinancing (42.33%), Credit Card Payoff (13.04%), Other (44.63%). 42.33% of LendingClub borrowers report using their loans to refinance existing loans as of September 30, 2020.
Source: Accessed December 2020, https://www.lendingclub.com/info/statistics.action
LendingClub provides datasets on the loans it approved and funded as well as data for
the loans that were declined. To address the question posed, “What are some characteristics
of rejected loans?,” we’ll use the dataset of rejected loans.
The rejected loan datasets and related data dictionary are available from your instructor
or from Connect (in Additional Student Resources).


As we learn about the data, it is important to know what is available to us. To that end,
there is a data dictionary that provides descriptions for all of the data attributes of the data-
set. A cut-out of the data dictionary for the rejected stats file (i.e., the statistics about those
loans rejected) is shown in Exhibit 1-9.

EXHIBIT 1-9 2007–2012 LendingClub Data Dictionary for Declined Loan Data
Source: LendingClub

We could also take a look at the data files available for the funded loan data. However,
for our analysis in the rest of this chapter, we use the Excel file “DAA Chapter 1-1 Data”
that has rejected loan statistics from LendingClub for the time period of 2007 to 2012. It is
a cleaned-up, transformed file ready for analysis. We’ll learn more about data scrubbing and
preparation of the data in Chapter 2.
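Although the chapter performs every step in Excel, the same steps can also be scripted. Here, and in the short sketches that follow, we use Python with the pandas library. The file name and column names (such as "Amount Requested" and "Risk_Score," taken from the attribute list in the end-of-chapter problems) are assumptions, so adjust them to match the copy of the file you download. A minimal loading-and-inspection sketch:

```python
import pandas as pd

# Load the cleaned 2007-2012 declined-loan file distributed with the text.
# The file name and sheet layout are assumptions; adjust to your download.
rejected = pd.read_excel("DAA Chapter 1-1 Data.xlsx")

print(rejected.shape)    # rows (rejected applications) and columns (attributes)
print(rejected.head())   # first few records
print(rejected.dtypes)   # data type of each attribute in the data dictionary
```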
Exhibit 1-10 provides a cut-out of the 2007–2012 “Declined Loan” dataset provided.

EXHIBIT 1-10 2007–2012 Declined Loan Applications (DAA Chapter 1-1 Data) Dataset
Microsoft Excel, 2016


Perform Test Plan


Considering our question, "What are the characteristics of rejected loans (at LendingClub)?,"
and the available data, we will perform three analyses to assess what is considered in
rejecting or accepting a loan:
1. The debt-to-income ratios and number of rejected loans.
2. The length of employment and number of rejected loans.
3. The credit (or risk) score and number of rejected loans.
Because LendingClub collects these three loan characteristics, we believe they provide
LendingClub with the data needed to assess whether a potential borrower will be able to
pay back a loan, and they give us an idea of whether our own application would be approved or rejected.
The first analysis we perform considers the debt-to-income ratio of the potential borrower.
That is, before adding this potential loan, how big is the potential borrower’s debt compared
to the size of the potential borrower’s annual income?
To incorporate the debt-to-income ratio into our analysis, each application is assigned to
one of three buckets (labeled DTI bucket) based on its debt-to-income ratio. These three
buckets include the following:
1. High (debt is greater than 20 percent of income).
2. Medium (“Mid”) (debt is between 10 and 20 percent of income).
3. Low (debt is less than 10 percent of income).
Once those buckets are constructed, we are ready to analyze the breakdown of rejected
loan applications by the debt-to-income ratio.
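A minimal pandas sketch of this bucketing step, continuing from the loading sketch above (the "Debt-To-Income Ratio" column name comes from the declined-loan attribute list in the end-of-chapter problems, and the treatment of values exactly at 10 or 20 percent is a judgment call):

```python
import pandas as pd

# Assign each application to a DTI bucket: Low (< 10%), Mid (10-20%),
# High (> 20%). right=False places a boundary value in the higher bucket.
rejected["DTI bucket"] = pd.cut(
    rejected["Debt-To-Income Ratio"],
    bins=[float("-inf"), 10, 20, float("inf")],
    labels=["Low", "Mid", "High"],
    right=False,
)
```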
The Excel PivotTable is an easy way to make comparisons between the different levels of
DTI. When we run a PivotTable analysis, we count the number of rejected loan applications
in each of the three DTI buckets: high, medium (mid), and low (see Exhibit 1-11). Because
the high DTI bucket has the highest number of loan applications, perhaps many applicants
asked for loans that were too big given their incomes. LendingClub might have seen that as
too big of a risk and chosen not to extend those loans, using the debt-to-income ratio as an indicator.
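The same count can be produced in one line of pandas, an equivalent (under the assumptions above) of the Exhibit 1-11 PivotTable:

```python
# Count rejected applications in each DTI bucket and show each bucket's share.
dti_counts = rejected["DTI bucket"].value_counts().sort_index()
print(dti_counts)
print(dti_counts / dti_counts.sum())
```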
The second analysis considers the length of employment and its relationship with rejected
loans (see Exhibit 1-12). Arguably, the longer the employment, the more stable the job and
income stream available to ultimately repay the loan. LendingClub reports the number
of years of employment for each of the rejected applications. The PivotTable analysis lists
the number of loans by the length of employment. Almost 77 percent (495,109 out of
645,414) of the total rejected loans came from applicants who had worked at a job for less
than 1 year, suggesting potentially an important reason for rejecting the requested loan.
Perhaps some had worked only a week, or just a month, and still wanted a big loan.
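A sketch of the employment-length breakdown (the "< 1 year" label mirrors LendingClub's reporting convention and is an assumption about how the field is coded in the file):

```python
# Count rejected applications by employment length (Exhibit 1-12 equivalent).
emp_counts = rejected["Employment Length"].value_counts()
print(emp_counts)

# Share of rejected applicants employed less than 1 year; the chapter
# reports 495,109 / 645,414, or almost 77 percent.
print(emp_counts.get("< 1 year", 0) / len(rejected))
```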
The third analysis we perform considers the credit or risk score of the applicant. As noted
in Exhibit 1-13, risk scores are typically classified so that those in the excellent and very
good categories (credit scores above 750) receive the lowest possible interest rates and best
terms. On the other end of the spectrum are those with very bad credit (credit scores below 600).
We will classify the sample into excellent, very good, good, fair, poor, and very bad
credit according to the credit score ranges noted in Exhibit 1-13.
As part of the analysis of credit score and rejected loans, we again perform a PivotTable
analysis (as seen in Exhibit 1-14) by counting the number of rejected loan applications by
credit (risk) score classification. We note that nearly 82 percent [(167,379 + 151,716 +
207,234)/645,414] of the rejected applicants have either very bad, poor, or fair credit ratings,
suggesting this might be a good reason for a loan rejection. We also note that only about
0.3 percent of the rejected applicants fall in the excellent credit score range.
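A sketch of the credit score classification and the share calculations, again assuming the "Risk_Score" column name from the attribute list:

```python
import pandas as pd

# Classify risk scores into the Exhibit 1-13 ranges. right=False makes each
# bin include its lower bound, so, e.g., Poor covers 600-649.
rejected["Score bucket"] = pd.cut(
    rejected["Risk_Score"],
    bins=[300, 600, 650, 700, 750, 800, 851],
    labels=["Very Bad", "Poor", "Fair", "Good", "Very Good", "Excellent"],
    right=False,
)

# Exhibit 1-14 equivalent: count of rejected applications per score bucket.
score_counts = rejected["Score bucket"].value_counts().sort_index()
print(score_counts)

# Combined share of Very Bad, Poor, and Fair ratings (about 82 percent).
print(score_counts[["Very Bad", "Poor", "Fair"]].sum() / len(rejected))
```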


EXHIBIT 1-11 LendingClub Declined Loan Applications by DTI (Debt-to-Income)
DTI bucket includes high (debt > 20 percent of income), medium ("mid") (debt between 10 and 20 percent of income), and low (debt < 10 percent of income).
Microsoft Excel, 2016

EXHIBIT 1-12 LendingClub Declined Loan Applications by Employment Length (Years of Experience)
Microsoft Excel, 2016


EXHIBIT 1-13 Breakdown of Customer Credit Scores (or Risk Scores)
Source: Cafecredit.com

Excellent: 800–850
Very Good: 750–799
(Those with excellent and very good credit scores are likely to qualify for almost all loans and receive the lowest interest rates.)

Good: 700–749
Fair: 650–699
(Those with good and fair credit scores are likely to qualify for most loans and receive good interest rates.)

Poor: 600–649
Very Bad: 300–599
(Those with poor and very bad credit scores are likely to qualify for loans only if they have sufficient collateral.)

EXHIBIT 1-14 The Count of LendingClub Rejected Loan Applications by Credit or Risk Score Classification Using PivotTable Analysis
(PivotTable shown here required manually sorting rows to get them in proper order.)
Microsoft Excel, 2016


Address and Refine Results


Now that we have completed the basic analysis, we can refine it for greater insights. An
example of this more refined analysis might be a further investigation of the rejected loans.
Given that all of these applications were rejected, how many of the applicants not only had
excellent credit, but also had worked more than 10 years and had asked for a loan that was
less than 10 percent of their income (in the low DTI bucket)? Use of a PivotTable (as shown
in Exhibit 1-15) allows us to consider this three-way interaction and provides an answer of
365 out of 645,414 (0.057 percent of the total). This might suggest that these three metrics
are reasonable predictors of loan rejection, because the number of applicants who had
excellent credit, had worked more than 10 years, and had requested a loan that was less than
10 percent of their income was such a small percentage of the total.
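A sketch of the three-way breakdown (the employment-length label "10+ years" is an assumption about how the field is coded):

```python
# Exhibit 1-15 equivalent: counts by score bucket and DTI bucket (rows)
# and employment length (columns). Any always-populated column works as
# the value being counted; "Amount Requested" is used here.
three_way = rejected.pivot_table(
    index=["Score bucket", "DTI bucket"],
    columns="Employment Length",
    values="Amount Requested",
    aggfunc="count",
)

# Excellent credit, low DTI, 10+ years employed: the chapter finds only
# 365 of 645,414 rejected applications in this cell.
print(three_way.loc[("Excellent", "Low"), "10+ years"])
```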
Perhaps those with excellent credit just asked for too big of a loan given their existing
debt, and that is why they were rejected. Exhibit 1-16 shows the PivotTable analysis.

EXHIBIT 1-15 The Count of LendingClub Declined Loan Applications by Credit (or Risk) Score, Debt-to-Income (DTI Bucket), and Employment Length Using PivotTable Analysis (Highlighting Added)
Microsoft Excel, 2016


EXHIBIT 1-16 The Average Debt-to-Income Ratio (Shown as a Percentage) by Credit (Risk) Score for LendingClub Declined Loan Applications Using PivotTable Analysis
Microsoft Excel, 2016

The analysis shows that those with excellent credit asked for larger loans relative to their
income and existing debt (an average debt-to-income ratio of 16.2 percent) than any of the
other groups, suggesting a reason even those potential borrowers with excellent credit were rejected.
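An Exhibit 1-16 equivalent takes a single groupby in the same sketch:

```python
# Average debt-to-income ratio (as a percentage) for each score bucket;
# the chapter reports about 16.2 percent for the Excellent group.
print(rejected.groupby("Score bucket")["Debt-To-Income Ratio"].mean())
```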

Communicate Insights
Certainly further and more sophisticated analysis could be performed, but at this point we
have a pretty good idea of what LendingClub uses to decide whether to extend or reject a
loan to a potential borrower. We can communicate these insights either by showing the
PivotTables or simply by stating the three determinants. Which is the most effective
communication: showing the PivotTables themselves, showing a graph of the results,
or simply sharing the names of these three determinants with the decision makers? Knowing
the decision makers and how they like to receive this information will help the analyst
determine how to communicate insights.

Track Outcomes
There are a wide variety of outcomes that could be tracked, but in this case it might be best
to see whether we can predict future outcomes. For example, the data we analyzed were from
2007 to 2012. We could make predictions for subsequent years based on what we found
in the past and then test how accurate those predictions are. We could also change our
prediction model as we learn new insights and as additional data become available.
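One way to make this tracking concrete is a temporal holdout: fit a simple classifier on the 2007–2012 period and score it on later years. The sketch below is entirely hypothetical; it assumes a combined table of funded and declined applications with a Rejected label and numeric feature columns, which the chapter's rejected-only file does not provide on its own.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical combined file of funded (0) and rejected (1) applications;
# the file name, label, and feature columns are all assumptions.
apps = pd.read_csv("combined_applications.csv")
features = ["Debt-To-Income Ratio", "Risk_Score", "Employment Years"]

train = apps[apps["Year"] <= 2012]   # the period analyzed in this chapter
test = apps[apps["Year"] > 2012]     # subsequent years to predict

model = LogisticRegression(max_iter=1000)
model.fit(train[features], train["Rejected"])

# Track outcomes: how accurate are our predictions on later years?
print(accuracy_score(test["Rejected"], model.predict(test[features])))
```

As new data arrive, the model can be refit and its accuracy tracked over time, matching the recursive nature of the IMPACT cycle.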


PROGRESS CHECK
11. Lenders often use the data item of whether a potential borrower rents or owns
their house. Beyond the three characteristics of rejected loans analyzed in this
section, do you believe this data item would be an important determinant of
rejected loans? Defend your answer.
12. Performing your own analysis, download the rejected loans dataset titled “DAA
Chapter 1-1 Data” and perform an Excel PivotTable analysis by state (including
the District of Columbia) and figure out the number of rejected applications for
the state of California. That is, count the loans by state and see what percentage
of the rejected loans came from California. How close is that to the relative pro-
portion of the population of California as compared to that of the United States?
13. Performing your own analysis, download the rejected loans dataset titled “DAA
Chapter 1-1 Data” and run an Excel PivotTable by risk (or credit) score classifica-
tion and DTI bucket to determine the number of (or percentage of) rejected loans
requested by those rated as having an excellent credit score.

Summary
In this chapter, we discussed how businesses and accountants derive value from Data Ana-
lytics. We gave some specific examples of how Data Analytics is used in business, auditing,
managerial accounting, financial accounting, and tax accounting.
We introduced the IMPACT model and explained how it is used to address accounting
questions. And then we talked specifically about the importance of identifying the question.
We walked through the first few steps of the IMPACT model and introduced eight data
approaches that might be used to address different accounting questions. We also discussed
the data analytic skills needed by analytic-minded accountants.
We followed this up with a hands-on example of the IMPACT model, namely, examining
the characteristics of rejected loans at LendingClub. We performed this analysis using
various filtering and PivotTable tasks.
■ With data all around us, businesses and accountants are looking at Data Analytics to
extract the value that the data might possess. (LO 1-1, 1-2, 1-3)
■ Data Analytics is changing the audit and the way that accountants look for risk. Now,
auditors can consider 100 percent of the transactions in their audit testing. It is also help-
ful in finding anomalous or unusual transactions. Data Analytics is also changing the way
financial accounting, managerial accounting, and taxes are done at a company. (LO 1-3)
■ The IMPACT cycle is a means of performing Data Analytics that goes all the way from iden-
tifying the question, to mastering the data, to performing data analyses and communicating
and tracking results. It is recursive in nature, suggesting that as questions are addressed,
new, more refined questions may emerge that can be addressed in a similar way. (LO 1-4)
■ Eight data approaches address different ways of testing the data: classification, regres-
sion, similarity matching, clustering, co-occurrence grouping, profiling, link prediction,
and data reduction. These are explained in more detail in Chapter 3. (LO 1-4)
■ Data analytic skills needed by analytic-minded accountants are specified and are consis-
tent with the IMPACT cycle, including the following: (LO 1-5)
◦ Developed analytics mindset.
◦ Data scrubbing and data preparation.

◦ Data quality.
◦ Descriptive data analysis.
◦ Data analysis through data manipulation.
◦ Statistical data analysis competency.
◦ Data visualization and data reporting.
■ We showed an example of the IMPACT cycle using LendingClub data regarding rejected
loans to illustrate the steps of the IMPACT cycle. (LO 1-6)

Key Words
Big Data (4) Datasets that are too large and complex for businesses’ existing systems to handle utilizing
their traditional capabilities to capture, store, manage, and analyze these datasets.
classification (11) A data approach that attempts to assign each unit in a population into a few catego-
ries potentially to help with predictions.
clustering (11) A data approach that attempts to divide individuals (like customers) into groups (or
clusters) in a useful or meaningful way.
co-occurrence grouping (11) A data approach that attempts to discover associations between indi-
viduals based on transactions involving them.
Data Analytics (4) The process of evaluating data with the purpose of drawing conclusions to address
business questions. Indeed, effective Data Analytics provides a way to search through large structured and
unstructured data to identify unknown patterns or relationships.
data dictionary (19) Centralized repository of descriptions for all of the data attributes of the dataset.
data reduction (12) A data approach that attempts to reduce the amount of information that needs to
be considered to focus on the most critical items (i.e., highest cost, highest risk, largest impact, etc.).
link prediction (12) A data approach that attempts to predict a relationship between two data items.
predictor (or independent or explanatory) variable (11) A variable that predicts or explains
the value of another (response or dependent) variable.
profiling (11) A data approach that attempts to characterize the “typical” behavior of an individual,
group, or population by generating summary statistics about the data (including mean, standard
deviations, etc.).
regression (11) A data approach that attempts to estimate or predict, for each unit, the numerical
value of some variable using some type of statistical model.
response (or dependent) variable (10) A variable that responds to, or is dependent on, another.
similarity matching (11) A data approach that attempts to identify similar individuals based on data
known about them.
structured data (4) Data that are organized and reside in a fixed field with a record or a file. Such
data are generally contained in a relational database or spreadsheet and are readily searchable by search
algorithms.
unstructured data (4) Data that do not adhere to a predefined data model in a tabular format.

ANSWERS TO PROGRESS CHECKS


1. The plethora of data alone does not necessarily translate into value. However, if we care-
fully analyze the data to help address critical business problems and questions, the data
have the potential to create value.

2. Banks frequently use credit scores from outside sources like Experian, TransUnion, and
Equifax to evaluate the creditworthiness of their customers. However, if banks have access
to all of their customers’ banking information, Data Analytics would allow them to evaluate
their customers’ creditworthiness directly. Banks would know how much money customers
have and how they spend it. Banks would know whether customers had prior loans and
whether those loans were paid in a timely manner. Banks would know where customers
work and the size and stability of monthly income via direct deposits. All of these combined,
in addition to a credit score, might be used to assess creditworthiness if customers desire
a loan. It might also give banks needed information for a marketing campaign to target
potentially creditworthy customers.
3. The brand manager at Procter and Gamble might use Data Analytics to see what is
being said about Procter and Gamble’s Tide Pods product on social media websites
(e.g., Snapchat, Twitter, Instagram, and Facebook), particularly those that attract an older
demographic. This will help the manager assess if there is a problem with the perceptions
of its laundry detergent products.
4. Data Analytics might be used to collect information on the amount of overtime. Who
worked overtime? What were they working on? Do we actually need more full-time
employees to reduce the level of overtime (and its related costs to the company and to
the employees)? Would it be cost-effective to just hire full-time employees instead of paying
overtime? How much would costs increase just to pay for fringe benefits (health care,
retirement, etc.) for new employees versus just paying existing employees for their overtime?
All of these questions could be addressed by analyzing recent records explaining
the use of overtime.
5. Management accounting and Data Analytics both (1) address questions asked by man-
agement, (2) find data to address those questions, (3) analyze the data, and (4) report the
results to management. In all material respects, management accounting and Data Analyt-
ics are similar, if not identical.
6. The tax staff would become much more adept at efficiently organizing data from multiple
systems across an organization and performing Data Analytics to help with tax planning to
structure transactions in a way that might minimize taxes.
7. The dependent variable could be the amount of money spent on fast food. Indepen-
dent variables could be proximity of the fast food, ability to cook own food, discretionary
income, socioeconomic status, and so on.
8. The data reduction approach might help auditors spend more time and effort on the most
risky transactions or on those that might be anomalous in nature. This will help them more
efficiently spend their time on items that may well be of highest importance.
9. According to the “magic quadrant,” the software tools represented by the Microsoft and
Tableau Tracks are considered innovative because they lead the market in the “ability to
execute” and “completeness of vision” dimensions.
10. Having Tableau software tools available on both Mac and Windows computers gives
the analyst needed flexibility that is not available in the Microsoft Track, whose tools are
fully available only on Windows computers.
11. The use of the data item whether a potential borrower owns or rents their house would
be expected to complement the risk score, debt levels (DTI bucket), and length of employ-
ment, since it can give a potential lender additional data on the financial position and
financial obligations (mortgage or rent payments) of the borrower.
12. An analysis of the rejected loans suggests that 85,793 of the total 645,414 rejected loans
were from the state of California. That represents 13.29 percent of the total rejected
loans. This is greater than California’s share of the U.S. population as of the 2010 census,
which was 12.1 percent (37,253,956/308,745,538).
13. A PivotTable analysis of the rejected loans suggests that about 30.6 percent
(762/2,494) of those in the excellent risk credit score range asked for a loan with a debt-
to-income ratio of more than 20 percent.


Multiple Choice Questions



1. (LO 1-1) Big Data is often described by the four Vs, or


a. volume, velocity, veracity, and variability.
b. volume, velocity, veracity, and variety.
c. volume, volatility, veracity, and variability.
d. variability, velocity, veracity, and variety.
2. (LO 1-4) Which data approach attempts to assign each unit in a population into a small
set of classes (or groups) where the unit best fits?
a. Regression
b. Similarity matching
c. Co-occurrence grouping
d. Classification

3. (LO 1-4) Which data approach attempts to identify similar individuals based on data
known about them?
a. Classification
b. Regression
c. Similarity matching
d. Data reduction
4. (LO 1-4) Which data approach attempts to predict connections between two data items?
a. Profiling
b. Classification
c. Link prediction
d. Regression
5. (LO 1-6) Which of these terms is defined as being a central repository of descriptions
for all of the data attributes of the dataset?
a. Big Data
b. Data warehouse
c. Data dictionary
d. Data Analytics
6. (LO 1-5) Which skills were not emphasized that analytic-minded accountants should
have?
a. Developed an analytics mindset
b. Data scrubbing and data preparation
c. Classification of test approaches
d. Statistical data analysis competency
7. (LO 1-5) In which areas were skills not emphasized for analytic-minded accountants?
a. Data quality
b. Descriptive data analysis
c. Data visualization and data reporting
d. Data and systems analysis and design
8. (LO 1-4) The IMPACT cycle includes all except the following steps:
a. perform test plan.
b. visualize the data.
c. master the data.
d. track outcomes.
9. (LO 1-4) The IMPACT cycle specifically includes all except the following steps:
a. data preparation.
b. communicate insights.
c. address and refine results.
d. perform test plan.
10. (LO 1-1) By the year 2024, the volume of data created, captured, copied, and c­ onsumed
worldwide will be 149 _____.
a. zettabytes
b. petabytes
c. exabytes
d. yottabytes

Discussion and Analysis

1. (LO 1-1) The opening article “Accountants to Rely More on Big Data in 2020” suggested
that accountants would increasingly rely on Big Data in their business processes. Why is
that? How can Data Analytics help accountants do their jobs?
2. (LO 1-1) Define Data Analytics and explain how a university might use its techniques to
recruit and attract potential students.
3. (LO 1-2) Give a specific example of how Data Analytics creates value for businesses.
4. (LO 1-3) Give a specific example of how Data Analytics creates value for auditing.
5. (LO 1-3) How might Data Analytics be used in financial reporting? And how might it be
used in doing tax planning?
6. (LO 1-3) How is the role of management accounting similar to the role of the data
analyst?
7. (LO 1-4) Describe the IMPACT cycle. Why does its order of the processes and its recur-
sive nature make sense?
8. (LO 1-4) Why is identifying the question such a critical first step in the IMPACT process
cycle?
9. (LO 1-4) What is included in mastering the data as part of the IMPACT cycle described
in the chapter?
10. (LO 1-4) What data approach mentioned in the chapter might be used by Facebook to
find friends?
11. (LO 1-4) Auditors will frequently use the data reduction approach when considering
potentially risky transactions. Provide an example of why focusing on a portion of the
total number of transactions might be important for auditors to assess risk.
12. (LO 1-4) Which data approach might be used to assess the appropriate level of the
allowance for doubtful accounts?
13. (LO 1-6) Why might the debt-to-income attribute included in the declined loans dataset
considered in the chapter be a predictor of declined loans? How about the credit (risk)
score?
14. (LO 1-6) To address the question “Will I receive a loan from LendingClub?” we had
available data to assess the relationship among (1) the debt-to-income ratios and num-
ber of rejected loans, (2) the length of employment and number of rejected loans, and
(3) the credit (or risk) score and number of rejected loans. What additional data would
you recommend to further assess whether a loan would be offered? Why would they be
helpful?

Problems

1. (LO 1-4) Match each specific Data Analytics test to a specific test approach, as part of
performing a test plan:
• Classification
• Regression
• Similarity Matching
• Clustering
• Co-occurrence Grouping
• Profiling
• Link Prediction
• Data Reduction

Specific Data Analytics Test                                                  Test Approach
1. Predict which firms will go bankrupt and which firms will not go
bankrupt.
2. Use stratified sampling to focus audit effort on transactions with
greatest risk.
3. Work to understand normal behavior, to then be able to identify
abnormal behavior (such as fraud).
4. Look for relationships between related parties that are not
otherwise disclosed.
5. Predict which new customers resemble the company’s best
customers.
6. Predict the relationship between an investment in advertising
expenditures and subsequent operating income.
7. Segment all of the company’s customers into groups that will allow
further specific analysis.
8. The customers who buy product X will be most likely to be also
interested in product Y.

2. (LO 1-4) Match each of the specific Data Analytics tasks to the stage of the IMPACT
cycle:
• Identify the Questions
• Master the Data
• Perform Test Plan
• Address and Refine Results
• Communicate Insights
• Track Outcomes

Specific Data Analytics Test                                                  Stage of IMPACT Cycle
1. Should we use company-specific data or macro-economic data to ad-
dress the accounting question?
2. What are appropriate cost drivers for activity-based costing purposes?
3. Should we consider using regression analysis or clustering analysis to
evaluate the data?
4. Should we use tables or graphs to show management what we’ve
found?
5. Now that we’ve evaluated the data one way, should we perform an-
other analysis to gain additional insights?
6. What type of dashboard should we use to get the latest, up-to-date
results?

3. (LO 1-5) Match the specific analysis need/characteristic to the appropriate Microsoft
Track software tool:
• Excel
• Power Query
• Power BI
• Power Automate

Specific Analysis Need/Characteristic Microsoft Track Tool
1. Basic visualization
2. Robotics process automation
3. Data joining
4. Advanced visualization
5. Works on Windows/Mac/Online platforms
6. Dashboards
7. Collect data from multiple sources
8. Data cleaning

4. (LO 1-5) Match the specific analysis need/characteristic to the appropriate Tableau
Track software tool:
• Tableau Prep Builder
• Tableau Desktop
• Tableau Public

Specific Analysis Need/Characteristic Tableau Track Tool


1. Advanced visualization
2. Analyze and share public datasets
3. Data joining
4. Presentations
5. Data transformation
6. Dashboards
7. Data cleaning

5. (LO 1-6) Navigate to the Connect Additional Student Resources page. Under Chapter 1
Data Files, download and consider the LendingClub data dictionary file “LCDataDictionary,”
specifically the LoanStats tab. This represents the data dictionary for the loans that
were funded. Choose among these attributes in the data dictionary and indicate which
are likely to be predictive that loans will go delinquent, or that loans will ultimately be
fully repaid, and which are not predictive.
Predictive Attributes Predictive? (Yes/No)
1. date (Date when the borrower accepted the offer)
2. desc (Loan description provided by borrower)
3. dti (A ratio of debt owed to income earned)
4. grade (LC assigned loan grade)
5. home_ownership (Values include Rent, Own, Mortgage, Other)
6. loanAmnt (Amount of the loan)
7. next_pymnt_d (Next scheduled payment date)
8. term (Number of payments on the loan)
9. tot_cur_bal (Total current balance of all accounts)

6. (LO 1-6) Navigate to the Connect Additional Student Resources page. Under
Chapter 1 Data Files, download and consider the rejected loans dataset of LendingClub
data titled “DAA Chapter 1-1 Data.” Choose among these attributes in the data
dictionary, and indicate which are likely to be predictive of loan rejection, and which
are not.

Predictive Attributes Predictive? (Yes/No)
1. Amount Requested
2. Zip Code
3. Loan Title
4. Debt-To-Income Ratio
5. Application Date
6. Risk_Score
7. Employment Length

7. (LO 1-6) Navigate to the Connect Additional Student Resources page. Under Chapter 1
Data Files, download and consider the rejected loans dataset of LendingClub data titled
“DAA Chapter 1-1 Data” from the Connect website and perform an Excel PivotTable by
state; then figure out the number of rejected applications for the state of Arkansas. That
is, count the loans by state and compute the percentage of the total rejected loans
in the United States that came from Arkansas. How close is that to the relative proportion
of the population of Arkansas as compared to the overall U.S. population (per
2010 census)? Use your browser to find the population of Arkansas and the United
States and calculate the relative percentage and answer the following questions.
7A. Multiple Choice: What is the percentage of total loans rejected in the United
States that came from Arkansas?
a. Less than 1%.
b. Between 1% and 2%.
c. More than 2%.
7B. Multiple Choice: Is this loan rejection percentage greater than the percentage of
the U.S. population that lives in Arkansas (per 2010 census)?
a. Loan rejection percentage is greater than the population.
b. Loan rejection percentage is less than the population.
8. (LO 1-6) Download the rejected loans dataset of LendingClub data titled “DAA
Chapter 1-1 Data” from Connect Additional Student Resources and do an Excel
PivotTable by state; then figure out the number of rejected applications for each state.
8A. Put the following states in order of their loan rejection percentage based on the
count of rejected loans (from high [1] to low [11]) of the total rejected loans. Does
each state’s loan rejection percentage roughly correspond to its relative propor-
tion of the U.S. population?

State Rank 1 (High) to 11 (Low)


1. Arkansas (AR)
2. Hawaii (HI)
3. Kansas (KS)
4. New Hampshire (NH)
5. New Mexico (NM)
6. Nevada (NV)
7. Oklahoma (OK)
8. Oregon (OR)
9. Rhode Island (RI)
10. Utah (UT)
11. West Virginia (WV)

8B. What is the state with the highest percentage of rejected loans?
8C. What is the state with the lowest percentage of rejected loans?
8D. Analysis: Does each state’s loan rejection percentage roughly correspond
to its relative proportion of the U.S. population (by 2010 U.S. census at
https://en.wikipedia.org/wiki/2010_United_States_census)?
For Problems 9, 10, and 11, we will be cleaning a data file in preparation for subse-
quent analysis.
The analysis performed on LendingClub data in the chapter (“DAA Chapter 1-1 Data”)
was for the years 2007–2012. For this and subsequent problems, please download the
rejected loans table for 2013 from Connect Additional Student Resources titled “DAA
Chapter 1-2 Data.”
9. (LO 1-6) Consider the 2013 rejected loan data from LendingClub titled “DAA
Chapter 1-2 Data” from Connect Additional Student Resources. Browse the file in Excel
to ensure there are no missing data. Because our analysis requires risk scores, debt-to-
income data, and employment length, we need to make sure each of them has valid
data. There should be 669,993 observations.
a. Assign each risk score to a risk score bucket similar to the chapter. That is, classify
the sample according to this breakdown into excellent, very good, good, fair, poor,
and very bad credit according to their credit score noted in Exhibit 1-13. Classify
those with a score greater than 850 as “Excellent.” Consider using nested if–then
statements to complete this. Or sort by risk score and manually input into appropri-
ate risk score buckets.
b. Run a PivotTable analysis that shows the number of loans in each risk score bucket.

Which risk score bucket had the most rejected loans (most observations)? Which
risk score bucket had the least rejected loans (least observations)? Is it similar to
Exhibit 1-14 performed on years 2007–2012?
10. (LO 1-6) Consider the 2013 rejected loan data from LendingClub titled “DAA
Chapter 1-2 Data.” Browse the file in Excel to ensure there are no missing data. Because
our analysis requires risk scores, debt-to-income data, and employment length, we need
to make sure each of them has valid data. There should be 669,993 observations.
a. Assign each valid debt-to-income ratio into three buckets (labeled DTI bucket) by classi-
fying each debt-to-income ratio into high (>20.0 percent), medium (10.0–20.0 percent),
and low (<10.0 percent) buckets. Consider using nested if–then statements to com-
plete this. Or sort the row and manually input.
b. Run a PivotTable analysis that shows the number of loans in each DTI bucket.

Which DTI bucket had the highest and lowest grouping for this rejected Loans dataset?
Any interpretation of why these loans were rejected based on debt-to-income ratios?
11. (LO 1-6) Consider the 2013 rejected loan data from LendingClub titled “DAA
Chapter 1-2 Data.” Browse the file in Excel to ensure there are no missing data.
Because our analysis requires risk scores, debt-to-income data, and employment
length, we need to make sure each of them has valid data. There should be 669,993
observations.
a. Assign each risk score to a risk score bucket similar to the chapter. That is, classify
the sample according to this breakdown into excellent, very good, good, fair, poor,
and very bad credit according to their credit score noted in Chapter 1. Classify those
with a score greater than 850 as “Excellent.” Consider using nested if-then state-
ments to complete this. Or sort by risk score and manually input into appropriate risk
score buckets (similar to Problem 9).
b. Assign each debt-to-income ratio into three buckets (labeled DTI bucket) by classify-
ing each debt-to-income ratio into high (>20.0 percent), medium (10.0–20.0 percent),

and low (<10.0 percent) buckets. Consider using nested if-then statements to com-
plete this. Or sort the row and manually classify into the appropriate bucket.
c. Run a PivotTable analysis to show the number of excellent risk scores but high DTI
bucket loans in each employment year bucket.
Which employment length group had the most observations to go along with excel-
lent risk scores but high debt-to-income? Which employment year group had the least
observations to go along with excellent risk scores but high debt-to-income? Analysis:
Any interpretation of why these loans were rejected?

LABS

Lab 1-0 How to Complete Labs


The labs in this book will provide valuable hands-on experience in generating and analyz-
ing accounting problems. Each lab will provide a company summary with relevant facts,
techniques that you will use to complete your analysis, software that you’ll need, and an
overview of the lab steps.
When you’ve completed your lab, your instructor may ask you to submit a screenshot
lab document showing screenshots of work you have completed at various points in the lab.
This lab will demonstrate how to create a lab document for submission.

Lab 1-0 Part 1 Explore Different Tool Tracks


When completing labs throughout this textbook, you may be given the option to complete
one or more tracks. Depending on the software your instructor chooses to emphasize, you
may see instructions for one or more of the following tracks:
Microsoft Track: Lab instructions for Microsoft tools, including Excel, Power Query,
and Power BI, will appear in a green box like this:

Microsoft | Excel

1. Open Excel and create a new blank workbook.


2. . . .

Tableau Track: Lab instructions for Tableau tools, including Tableau Prep and Tableau
Desktop, will appear in a blue box like this:

Tableau | Desktop

1. Open Tableau Desktop and create a new workbook.


2. . . .

Throughout the lab you will be asked to answer questions about the process and the
results. Add your screenshots to your screenshot lab document. All objective and analysis
questions should be answered in Connect or included in your screenshot lab document,
depending on your instructor’s preferences.

Lab 1-0 Part 1 Objective Questions (LO 1-1, 1-5)


OQ1. According to your instructor, which track(s) will you be completing this semes-
ter? (Answer this in Connect or write your response in your lab document.)
OQ2. Where should you answer objective lab questions? (Answer this in Connect or
write your response in your lab document.)

Lab 1-0 Part 1 Analysis Questions (LO 1-1, 1-5)
AQ1. What is the purpose of taking screenshots of your progress through the labs?
(Answer this in Connect or write your response in your lab document.)

Lab 1-0 Part 2 Take Screenshots of Your Tools


This part will make sure that you are able to locate and open the software needed for future
labs and take screenshots of your progress through the labs. Before you begin the lab, you
should create a new blank Word document where you will record your screenshots and
responses and save it as 1-0 Lab Document [Your name] [Your email].docx. Note that
­anytime you see the camera icon you should capture the current state of your own work
on your computer screen. If you don’t know how to capture screenshots, see the instruc-
tions included in the boxes below. Once you have completed the lab and collected your
screenshots and answers, you may be asked to submit your screenshot lab document to your
instructor for grading.

Microsoft | Excel + Power Query, Power BI Desktop

1. If you haven’t already, download and install the latest version of Excel and
Power BI Desktop on your Windows computer or log on to the remote
desktop.
a. To install Excel, if your university provides Microsoft Office, go to portal.
office.com and click Install Office.
b. To install Power BI Desktop, search for Power BI Desktop in the Microsoft
Store and click Install.
c. To access both Excel and Power BI Desktop on the remote desktop, go to
waltonlab.uark.edu and log in with the username and password provided
by your instructor.
2. Open Excel and create a new blank workbook.
3. From the ribbon, click Data > Get Data > Launch Power Query Editor. A
blank window will appear.
4. Take a screenshot (label it 1-0MA) of the Power Query Editor window and
paste it into your lab document.
a. To take a screenshot in Windows:
1. Open the Start menu and search for “Snipping Tool” or “Snip &
Sketch”.
2. Click New (Rectangular Snip) and draw a rectangle across your
screen that includes your entire window. A preview window with your
screenshot will appear.
3. Press Ctrl + C to copy your screenshot.
4. Go to your lab document and press Ctrl + V to paste the screenshot
into your document.
b. To take a screenshot on a Mac:
1. Press Cmd + Shift + 4 and draw a rectangle across your screen that
includes your entire window. Your screenshot will be saved in your
Desktop folder.

2. Navigate to your Desktop folder and drag the screenshot file into
your lab document.
5. Close the Power Query Editor and close your Excel workbook.
6. Open Power BI Desktop and close the welcome screen.
7. Take a screenshot (label it 1-0MB) of the Power BI Desktop workspace
and paste it into your lab document.
8. Close Power BI Desktop.

Tableau | Prep, Desktop

1. If you haven’t already, download and install the latest version of Tableau Prep
and Tableau Desktop on your computer or log on to the remote desktop.
a. To install Tableau Prep and Tableau Desktop, go to
tableau.com/academic/students and click Get Tableau for Free. Complete the form,
then download and run the installers for both applications. Be sure to
register using your school email address (ending in .edu)—this will help
ensure that your application for a student license will be approved.
b. To access both Tableau Prep and Tableau Desktop on a remote desktop,
go to waltonlab.uark.edu and log in with the username and password
provided by your instructor.
2. Open Tableau Prep and open a sample flow.
3. Take a screenshot (label it 1-0TA) of the blank Tableau Prep window and
paste it into your lab document.
a. To take a screenshot in Windows:
1. Open the Start menu and search for “Snipping Tool” or “Snip &
Sketch”.
2. Click New (Rectangular Snip) and draw a rectangle across your
screen that includes your entire window. A preview window with your
screenshot will appear.
3. Press Ctrl + C to copy your screenshot.
4. Go to your lab document and press Ctrl + V to paste the screenshot
into your document.
b. To take a screenshot on a Mac:
1. Press Cmd + Shift + 4 and draw a rectangle across your screen that
includes your entire window. Your screenshot will be saved in your
Desktop folder.
2. Navigate to your Desktop folder and drag the screenshot file into
your lab document.
4. Close Tableau Prep.
5. Open Tableau Desktop and create a new workbook.
6. Choose a sample workbook from the selection screen or press the Esc key.
7. Take a screenshot (label it 1-0TB) of the blank Tableau Desktop workbook
and paste it into your lab document.
8. Close Tableau Desktop.

Lab 1-0 Part 2 Objective Questions (LO 1-1, 1-5)
OQ1. Where did you go to complete this lab activity? (Answer this in Connect or write
your response in your lab document.)
OQ2. What type of computer operating system do you normally use? (Answer this in
Connect or write your response in your lab document.)

Lab 1-0 Part 2 Analysis Questions (LO 1-1, 1-5)


AQ3. Compare and Contrast: If you completed both tracks in this lab, which tool are
you most interested in learning and why? (This question does not appear in Con-
nect. Write your response in your lab document.)

Lab 1-0 Submit Your Screenshot Lab Document


Verify that you have captured all of your required screenshots and have answered any ques-
tions your instructor has assigned, then upload your screenshot lab document to Connect
or the location indicated by your instructor.

Lab 1-1 Data Analytics Questions in Financial Accounting


Case Summary: Let’s see how we might perform some simple Data Analytics. The pur-
pose of this lab is to help you identify relevant questions that may be answered using Data
Analytics.
You were just hired as an analyst for a credit rating agency that evaluates publicly listed
companies in the United States. The agency already has some Data Analytics tools that it
uses to evaluate financial statements and determine which companies have higher risk and
which companies are growing quickly. The agency uses these analytics to provide ratings
that will allow lenders to set interest rates and determine whether to lend money in the first
place. As a new analyst, you’re determined to make a good first impression.

Lab 1-1 Part 1 Identify the Questions


Think about ways that you might analyze data from a financial statement. You could use
a horizontal analysis to view trends over time, a vertical analysis to show account propor-
tions, or ratios to analyze relationships. Before you begin the lab, you should create a new
blank Word document where you will record your screenshot and save it as Lab 1-1 [Your
name] [Your email address].docx.

Lab 1-1 Part 1 Analysis Questions


AQ1. Use what you know about financial statement analysis (or search the web if
you need a refresher) to generate three different metrics for evaluating financial
performance. For example, if you wanted to evaluate a company’s profit margin
from one year to the next your question might be, “Has Apple Inc’s gross margin
increased in the last 3 years?”
AQ2. Next to each question generate a hypothetical answer to the question to help
you identify what your expected output would be. You may use some insight or
intuition or search for industry averages to inform your hypothesis. For example:
“Hypothesis: Apple Inc’s gross margin has increased slightly in the past 3 years.”
AQ3. Evaluate each question from Part 1. There are specific data attributes that
will help you find the answer you’re looking for. For example, if your question
was “Has [Company X’s] gross margin increased in the last 3 years?” and the

expected answer is “Apple Inc’s gross margin has increased slightly in the past 3
years,” this tells you what attributes (or fields) to look for: company name, gross
margin (sales revenues – cost of goods sold), year.

Lab 1-1 Part 2 Master the Data


To answer your questions, you’ll need to evaluate specific account values or financial state-
ment paragraphs. As an analyst, you have access to the Securities and Exchange Commis-
sion’s (SEC’s) EDGAR database of XBRL financial statements as well as a list of XBRL
tags from the Financial Accounting Standards Board (FASB). XBRL stands for eXtensible
Business Reporting Language and is used to make the data in financial statements machine-
readable. Public companies have been preparing XBRL reports since 2008. While there are
some issues with XBRL data, such data have become a useful means for comparing and
analyzing financial statements. Every value, date, and paragraph is “tagged” with a label
that identifies what each specific value represents, similar to assigning attributes in a data-
base. Because companies tag their financial statements with XBRL tags, you can use those
tags to identify specific data that you need to answer your questions.
For example:
Company name = EntitySectorIndustryClassificationPrimary
Gross margin = GrossProfit
Sales revenues = SalesRevenueNet
Cost of goods sold = CostOfGoodsAndServicesSold
Year = DocumentPeriodEndDate
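Once you know a tag name, you can retrieve the values companies have reported for it. One way (not required for this lab) is the SEC's public "company concept" endpoint; the sketch below uses Apple's CIK, 0000320193, and assumes the endpoint format current at the time of writing. The SEC asks callers to send a User-Agent header identifying themselves.

```python
import requests

# Pull every value Apple has reported for the GrossProfit tag.
url = ("https://data.sec.gov/api/xbrl/companyconcept/"
       "CIK0000320193/us-gaap/GrossProfit.json")
headers = {"User-Agent": "Your Name your_email@school.edu"}

data = requests.get(url, headers=headers, timeout=30).json()

# Each fact carries the fiscal period end date and the reported value.
for fact in data["units"]["USD"][:3]:
    print(fact["end"], fact["val"])
```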
Identify XBRL tags from the FASB’s taxonomy:
1. Open a web browser, and go to xbrlview.fasb.org.
2. Click the + next to US GAAP (2021-01-31).
3. Click the ALL (Main/Entire) option, and then click Open to load the taxonomy.
4. Navigate through the financial statements to determine which accounts you need
to answer your questions from Part 1. The name of the XBRL tag is found in the
properties pane next to “Name.” For example, the tag for Total Assets can be found
by expanding 104000 Statement of Financial Position, Classified > Statement of
Financial Position [Abstract] > Statement [Table] > Statement [Line Items] > Assets
[Abstract] > Assets, Total. You may also use the search function.
Note: Be careful when you use the search function. The tag you see in the results may
appear in the wrong statement. Double-click the tag to expand the tree and show
where the account appears.
5. Click the Assets, Total element to load the data in the Details tab and scroll to the bot-
tom to locate the tag Name in the Properties panel.
6. Take a screenshot (label it 1-1A) of the Total Assets tag information in the XBRL
taxonomy.

Lab 1-1 Part 2 Analysis Questions


AQ1. For each of your questions, identify the account or data attribute you need to
answer your question. Then use FASB’s XBRL taxonomy to identify the specific
XBRL tags that represent those accounts.

Lab 1-1 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 1-2 Data Analytics Questions in Managerial Accounting
Case Summary: Each day as you work in your company’s credit department, you must eval-
uate the credit worthiness of new and existing customers. As you observe the credit applica-
tion process, you wonder if there might be an opportunity to look at data from consumer
lending to see if you can help improve your company’s process. You are asked to evaluate
LendingClub, a U.S.-based, peer-to-peer lending company, headquartered in San Francisco,
California. LendingClub facilitates both borrowing and lending by providing a platform for
unsecured personal loans between $1,000 and $35,000. The loan period is for either 3 or 5
years. You should begin by identifying appropriate questions and developing a hypothesis
for each question. Then, using publicly available data, you should identify data fields and
values that could help answer your questions.

Lab 1-2 Part 1 Identify the Questions


Your company currently collects information about your customers when they apply for
credit and evaluates the credit limit and utilization, or how much credit is currently being
used, to determine whether a new customer should be extended credit. You think there
might be a way to provide better input for that credit decision.
When you evaluate the criteria LendingClub lenders use to evaluate loan applications,
you notice that the company uses credit application data to assign a risk score to all loan
applicants. This risk score is used to help lenders determine (1) whether a loan is likely to
be repaid and (2) what interest rate approved loans will receive. The risk score is calculated
using a number of inputs, such as employment and payment history. You have been asked
to consider if there may be better inputs to evaluate this given that the number of written-off
accounts has increased in the past 2 years. Using available data, you would like to propose
a model that would help create a risk score that you could apply to your own company’s
customers.

Lab 1-2 Part 1 Analysis Questions (LO 1-3, 1-4)


AQ1. Use what you know about loan risk (or search the web if you need a refresher)
to identify three different questions that might influence risk. For example, if
you suspect risky customers live in a certain location, your question might be
“Where do the customers with highest risk live?”
AQ2. For each question you identified in AQ1, generate a hypothetical answer to each
question to help you identify what your expected output would be. You may use
some insight or intuition or search the Internet for ideas on how to inform your
hypothesis. For example: “Hypothesis: High-risk customers likely live in coastal
towns.”
AQ3. Finally, identify the data that you would need to answer each of your questions.
For example, to determine customer location, you might need the city, state,
and zip code. Additionally, if you hypothesize a specific region, you’d need to
know which cities, states, and/or zip codes belong to that region.

Lab 1-2 Part 2 Master the Data


Now that you have an idea of what questions would help influence your risk model and the
types of data that you need to collect, it is time to evaluate the specific data that LendingClub
collects using a listing of data attributes that it collects in Table 1-2A. Look through the list
of attributes and review the description of each, thinking about how these might influence
a risk score.

LAB TABLE 1-2A Names and Descriptions of Selected Data Attributes Collected by LendingClub

Attribute          Description
id                 Loan identification number
member_id          Membership identification number
loan_amnt          Requested loan amount
emp_length Employment length
issue_d Date of loan issue
loan_status Fully paid or charged off
pymnt_plan Payment plan: yes or no
purpose Loan purpose: e.g., wedding, medical, debt_consolidation, car
zip_code Zip code
addr_state State
dti Debt-to-income ratio
delinq_2y Late payments within the past 2 years
earliest_cr_line Oldest credit account
inq_last_6mnths Credit inquiries in the past 6 months
open_acc Number of open credit accounts
revol_bal Total balance of all credit accounts
revol_util Percentage of available credit in use
total_acc Total number of credit accounts
application_type Individual or joint application

Lab 1-2 Part 2 Analysis Questions (LO 1-3, 1-4)


AQ1. Evaluate each of your questions from Part 1. Do the data you identified in your
questions exist in the table provided? If so, write the applicable fields next to
each question in your document.
AQ2. Are there data values you identified in Part 1 that don’t exist in the table?
Explain how you might collect the missing data or where you might locate it.

Lab 1-2 Submit Your Screenshot Lab Document


No screenshots are required for this lab.

Lab 1-3 Data Analytics Questions in Auditing


Case Summary: ABC Company is a large retailer that collects its order-to-cash data
in a large ERP system that was recently updated to comply with the AICPA’s audit
data standards. ABC Company currently collects all relevant data in the ERP system
and digitizes any contracts, orders, or receipts that are completed on paper. The credit
department reviews customers who request credit. Sales orders are approved by manag-
ers before being sent to the warehouse for preparation and shipment. Cash receipts are
collected by a cashier and applied to a customer’s outstanding balance by an accounts
receivable clerk.
You have been assigned to the audit team that will perform the internal controls audit
of ABC Company. In this lab, you should identify appropriate questions and develop a
hypothesis for each question. Then you should translate questions into target fields and
value in a database and perform a simple analysis.

Lab 1-3 Part 1 Identify the Questions
Your audit team has been tasked with identifying potential internal control weaknesses
within the order-to-cash process. You have been asked to consider what the risk of internal
control weakness might look like and how the data might help identify it.
Before you begin the lab, you should create a new blank Word document where you will
record your screenshot and save it as Lab 1-3 [Your name] [Your email address].docx.

Lab 1-3 Part 1 Analysis Questions (LO 1-3, 1-4)


AQ1. Use what you know about internal controls over the order-to-cash process (or
search the web if you need a refresher) to identify three different questions that
might indicate internal control weakness. For example, if you suspect that a
manager may be delaying approval of shipments sent to customers, your ques-
tion might be “Are any shipping managers approving shipments more than
2 days after they are received?”
AQ2. Next to each question generate a hypothetical answer to help you identify what
your expected output would be. You may use some insight or intuition or search
the Internet for ideas on how to inform your hypothesis. For example: “Hypoth-
esis: Only one or two shipping managers are approving shipments more than
2 days after they are received.”
AQ3. Finally, identify the data that you would need to answer each of your questions.
For example, to determine the timing of approval and who is involved, you
might need the approver ID, the order date, and the approval date.

Lab 1-3 Part 2 Master the Data


To answer your questions, you’ll need to evaluate the data that are available. As a starting
point, you should look at attributes listed in the AICPA’s audit data standards. The AICPA
set these standards to map common data elements that should be accessible in any modern
enterprise system and make it possible for auditors to create a common set of analytic
models and tools. To access the audit data standards, complete the following steps:
1. Open your web browser and search for “Audit data standards order to cash.” Follow
the link to the “Audit Data Standards Library—AICPA,” then look for the “Audit Data
Standard—Order to Cash Subledger Standard” PDF document.
2. Quickly scroll through the document and evaluate the tables (e.g., Sales_Orders_
YYYYMMDD_YYYYMMDD), field names (e.g., Sales_Order_ID), and descriptions
(e.g., “Unique identifier for each sales order.”).
3. Take a screenshot (label it 1-3A) of the page showing 2.1
Sales_Orders_YYYYMMDD_YYYYMMDD.
4. As you skim the tables, make note of any data elements you identified in Part 1 that
don’t appear in the list of fields in the audit data standard.

Lab 1-3 Part 2 Analysis Questions (LO 1-3, 1-4)


AQ1. List some of the tables and fields from the audit data standard that relate to
each question you identified in Part 1. For example, if you’re looking for the
shipment timing and approval data, you would need the Shipments_Made_
YYYYMMDD_YYYYMMDD table and Approved_By, Entered_Date, and
Approved_Date fields.
AQ2. Are there data values you identified in Part 1 that don’t exist in the tables?
Explain how you might collect the missing data or where you might locate them.

Lab 1-3 Submit Your Screenshot Lab Document
Verify that you have captured your required screenshot and have answered any questions
your instructor has assigned, then upload your screenshot lab document to Connect or the
location indicated by your instructor.

Lab 1-4 Comprehensive Case: Questions about Dillard’s Store Data
Case Summary: Dillard’s is a department store with approximately 330 stores in 29 states in
the United States. Its headquarters is located in Little Rock, Arkansas. You can learn more
about Dillard’s by looking at finance.yahoo.com (ticker symbol = DDS) and the Wikipedia
site for DDS. You’ll quickly note that William T. Dillard II is an accounting grad of the
University of Arkansas and the Walton College of Business, which may be why he shared
transaction data with us to make available for this lab and labs throughout this text. In this
lab, you will identify appropriate questions for a retailer. Then, translate questions into tar-
get tables, fields, and values in the Dillard’s database.

Lab 1-4 Part 1 Identify the Questions


From the Walton College website, we note the following:

The Dillard’s Department Store Database contains retail sales information gathered from
store sales transactions. The sale process begins when a customer brings items intended
for purchase (clothing, jewelry, home décor, etc.) to any store register. A Dillard’s sales
associate scans the individual items to be purchased with a barcode reader. This popu-
lates the transaction table (TRANSACT), which will later be used to generate a sales
receipt listing the item, department, and cost information (related price, sale price, etc.)
for the customer. When the customer provides payment for the items, payment details
are recorded in the transaction table, the receipt is printed, and the transaction is com-
plete. Other tables are used to store information about stores, products, and departments.
Source: http://walton.uark.edu/enterprise/dillardshome.php (accessed July 15, 2021).

This is a gifted dataset based on real operational data. Like any real database, it may contain integrity problems, which provides a unique opportunity not only to be exposed to real data, but also to observe the effects of data integrity problems firsthand.
For this lab, you should rely on your creativity and prior business knowledge to answer
the following analysis questions. Answer these questions in your lab doc or in Connect and
then continue to the next part of this lab.

Lab 1-4 Part 1 Analysis Questions (LO 1-1, 1-3, 1-4)


AQ1. Assume that Dillard’s management is interested in improving profitability. Write
three questions that could be asked to assess current profitability levels for each
product and how profitability could be improved in the near future.
AQ2. Assume that Dillard’s management wishes to improve its online sales and
profitability on those sales. What three questions could be asked to see where
­Dillard’s stands on its online sales?

Lab 1-4 Part 2 Master the Data


An analysis of data related to Dillard’s retail sales can provide some of the answers to your
questions from Part 1. Consider the attributes that are given in Lab Exhibit 1-4.

LAB EXHIBIT 1-4 Dillard's Sales Transaction Tables and Attributes

CUSTOMER table:
Attribute | Description | Sample values
CUST_ID | Unique identifier representing a customer instance | 219948527, 219930818
CITY | City where the customer lives | HOUSTON, COOS BAY
STATE | State where the customer lives | FL, TX
ZIP_CODE | Customer's 5-digit zip code | 72701, 84770
ZIP_SECSEG | Customer's geographic segment code | 5052, 6474
DISTANCE_TO_NEAREST_STORE | Miles from the customer's house to the closest Dillard's store | 0.687, 6.149
PREFERRED_STORE | Dillard's store number the customer prefers to shop at regardless of distance to the customer's home address | 910, 774

DEPARTMENT table:
Attribute | Description | Sample values
DEPT | The Dillard's unique identifier for a collection of merchandise within a store format | 0471, 0029
DEPT_DESC | The name for a department collection (lowest level of the category hierarchy) of merchandise within a store format | "Christian Dior", "REBA"
DEPTDEC | The first three digits of a department code, a way to classify departments at a higher level | 047X, 002X
DEPTDEC_DESC | Descriptive name representing the decade (middle level of the category hierarchy) to which a department belongs | 'BASICS', 'TREATMENT'
DEPTCENT | The first two digits of a department code, a way to classify departments at a higher level | 04XX, 00XX
DEPTCENT_DESC | The descriptive name of the century (top level of the category hierarchy) | CHILDRENS, COSMETICS

SKU table:
Attribute | Description | Sample values
SKU | Unique identifier for an item; identifies the item by size within a color and style for a particular vendor | 0557578, 6383039
DEPT | The Dillard's unique identifier for a collection of merchandise within a store format | 0134, 0343
SKU_CLASS | Three-character alpha/numeric classification code used to define the merchandise; class requirements vary by department | K51, 220
SKU_STYLE | The Dillard's numeric identifier for a style of merchandise | 091923690, LBF41728
UPC | A number provided by vendors to identify their product to the size level | 889448437421, 44212146767
COLOR | Color of an item | BLACK, PINEBARK
SKU_SIZE | Size of an item; product sizes are not standardized and are issued by the vendor | 6, 085M
BRAND_NAME | The item's brand | Stride Rite, UNKNOWN
CLASSIFICATION | Category used to sort products into logical groups | Dress Shoe
PACKSIZE | Number that describes how many of the product come in a package | 001, 002

SKU_STORE table:
Attribute | Description | Sample values
STORE | The numerical identifier for a Dillard's store | 915, 701
SKU | Unique identifier for an item; identifies the item by size within a color and style for a particular vendor | 4305296, 6137609
RETAIL | The price of an item | 11.90, 45.15
COST | The price charged by a vendor for an item | 8.51, 44.84

STORE table:
Attribute | Description | Sample values
STORE | The numerical identifier for any type of Dillard's location | 767, 460
DIVISION | The division to which a location is assigned for operational purposes | 07, 04
CITY | The city where the store is located | IRVING, MOBILE
STATE | The state abbreviation where the store is located | MO, AL
ZIP_CODE | The 5-digit zip code of a store's address | 70601, 35801
ZIP_SECSEG | The 4-digit code of a neighborhood within a specific zip code | 5052, 6474

TRANSACT table:
Attribute | Description | Sample values
TRANSACTION_ID | Unique numerical identifier for each scan of an item at a register | 40333797, 15129264
TRAN_DATE | Calendar date the transaction occurred in a store | 1/1/2015, 5/19/2014
STORE | The numerical identifier for any type of Dillard's location | 716, 205
REGISTER | The numerical identifier for the register where the item was scanned | 91, 55, 12
TRAN_NUM | Sequential number of transactions scanned on a register | 184, 14
TRAN_TIME | Time of day the transaction occurred | 1839, 1536
CUST_ID | Unique identifier representing the instance of a customer | 118458688, 115935775
TRAN_LINE_NUM | Sequential number of each scan or element in a transaction | 3, 2
MIC | Manufacturer Identification Code used to uniquely identify a vendor or brand within a department | 154, 128, 217
TRAN_TYPE | An identifier for a purchase or return type of transaction or line item | P, R
ORIG_PRICE | The original unit price of an item before discounts | 20.00, 6.00
SALE_PRICE | The discounted unit price of an item | 15.00, 2.64, 6.00
TRAN_AMT | The total pre-tax dollar amount the customer paid in a transaction | 15.00, 2.64
TENDER_TYPE | The type of payment a customer used to complete the transaction | BANK, DLRD, DAMX
SKU | Unique identifier for an item; identifies the item by size within a color and style for a particular vendor | 6107653, 9999999950

Source: http://walton.uark.edu/enterprise/dillardshome.php (accessed January 15, 2021).

Lab 1-4 Part 2 Objective Questions (LO 1-1, 1-3, 1-4)


OQ1. What tables and fields could address the question of the profit margin (sales
price less cost) on each product (SKU) available for sale?
OQ2. If you’re interested in learning which product is sold most often at each store,
which tables and fields would you consider?

Lab 1-4 Part 2 Analysis Questions (LO 1-1, 1-3, 1-4)
AQ1. You’re trying to learn about where Dillard’s stores are located to identify loca-
tions for the next additional store. Consider the STORE table. What questions
could be asked about store location given data availability?
AQ2. What questions would you have regarding data fields in the SKU table that
could be used to help address the cost of shipping? What additional information
would be helpful to address this question?

Lab 1-4 Submit Your Screenshot Lab Document


No screenshots are required for this lab.

Lab 1-5 Comprehensive Case: Connect to Dillard’s Store Data


Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Dillard’s is a department store with approximately 330 stores in 29 states
in the United States. Its headquarters is located in Little Rock, Arkansas. You can learn
more about Dillard’s by looking at finance.yahoo.com (ticker symbol = DDS) and the Wiki-
pedia site for DDS. You’ll quickly note that William T. Dillard II is an accounting grad
of the University of Arkansas and the Walton College of Business, which may be why he
shared transaction data with us to make available for this lab and labs throughout this text.
In this lab, you will learn how to load Dillard’s data into the tools used for data analysis.
Data: Dillard’s sales data are available only on the University of Arkansas Remote
Desktop (waltonlab.uark.edu). See your instructor for login credentials.

Lab 1-5 Part 1 Load the Dillard's Data in Excel + Power Query and Tableau Prep
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 1-5 [Your name] [Your email address].docx.
From the Walton College website, we note the following:

The Dillard’s Department Store Database contains retail sales information gathered
from store sales transactions. The sale process begins when a customer brings items
intended for purchase (clothing, jewelry, home décor, etc.) to any store register. A
Dillard’s sales associate scans the individual items to be purchased with a barcode
reader. This populates the transaction table (TRANSACT), which will later be used
to generate a sales receipt listing the item, department, and cost information (related
price, sale price, etc.) for the customer. When the customer provides payment for the
items, payment details are recorded in the transaction table, the receipt is printed,
and the transaction is complete. Other tables are used to store information about
stores, products, and departments.
Source: http://walton.uark.edu/enterprise/dillardshome.php (accessed July 15, 2021).

This is a gifted dataset based on real operational data. Like any real database, it may contain integrity problems, which provides a unique opportunity not only to be exposed to real data, but also to observe the effects of data integrity problems firsthand. The TRANSACT table itself contains 107,572,906 records. Analyzing the entire population would take a significant amount of computational time, especially if multiple users are querying it at the same time.


In Part 1 of this lab, you will learn how to load the Dillard’s data into either Excel +
Power Query or Tableau Prep so that you can extract, transform, and load the data for later
assignments. You will also filter the data to a more manageable size. In Part 2, you will learn
how to load the Dillard’s data into either Power BI Desktop or Tableau Desktop to prepare
your data for visualization and Data Analytics models.
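Whichever track you follow, the point-and-click steps below build the equivalent of a simple SQL extraction against the WCOB_DILLARDS database. The following is a minimal sketch of that query, shown for reference only; the statements each tool actually generates behind the scenes may differ:

    -- Join each transaction line to its store and keep one week of data
    SELECT t.*, s.CITY, s.STATE
    FROM TRANSACT AS t
        INNER JOIN STORE AS s
            ON t.STORE = s.STORE        -- STORE in TRANSACT matches the STORE table's identifier
    WHERE t.TRAN_DATE >= '2014-01-01'   -- filter to a manageable subset
      AND t.TRAN_DATE <= '2014-01-07';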

Microsoft | Excel + Power Query Editor

1. Create a new workbook in Microsoft Excel.


2. In the Data ribbon, click Get Data > From Database > From SQL Server Database.
3. Enter the following and click OK:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
c. Data connectivity mode: Direct Query
4. If prompted for credentials, click Use my current credentials and click
Connect.
5. If prompted with a warning about an insecure connection, click OK.
6. Check the box Select multiple items.
7. Check the following tables and click Transform Data or Edit:
a. TRANSACT, STORE Note: Future labs may ask you to load different
tables.
8. Take a screenshot (label it 1-5MA).
9. In Power Query Editor:
a. Click the TRANSACT query from the list on the left side of the screen.
b. Click the drop-down menu to the right of the TRAN_DATE attribute to
show filter options.
c. Choose Date Filters > Between....
d. Enter the date range: is after or equal to 1/1/2014 and is before or equal to 1/7/2014, then click OK. Note: Future labs may ask you to load different date ranges.
e. Click Close & Load and wait for a moment while the data load into
Excel.
f. If you see a warning that not all data can be displayed, click OK.
10. Take a screenshot (label it 1-5MB).
11. When you are finished answering the lab questions you may close Excel. Save
your file as Lab 1-5 Dillard’s Filter.xlsx.

Tableau | Prep

1. Open Tableau Prep Builder.


2. Click Connect to Data, and choose Microsoft SQL Server from the list.


3. Enter the following and click Sign In:


a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
c. Authentication: Windows Authentication
4. Double-click the TRANSACT and STORE tables. Note: Future labs may ask
you to load different tables.
5. In the flow sheet, drag STORE onto your new TRANSACT and choose
JOIN.
6. Click the + next to Join 1 and choose + Clean Step.
7. Take a screenshot (label it 1-5TA).
8. Locate TRAN_DATE in the bottom preview pane and click ... > Filter > Range of Dates.
9. Enter 1/1/2014 to 1/7/2014 or drag the sliders to limit the dates to this range
and click Done. Note: Because Tableau Prep samples data in this design view,
you may not see any results on this step. Don’t panic; you will see them in the
next step. Future labs may ask you to load different date ranges.
10. Right-click Clean 1 from the flow and rename the step “Date Filter”.
11. Click + next to your new Date Filter task and choose Output.
12. Click the Browse button in the Output pane, choose a folder to save the file,
name your file Lab 1-5 Dillard’s Filter.hyper, and click Accept.
13. Click Run Flow.
14. Take a screenshot (label it 1-5TB).
15. When you are finished answering the lab questions you may close Tableau
Prep. Save your file as Lab 1-5 Dillard’s Filter.tfl.

Lab 1-5 Part 1 Analysis Questions (LO 1-3, 1-4)


AQ1. Why would you want to filter the date field before loading data into your model
for analysis?
AQ2. What are some limitations introduced into your analysis by filtering on such a
small date range?

Lab 1-5 Part 2 Tableau Desktop and Power BI Desktop


Now that you have had some experience preparing data in Excel or Tableau Prep, it is time
to learn how to load the Dillard’s data into either Power BI Desktop or Tableau Desktop so
that you can extract, transform, and load the data for data models and visualizations. You
will also use filters to extract a more manageable dataset.

Microsoft | Power BI Desktop

1. Create a new report in Power BI Desktop.


2. In the Home ribbon, click Get Data > SQL Server database and click Connect.



3. Enter the following and click OK:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
c. Data Connectivity mode: DirectQuery
4. If prompted for credentials, click Use my current credentials and click
Connect.
5. If prompted with a warning about an insecure connection, click OK.
6. Check the following tables and click Transform Data:
a. TRANSACT, STORE Note: Future labs may ask you to load different
tables.
7. In the Power Query Editor window that appears, complete the following:
a. Click the TRANSACT query.
b. Click the drop-down menu to the right of the TRAN_DATE attribute to
show filter options.
c. Choose Date Filters > Between....
d. Enter the date range: is after or equal to 1/1/2014 and is before or equal to 1/7/2014, then click OK. Note: Future labs may ask you to load different date ranges.
e. Take a screenshot (label it 1-5MC).
f. Click Close & Apply and wait for a moment while the data load into
Power BI Desktop.
8. Now that you have loaded your data into Power BI Desktop, continue to
explore the data:
a. Click the Model pane on the left side of the screen.
b. In the TRANSACT table, click the STORE attribute.
c. In the Properties pane on the right, change the data type to Text.
d. In the STORE table, click the STORE attribute.
e. In the Properties pane on the right, change the data type to Text.
9. Take a screenshot (label it 1-5MD).
10. When you are finished answering the lab questions you may close Power BI
Desktop. Save your file as Lab 1-5 Dillard’s Filter.pbix.

Tableau | Desktop

1. Create a new workbook in Tableau.


2. Go to Connect > To a Server > Microsoft SQL Server.
3. Enter the following and click Sign In:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
4. Double-click the tables you need.

a. TRANSACT and STORE Note: Future labs may ask you to load different
tables.
b. Verify the relationship includes the Store attribute from both tables and
close the Relationships window.
5. Click the TRANSACT table and click Update Now to preview the data.
6. Take a screenshot (label it 1-5TC).
7. In the top-right corner of the Data Source screen, click Add below Filters.
a. Click Add....
b. Choose Tran Date and click OK.
c. Choose Range of Dates and click Next.
d. Drag the sliders to limit the data from 1/1/2014 to 1/7/2014 and click
OK. Note: Future labs may ask you to load different date ranges.
e. Take a screenshot (label it 1-5TD).
f. Click OK to return to the Data Source screen.
8. Click the TRANSACT table and then click Update Now to preview the data.
9. When you are finished answering the lab questions you may close Tableau.
Save your file as Lab 1-5 Dillard’s Filter.twb.
Note: Tableau will try to query the server after each change you make, which can take up to a minute. After each change, click Cancel to stop the query until you're ready to prepare the final report.

Lab 1-5 Part 2 Analysis Questions (LO 1-3, 1-4)


AQ1. Compare the tools you used in Part 2 with the tools you used in Part 1. What are some of the differences between the visualization tools (Power BI Desktop and Tableau Desktop) and the data prep tools (Power Query and Tableau Prep)?

Lab 1-5 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Chapter 2
Mastering the Data

A Look at This Chapter


This chapter provides an overview of the types of data that are used in the accounting cycle and common data that
are stored in a relational database. The second step of the IMPACT cycle is “mastering the data,” which is sometimes
called ETL for extracting, transforming, and loading the data. We will describe how data are requested and extracted
to answer business questions and how to transform data for use via data preparation, validation, and cleaning. We
conclude with an explanation of how to load data into the appropriate tool in preparation for ­analyzing data to
make decisions.

A Look Back
Chapter 1 defined Data Analytics and explained that the value of Data Analytics is in the insights it provides.
We described the Data Analytics Process using the IMPACT cycle model and explained how this process is used
to address both business and accounting questions. We specifically emphasized the importance of identifying
­appropriate questions that Data Analytics might be able to address.

A Look Ahead
Chapter 3 describes how to go from defining business problems to analyzing data, answering questions, and address-
ing business problems. We identify four types of Data Analytics (descriptive, diagnostic, predictive, and prescriptive
analytics) and describe various approaches and techniques that are most relevant to analyzing accounting data.

We are lucky to live in a world in which data are abundant. However, even with rich sources of data, when it comes to being able to analyze data and turn them into useful information and insights, very rarely can an analyst hop right into a dataset and begin analyzing. Datasets almost always need to be cleaned and validated before they can be used. Not knowing how to clean and validate data can, at best, lead to frustration and poor insights and, at worst, lead to horrible security violations. While this text takes advantage of open source datasets, these datasets have all been scrubbed not only for accuracy, but also to protect the security and privacy of any individual or company whose details were in the original dataset.
In 2015, a pair of researchers named Emil Kirkegaard and Julius Daugbejerg Bjerrekaer scraped data from OkCupid, a free dating website, and posted the data to the “Open Science Framework,” a platform researchers use to obtain and share raw data. While the aim of the Open Science Framework is to increase transparency, the researchers in this instance took that a step too far, and a step into illegal territory. Kirkegaard and Bjerrekaer did not obtain permission from OkCupid or from the 70,000 OkCupid users whose identities, ages, genders, religions, personality traits, and other personal details maintained by the dating site were provided to the public without any work being done to anonymize or sanitize the data. If the researchers had taken the time not just to validate that the data were complete, but also to sanitize them to protect the individuals’ identities, this would not have been a threat or a news story. On May 13, 2015, the Open Science Framework removed the OkCupid data from the platform, but the damage of the privacy breach had already been done.1
A 2020 report suggested that “Any consumer with an average number of apps on their phone—anywhere between
40 and 80 apps—will have their data shared with hundreds or perhaps thousands of actors online,” said Finn Myrstad,
the digital policy director for the Norwegian Consumer Council, commenting specifically about dating apps.2
All told, data privacy and ethics will continue to be an issue for data providers and data users. In this chapter, we
look at the ethical considerations of data collection and data use as part of mastering the data.

OBJECTIVES
After reading this chapter, you should be able to:

LO 2-1 Understand available internal and external data sources and how data
are organized in an accounting information system.
LO 2-2 Understand how data are stored in a relational database.
LO 2-3 Explain and apply extraction, transformation, and loading (ETL)
techniques to prepare the data for analysis.
LO 2-4 Describe the ethical considerations of data collection and data use.

1 B. Resnick, “Researchers Just Released Profile Data on 70,000 OkCupid Users without Permission,” Vox, 2016, http://www.vox.com/2016/5/12/11666116/70000-okcupid-users-data-release (accessed October 31, 2016).
2 N. Singer and A. Krolik, “Grindr and OkCupid Spread Personal Details, Study Says,” New York Times, January 13, 2020, https://www.nytimes.com/2020/01/13/technology/grindr-apps-dating-data-tracking.html (accessed December 2020).


As you learned in Chapter 1, Data Analytics is a process, and we follow an established Data
Analytics model called the IMPACT cycle.3 The IMPACT cycle begins with identifying
business questions and problems that can be, at least partially, addressed with data (the “I”
in the IMPACT model). Once the opportunity or problem has been identified, the next step
is mastering the data (the “M” in the IMPACT model), which requires you to identify and
obtain the data needed for solving the problem. Mastering the data requires a firm under-
standing of what data are available to you and where they are stored, as well as being skilled
in the process of extracting, transforming, and loading (ETL) the data in preparation for
data analysis. While the extraction piece of the ETL process may often be completed by the
information systems team or a database administrator, it is also possible that you will have
access to raw data that you will need to extract out of the source database. Both methods
of requesting data for extraction and of extracting data yourself are covered in this chapter.
The mastering the data step can be described via the ETL process. The ETL process is
made up of the following five steps:
Step 1 Determine the purpose and scope of the data request (extract).
Step 2 Obtain the data (extract).
Step 3 Validate the data for completeness and integrity (transform).
Step 4 Clean the data (transform).
Step 5 Load the data in preparation for data analysis (load).
This chapter will provide details for each of these five steps.

LO 2-1 Understand available internal and external data sources and how data are organized in an accounting information system.

HOW DATA ARE USED AND STORED IN THE ACCOUNTING CYCLE

Before you can identify and obtain the data, you must have a comfortable grasp on what data are available to you and where such data are stored.

Internal and External Data Sources

Data may come from a number of different sources, either internal or external to the organization. Internal data sources include an accounting information system, supply chain
management system, customer relationship management system, and human resource man-
agement system. Enterprise Resource Planning (ERP) (also known as Enterprise Systems) is
a category of business management software that integrates applications from throughout the
business (such as manufacturing, accounting, finance, human resources, etc.) into one system.
An accounting information system is a system that records, processes, reports, and com-
municates the results of business transactions to provide financial and nonfinancial infor-
mation for decision-making purposes. A supply chain management (SCM) system includes
information on active vendors (their contact info, where payment should be made, how
much should be paid), the orders made to date (how much, when the orders are made), or
demand schedules for what component of the final product is needed when. The customer
relationship management (CRM) system is an information system for overseeing all interac-
tions with current and potential customers with the goal of improving relationships. CRM
systems contain every detail about the customer. Companies also have a set of data about
what is arguably their most valuable asset: their employees. A human resource management
(HRM) system is an information system for managing all interactions with current and
potential employees.
3 J. P. Isson and J. S. Harriott, Win with Advanced Business Analytics: Creating Business Value from Your Data (Hoboken, NJ: Wiley, 2013).


Exhibit 2-1 provides an example of different categories of external data sources includ-
ing economic, financial, governmental, and other sources. Each of these may be useful in
addressing accounting and business questions.

Category | Dataset Description | Website
Economics | BRICS World Bank Indicators (Brazil, Russia, India, China and South Africa) | https://www.kaggle.com/docstein/brics-world-bank-indicators
Economics | Bureau of Economic Analysis data | http://www.bls.gov/data/
Financial | Financial statement data | https://www.calcbench.com/
Financial | Financial statement data, EDGAR, Securities and Exchange Commission | https://www.sec.gov/edgar.shtml
Financial | Analyst forecasts | Yahoo! Finance (finance.yahoo.com), Analysis Tab
Financial | Stock market dataset | https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs
Financial | Credit card fraud detection | https://www.kaggle.com/mlg-ulb/creditcardfraud
Financial | Daily News/Stock Market Prediction | https://www.kaggle.com/aaron7sun/stocknews
Financial | Retail Data Analytics | https://www.kaggle.com/manjeetsingh/retaildataset
Financial | Peer-to-peer lending data of approved and rejected loans | lendingclub.com (requires login)
Financial | Daily stock prices (and weekly and monthly) | Yahoo! Finance (finance.yahoo.com), Historical Data Tab
Financial | Financial and economic summaries by industry | http://pages.stern.nyu.edu/~adamodar/New_Home_Page/datacurrent.html
General | data.world | https://data.world/
General | kaggle.com | https://www.kaggle.com/datasets
Government | State of Ohio financial data (Data Ohio) | https://data.ohio.gov/wps/portal/gov/data/
Government | City of Chicago financial data | https://data.cityofchicago.org
Government | City of New York financial data | http://www.checkbooknyc.com/spending_landing/yeartype/B/year/119
Marketing | Amazon product reviews | https://data.world/datafiniti/consumer-reviews-of-amazon-products
Other | Restaurant safety | https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j
Other | Citywide payroll data | https://data.cityofnewyork.us/City-Government/Citywide-Payroll-Data-Fiscal-Year-/k397-673e
Other | Property valuation/assessment | https://data.cityofnewyork.us/City-Government/Property-Valuation-and-Assessment-Data/yjxr-fw8i
Other | USA facts (our country in numbers) | https://www.irs.gov/uac/tax-stats
Other | Interesting fun datasets (14 data science projects with data) | https://towardsdatascience.com/14-data-science-projects-to-do-during-your-14-day-quarantine-8bd60d1e55e1
Other | Links to Big Data Sets (Amazon Web Services) | https://aws.amazon.com/public-datasets/
Real Estate | New York Airbnb data explanation | https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data
Real Estate | U.S. Airbnb data | https://www.kaggle.com/kritikseth/us-airbnb-open-data/tasks?taskId=2542
Real Estate | TripAdvisor hotel reviews | https://www.kaggle.com/andrewmvd/trip-advisor-hotel-reviews
Retail | Retail sales forecasting | https://www.kaggle.com/tevecsystems/retail-sales-forecasting

EXHIBIT 2-1 Potential External Data Sources Available to Address Business and Accounting Questions


Accounting Data and Accounting Information Systems


A basic understanding of accounting processes and their associated data, how those data
are organized, and why the data were captured, can help you request the right data and
facilitate that request so that you know exactly where each piece of data is held.
Even with the focus on raw data and where they are stored, there is variety in how data
can be stored. Most commonly, data are stored in either flat files or a database. For many of
our examples and hands-on activities in this text, we will transform our data that are stored
in a database into a flat file. The most common example of a flat file that you are likely used
to is a range of data in an Excel spreadsheet. Put simply, a flat file is a means of maintaining
all of the data you need in one place. We can do a lot of incredible data analysis and num-
ber crunching in flat files in Excel, but as far as storing our data, it is generally inefficient
to store all of the data that you need for a given business process in one place. Instead, a
relational database is frequently used for data storage because it is more capable of ensuring
data integrity and maintaining “one version of the truth” across multiple processes. There
are a variety of applications that support relational databases (these are referred to as rela-
tional database management systems or RDBMS). In this textbook we interact with data
stored on a Microsoft SQL Server.
Microsoft SQL Server can support enterprise-level data in ways that smaller RDBMS
programs, such as Access and SQLite, cannot. While both Microsoft Access and SQLite
can be (and are) used in professional settings, the usage of SQL Server throughout the text-
book is meant to provide an experience that replicates working with much larger and more
complex datasets that you will likely find in the professional world.
There are many other examples of relational database management systems, including
Teradata, MySql, Oracle RDBMS, IBM DB2, Amazon RDS, and PostGreSQL. Regardless
of the DBMS, relational databases have principles that guide how they are modeled.
Exhibit 2-2, a simplified version of a Unified Modeling Language (UML) class diagram,
is an illustration or a drawing of the tables and their relationships to each other (i.e., a data-
base schema). Relational databases are discussed in greater depth in Learning Objective 2-2.

EXHIBIT 2-2 Procure-to-Pay Database Schema (Simplified)

[Class diagram: The Supplier Table (PK: Supplier ID) relates one-to-many to the Purchase Order Table (PK: PO_Number; FK: Supplier ID, EmployeeID, CashDisbursementID). The Materials Table (PK: Item_Number) and the Purchase Order Table each relate one-to-many to the Purchase Order Details Table (composite PK made up of FK: Item_Number and FK: PO_Number).]

LO 2-2 Understand how data are stored in a relational database.

DATA AND RELATIONSHIPS IN A RELATIONAL DATABASE

In this text, we will work with data in a variety of forms, but regardless of the tool we use to analyze data, structured data should be stored in a normalized relational database. There
are occasions for working with data directly in the relational database, but many times when
we work with data analysis, we’ll prefer to export the data from the relational database and
view it in a more user-friendly form. The benefit of storing data in a normalized database


outweighs the downside of having to export, validate, and sanitize the data every time you
need to analyze the information.
Storing data in a normalized, relational database instead of a flat file ensures that data
are complete, not redundant, and that business rules and internal controls are enforced;
it also aids communication and integration across business processes. Each one of these
benefits is detailed here:
• Completeness. Ensures that all data required for a business process are included in the
dataset.
• No redundancy. Storing redundant data is to be avoided for several reasons: It takes
up unnecessary space (which is expensive), it takes up unnecessary processing to run
reports to ensure that there aren’t multiple versions of the truth, and it increases the
risk of data-entry errors. Storing data in flat files yields a great deal of redundancy, but
normalized relational databases require there to be one version of the truth and for
each element of data to be stored in only one place.
• Business rules enforcement. As will become increasingly evident as we progress
through the material in this text, relational databases can be designed to aid in the
placement and enforcement of internal controls and business rules in ways that flat
files cannot.
• Communication and integration of business processes. Relational databases should
be designed to support business processes across the organization, which results
in improved communication across functional areas and more integrated business
processes.4
It is valuable to spend some time basking in the benefits of storing data in a relational data-
base because it is not necessarily easier to do so when it comes to building the data model or
understanding the structure. It is arguably more complex to normalize your data than it is to
throw redundant data without business rules or internal controls into a spreadsheet.

Columns in a Table: Primary Keys, Foreign Keys, and Descriptive Attributes
When requesting data, it is critical to understand how the tables in a relational database are
related. This is a brief overview of the different types of attributes in a table and how these
attributes support the relationships between tables. It is certainly not a comprehensive take
on relational data modeling, but it should be adequate in preparing you for creating data
requests.
Every column in a table must be both unique and relevant to the purpose of the table.
There are three types of columns: primary keys, foreign keys, and descriptive attributes.
Each table must have a primary key. The primary key is typically made up of one col-
umn. The purpose of the primary key is to ensure that each row in the table is unique,
so it is often referred to as a “unique identifier.” It is rarely truly descriptive; instead, a
collection of letters or simply sequential numbers are often used. As a student, you are
probably already very familiar with your unique identifier—your student ID number at the
university is the way you as a student are stored as a unique record in the university’s data
model! Other examples of unique identifiers that you are familiar with would be Amazon
order numbers, invoice numbers, account numbers, Social Security numbers, and driver’s
license numbers.
One of the biggest differences between a flat file and a relational database is simply how
many tables there are—when you request your data into a flat file, you’ll receive one big

4 G. C. Simsion and G. C. Witt, Data Modeling Essentials (Amsterdam: Morgan Kaufmann, 2005).


table with a lot of redundancy. While this is often ideal for analyzing data, when the data
are stored in the database, each group of information is stored in a separate table. Then, the
tables that are related to one another are identified (e.g., Supplier and Purchase Order are
related; it’s important to know which Supplier the Purchase Order is from). The relation-
ship is created by placing a foreign key in one of the two tables that are related. The foreign
key is another type of attribute, and its function is to create the relationship between two
tables. Whenever two tables are related, one of those tables must contain a foreign key to
create the relationship.
The other columns in a table are descriptive attributes. For example, Supplier Name is
a critical piece of data when it comes to understanding the business process, but it is not
necessary to build the data model. Primary and foreign keys facilitate the structure of a rela-
tional database, and the descriptive attributes provide actual business information.
Refer to Exhibit 2-2, the database schema for a typical procure-to-pay process. Each table
has an attribute with the letters “PK” next to them—these are the primary keys for each
table. The primary key for the Materials Table is “Item_Number,” the primary key for the
Purchase Order Table is “PO_Number,” and so on. Several of the tables also have attributes
with the letters “FK” next to them—these are the foreign keys that create the relationship
between pairs of tables. For example, look at the relationship between the Supplier Table
and the Purchase Order Table. The primary key in the Supplier Table is “Supplier ID.” The
line between the two tables links the primary key to a foreign key in the Purchase Order
Table, also named “Supplier ID.”
The Line Items Table in Exhibit 2-3 has so much detail in it that it requires two attributes
to combine as a primary key. This is a special case of a primary key often referred to as a
composite primary key, in which the two foreign keys from the tables that it is linking com-
bine to make up a unique identifier. The theory and details that support the necessity of this
linking table are beyond the scope of this text—if you can identify the primary and foreign
keys, you’ll be able to identify the data that you need to request. Exhibit 2-4 shows a subset
of the data that are represented by the Purchase Order table. You can see that each of the
attributes listed in the class diagram appears as a column, and the data for each purchase
order are accounted for in the rows.

EXHIBIT 2-3 Purchase Order Detail Table

Line Items Table:
PO_Number | Item_Number | Quantity Purchased
1787 | 10 | 50
1787 | 25 | 50
1789 | 5 | 30
1790 | 5 | 100

EXHIBIT 2-4 Purchase Order Table

PO_Number | Date | Created By | Approved By | Supplier ID | Employee ID | CashDisbursement ID
1787 | 11/1/2020 | 1001 | 1010 | 1 | 52 | 2001
1788 | 11/1/2020 | 1005 | 1010 | 2 | 52 | 2003
1789 | 11/8/2020 | 1002 | 1010 | 1 | 52 | 2004
1790 | 11/15/2020 | 1005 | 1010 | 1 | 52 | 2004
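To make the role of each key concrete, the following is a minimal sketch of SQL statements that could declare three of the tables from Exhibits 2-2 through 2-4. The data types and the underscored column names are simplifying assumptions for illustration; an actual enterprise schema would include many more attributes and constraints:

    CREATE TABLE Supplier (
        Supplier_ID   INT PRIMARY KEY,    -- unique identifier (primary key)
        Supplier_Name VARCHAR(30)         -- descriptive attribute
    );

    CREATE TABLE Purchase_Order (
        PO_Number   INT PRIMARY KEY,      -- unique identifier (primary key)
        Order_Date  DATE,                 -- descriptive attribute
        Supplier_ID INT,                  -- foreign key creating the relationship
        FOREIGN KEY (Supplier_ID) REFERENCES Supplier (Supplier_ID)
    );

    CREATE TABLE Purchase_Order_Detail (
        PO_Number          INT,
        Item_Number        INT,           -- would also reference a Materials table (omitted here)
        Quantity_Purchased INT,
        PRIMARY KEY (PO_Number, Item_Number),   -- composite primary key
        FOREIGN KEY (PO_Number) REFERENCES Purchase_Order (PO_Number)
    );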


PROGRESS CHECK
1. Referring to Exhibit 2-2, locate the relationship between the Supplier and
Purchase Order tables. What is the unique identifier of each table? (The unique
identifier attribute is called the primary key—more on how it’s determined in the
next learning objective.) Which table contains the attribute that creates the rela­
tionship? (This attribute is called the foreign key—more on how it’s determined in
the next learning objective.)
2. Referring to Exhibit 2-2, review the attributes in the Purchase Order table. There
are two foreign keys listed in this table that do not relate to any of the tables in
the diagram. Which tables do you think they are? What type of data would be
stored in those two tables?
3. Refer to the two tables that you identified in Progress Check 2 that would
relate to the Purchase Order table, but are not pictured in this diagram. Draw
a sketch of what the UML Class Diagram would look like if those tables were
included. Draw the two classes to represent the two tables (i.e., rectangles), the
relationships that should exist, and identify the primary keys for the two new
tables.

DATA DICTIONARIES
In the previous section, you learned about how data are stored by focusing on the
procure-to-pay database schema. Viewing schemas and processes in isolation clarifies each
individual process, but it can also distort reality—these schemas typically do not represent
their own separate databases. Rather, each process-specific database schema is a piece of a
greater whole, all combining to form one integrated database.
As you can imagine, once these processes come together to be supported in one data-
base, the amount of data can be massive. Understanding the processes and the basics of
how data are stored is critical, but even with a sound foundation, it would be nearly impos-
sible for an individual to remember where each piece of data is stored, or what each piece
of data represents.
Creating and using a data dictionary is paramount in helping database administrators
maintain databases and analysts identify the data they need to use. In Chapter 1, you
were introduced to the data dictionary for the LendingClub data for rejected loans (DAA
Chapter 1-1 Data). The same cut-out of the LendingClub data dictionary is provided in
Exhibit 2-5 as a reminder.
Because the LendingClub data are provided in a flat file, the only information nec-
essary to describe the data are the attribute name (e.g., Amount Requested) and a
description of that attribute. The description ensures that the data in each attribute are
used and analyzed in the appropriate way—it’s always important to remember that tech-
nology will do exactly what you tell it to, so you must be smarter than the computer!
If you run analysis on an attribute thinking it means one thing, when it actually means
another, you could make some big mistakes and bad decisions even when you are work-
ing with data validated for completeness and integrity. It’s critical to get to know the
data through database schemas and data dictionaries thoroughly before attempting to
do any data analysis.
When you are working with data stored in a relational database, you will have more
attributes to keep track of in the data dictionary. Exhibit 2-6 provides an example of a data
dictionary for a generic Supplier table:


EXHIBIT 2-5 LendingClub Data Dictionary for Rejected Loan Data (DAA Chapter 1-1 Data)
Source: LendingClub Data

RejectStats File | Description
Amount Requested | Total requested loan amount
Application Date | Date of borrower application
Loan Title | Loan title
Risk_Score | Borrower risk (FICO) score
Debt-To-Income Ratio | Ratio of borrower total monthly debt payments divided by monthly income
Zip Code | The first 3 numbers of the borrower zip code provided from loan application
State | Two digit state abbreviation provided from loan application
Employment Length | Employment length in years, where 0 is less than 1 and 10 is greater than 10
Policy Code | policy_code=1 if publicly available; policy_code=2 if not publicly available

Primary or Foreign Key? | Required | Attribute Name | Description | Data Type | Default Value | Field Size | Notes
PK | Y | Supplier ID | Unique Identifier for Each Supplier | Number | n/a | 10 |
| N | Supplier Name | Official Name of Supplier | Short Text | n/a | 30 |
FK | N | Supplier Type | Type Code for Different Supplier Categories | Number | Null | 10 | 1: Vendor; 2: Misc

EXHIBIT 2-6 Supplier Data Dictionary

PROGRESS CHECK
4. What is the purpose of the primary key? A foreign key? A nonkey (descriptive)
attribute?
5. How do data dictionaries help you understand the data from a database or flat
file?

LO 2-3 Explain and apply extraction, transformation, and loading (ETL) techniques to prepare the data for analysis.

EXTRACT, TRANSFORM, AND LOAD (ETL) THE DATA

Once you have familiarized yourself with the data via data dictionaries and schemas, you are prepared to request the data from the database manager or extract the data yourself. The ETL process begins with identifying which data you need and is complete when the clean data are loaded in the appropriate format into the tool to be used for analysis.

This process involves:
1. Determining the purpose and scope of the data request.
2. Obtaining the data.
3. Validating the data for completeness and integrity.
4. Cleaning the data.
5. Loading the data for analysis.


Extract
Determine exactly what data you need in order to answer your business questions. Request-
ing data is often an iterative process, but the more prepared you are when requesting data,
the more time you will save for yourself and the database team in the long run.
Requesting the data involves the first two steps of the ETL process. Each step has ques-
tions associated with it that you should try to answer.

Step 1: Determine the Purpose and Scope of the Data Request


• What is the purpose of the data request? What do you need the data to solve?
What business problem will they address?
• What risk exists in data integrity (e.g., reliability, usefulness)? What is the
mitigation plan?
• What other information will impact the nature, timing, and extent of the data analysis?
Once the purpose of the data request is determined and scoped, as well as any risks and
assumptions documented, the next step is to determine whom to ask and specifically what is
needed, what format is needed (Excel, PDF, database), and by what deadline.

Step 2: Obtain the Data


• How will data be requested and/or obtained? Do you have access to the data yourself,
or do you need to request a database administrator or the information systems depart-
ment to provide the data for you?
• If you need to request the data, is there a standard data request form that you should
use? From whom do you request the data?
• Where are the data located in the financial or other related systems?
• What specific data are needed (tables and fields)?
• What tools will be used to perform data analytic tests or procedures and why?

Obtaining the Data via a Data Request


Determining not only what data are needed, but also which tool will be used to test and
process the data will aid the database administrator in providing the data to you in the most
accessible format.
It is also necessary to specify the format in which you would like to receive the data; it is
often preferred to receive data in a flat file (i.e., if the data you requested reside in multiple
tables or different databases, they should be combined into one file without any hierarchy
or relationships built in), with the first row containing column headings (names of the
fields requested), and each subsequent row containing data that correspond with the col-
umn headings. Subtotals, breaks, and subheadings complicate data cleaning and should not
be included.5 When you receive the data, make sure that you understand the data in each
column; the data dictionary should prove extremely helpful for this. If a data dictionary is
unavailable, then you should plan to meet with database users to get a clear understanding
of the data in each column.
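For example, a flat file built from the purchase order data in Exhibits 2-3 and 2-4 might begin as follows (a hypothetical extract shown only to illustrate the layout; note the single row of column headings, one record per row, and no subtotals or subheadings):

    PO_Number,Date,Supplier ID,Item_Number,Quantity Purchased
    1787,11/1/2020,1,10,50
    1787,11/1/2020,1,25,50
    1789,11/8/2020,1,5,30
    1790,11/15/2020,1,5,100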

Lab Connection
Lab 2-1 has you work through the process of requesting data from IT.

5 T. Singleton, “What Every IT Auditor Should Know about Data Analytics,” n.d., http://www.isaca.org/Journal/archives/2013/Volume-6/Pages/What-Every-IT-Auditor-Should-Know-About-Data-Analytics.aspx#2.


In a later chapter, you will be provided a deep dive into the audit data standards (ADS)
developed by the American Institute of Certified Public Accountants (AICPA).6 The aim
of the ADS is to alleviate the headaches associated with data requests by serving as a guide
to standardize these requests and specify the format an auditor desires from the company
being audited. These include the following:
1. Order-to-Cash subledger standards
2. Procure-to-Pay subledger standards
3. Inventory subledger standards
4. General Ledger standards
While the ADS provide an opportunity for standardization, they are voluntary. Regard-
less of whether your request for data will conform to the standards, a data request form
template (as shown in Exhibit 2-7) can make communication easier between data requester
and provider.
EXHIBIT 2-7 Example Standard Data Request Form

Requester Name:

Requester Contact Number:

Requester Email Address:

Please provide a description of the information needed (indicate which tables and which fields you require):

What will the information be used for?

Frequency (circle one) One-Off Annually Termly Other:_____

Format you wish the data to be delivered in (circle one): Spreadsheet Text File
Word Document Other: _____

Request Date:

Required Date:

Intended Audience:

Customer (if not requester):

Once the data are received, you can move on to the transformation phase of the ETL
process. The next step is to ensure the completeness and integrity of the extracted data.

Obtaining the Data Yourself


At times, you will have direct access to a database or information system that holds all
or some of the data you need. In this case, you may not need to go through a formal data
request process, and you can simply extract the data yourself.
6 For a description of the audit data standards, please see this website: https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/pages/assuranceandadvisory.aspx.


After identifying the goal of the data analysis project in the first step of the IMPACT cycle, you can follow a process similar to a formal data request when you extract the data yourself:
1. Identify the tables that contain the information you need. You can do this by looking
through the data dictionary or the relationship model.
2. Identify which attributes, specifically, hold the information you need in each table.
3. Identify how those tables are related to each other.
Once you have identified the data you need, you can start gathering the information.
There are a variety of methods that you could take to retrieve the data. Two will be explained
briefly here—SQL and Excel—and there is a deep dive into SQL in Appendices D and E, as
well as a deep dive into Excel’s VLookup and Index/Match in Appendix B.
SQL: “Structured Query Language” (SQL, often pronounced sequel) is a computer language used to interact with data (tables, records, and attributes) in a database by creating, updating, deleting, and extracting. For Data Analytics, we need to focus only on extracting data that match the criteria of our analysis goals.
or more tables and organize the data in a way that is more intuitive and useful for data
analysis than the way the data are stored in the relational database. A firm understanding of
the data—the tables, how they are related, and their respective primary and foreign keys—is
integral to extracting the data.
Typically, data should be stored in the database and analyzed in another tool such as Excel,
IDEA, or Tableau. However, you can choose to extract only the portion of the data that you
wish to analyze via SQL instead of extracting full tables and transforming the data in Excel,
IDEA, or Tableau. This is especially preferable when the raw data stored in the database are
large enough to overwhelm Excel. Excel 2016 can hold only 1,048,576 rows on one spread-
sheet. When you attempt to bring in full tables that exceed that amount, even when you use
Excel’s powerful Power BI tools, it will slow down your analysis if the full table isn’t necessary.
As you will explore in labs throughout this textbook, SQL isn't used only from directly within the database. When you plan to perform your analysis in Excel, Power BI, or Tableau, each tool has an SQL option that lets you connect directly to the database and pull in a subset of the data.
There is more description about writing queries and a chance to practice creating joins
in Appendix E.
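As a simple illustration, the following sketch joins the Purchase Order and Supplier tables from Exhibit 2-2 and extracts only the columns and rows needed for a particular analysis. The table and column names here are simplified assumptions; match them to your own database's data dictionary:

    SELECT po.PO_Number,
           po.Order_Date,
           s.Supplier_Name
    FROM Purchase_Order AS po
        INNER JOIN Supplier AS s
            ON po.Supplier_ID = s.Supplier_ID   -- foreign key matches primary key
    WHERE po.Order_Date >= '2020-11-01';        -- extract only the subset needed

Joining and filtering in the query, rather than extracting full tables, keeps the result small enough for a tool like Excel to handle comfortably.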
Microsoft Excel or Power BI: When data are not stored in a relational database, or are not
too large for Excel, the entire table can be analyzed directly in a spreadsheet. The advantage
is that further analysis can be done in Excel or Power BI and it is beneficial to have all the
data to drill down into more detail once the initial question is answered. This approach is
often simpler for doing exploratory analysis (more on this in a later chapter). Understand-
ing the primary key and foreign key relationships is also integral to working with the data
directly in Excel.
When your data are stored directly in Excel, you can also use Excel functions and for-
mulas to combine data from multiple Excel tables into one table, similar to how you can
join data with SQL in Access or another relational database. Two of Excel’s most useful
techniques for looking up data from two separate tables and matching them based on a
matching primary key/foreign key relationship is the VLookup or Index/Match functions.
There are a variety of ways that the VLookup or Index/Match function can be used, but for
extracting and transforming data it is best used to add a column to a table.
More information about using VLookup and Index/Match functions in Excel is provided
in Appendix B.
The question of whether to use SQL or Excel’s tools (such as VLookup) is primar-
ily answered by where the data are stored. Because data are most frequently stored in a


relational database (as discussed earlier in this chapter, due to the efficiency and data integ-
rity benefits relational databases provide), SQL will often be the best option for retrieving
data, after which those data can be loaded into Excel or another tool for further analysis.
Another benefit of SQL queries is that they can be saved and reproduced at will or at
regular intervals. Having a saved SQL query can make it much easier and more efficient
to re-create data requests. However, if the data are already stored in a flat file in Excel,
there is little reason to use SQL. Sometimes when you are performing exploratory analysis,
even if the data are stored in a relational database, it can be beneficial to load entire tables
into Excel and bypass the SQL step. This should be considered carefully before doing so,
though, because relational databases handle large amounts of data much better than Excel
can. Writing SQL queries can also make it easier to load only the data you need to analyze
into Excel so that you do not overwhelm Excel’s resources.

Data Analytics at Work

Jump Start Your Accounting Career with Data Analytics Knowledge


A Robert Half survey finds that 86 percent of CFOs say that Data Analytics skills are mandatory for at least some accounting and finance positions: 37 percent of CFOs say these skills are mandatory for all accounting and finance positions, and another 49 percent report they are mandatory for some roles.
reported that technology experience and aptitude are the most difficult to find in
accounting and finance candidates.
So, jump right in and be ready for the accounting careers of the future, starting with
extract, transfer, and load (ETL) skills!
Source: Robert Half Associates, “Survey: Finance Leaders Report Technology Skills Most Dif­
ficult to Find When Hiring,” August 22, 2019, http://rh-us.mediaroom.com/2019-08-22-Survey-
Finance-Leaders-Report-Technology-Skills-Most-Difficult-To-Find-When-Hiring (accessed
January 22, 2021).

Transform
Step 3: Validating the Data for Completeness and Integrity
Anytime data are moved from one location to another, it is possible that some of the data
could have been lost during the extraction. It is critical to ensure that the extracted data
are complete (that the data you wish to analyze were extracted fully) and that the integ-
rity of the data remains (that none of the data have been manipulated, tampered with, or
duplicated during the extraction). Being able to validate the data successfully requires you
to not only have the technical skills to perform the task, but also to know your data well. If
you know what to reasonably expect from the data in the extraction then you have a higher
likelihood of identifying errors or issues from the extraction. Examples of data validation
questions are: “How many records should have been extracted?” “What checksums or con-
trol totals can be performed to ensure data extraction is accurate?”
The following four steps should be completed to validate the data after extraction:
1. Compare the number of records that were extracted to the number of records in the source
database: This will give you a quick snapshot into whether any data were skipped or
didn’t extract properly due to an error or data type mismatch. This is a critical first
step, but it will not provide information about the data themselves other than ensuring
that the record counts match.


2. Compare descriptive statistics for numeric fields: Calculating the minimums, maximums,
averages, and medians will help ensure that the numeric data were extracted completely.
3. Validate Date/Time fields in the same way as numeric fields: convert the data type
to numeric and run the same descriptive statistics comparisons.
4. Compare string limits for text fields: Text fields are unlikely to cause an issue if you
extracted your data into Excel because Excel allows a generous maximum character
number (for example, Excel 2016 allows 32,767 characters per cell). However, if you
extracted your data into a tool that does limit the number of characters in a string, you
will want to compare these limits to the source database’s limits per field to ensure that
you haven’t cut off any characters.
If an error is found, depending on the size of the dataset, you may be able to easily find
the missing or erroneous data by scanning the information with your eyes. However, if the
dataset is large, or if the error is difficult to find, it may be easiest to go back to the extrac-
tion and examine how the data were extracted, fix any errors in the SQL code, and re-run
the extraction.
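Several of these checks can be scripted so they are easy to re-run after a corrected extraction. The sketch below assumes a hypothetical Sales_Order table with an Invoice_Order_Total field; run it against the source database, then compare the results to the same measures computed on the extracted file (for example, with Excel's COUNT, MIN, MAX, AVERAGE, and SUM functions).

-- Validation sketch: record count (step 1), descriptive statistics
-- (step 2), and a control total on the amounts.
SELECT COUNT(*) AS record_count,
       MIN(Invoice_Order_Total) AS minimum_total,
       MAX(Invoice_Order_Total) AS maximum_total,
       AVG(Invoice_Order_Total) AS average_total,
       SUM(Invoice_Order_Total) AS control_total
FROM Sales_Order;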

Lab Connection
Lab 2-5, Lab 2-6, Lab 2-7, and Lab 2-8 explore the process of loading and
validating data.

Step 4: Cleaning the Data


After validating the data, you should pay close attention to the state of the data and clean
them as necessary to improve the quality of the data and subsequent analysis. The following four items are some of the more common ways that data will need to be cleaned after extraction and validation (a brief SQL sketch of several of these fixes follows the list):
1. Remove headings or subtotals: Depending on the extraction technique used and the file
type of the extraction, it is possible that your data could contain headings or subtotals
that are not useful for analysis. Of course, these issues could be overcome in the extrac-
tion steps of the ETL process if you are careful to request the data in the correct format
or to only extract exactly the data you need.
2. Clean leading zeroes and nonprintable characters: Sometimes data will contain leading
zeroes or “phantom” (nonprintable) characters. This will happen particularly when
numbers or dates were stored as text in the source database but need to be analyzed as
numbers. Nonprintable characters can be white spaces, page breaks, line breaks, tabs,
and so on, and can be summarized as characters that our human eyes can’t see, but that
the computer interprets as a part of the string. These can cause trouble when joining
data because, while two strings may look identical to our eyes, the computer will read
the nonprintable characters and will not find a match.
3. Format negative numbers: If there are negative numbers in your dataset, ensure that the
formatting will work for your analysis. For example, if your data contain negative num-
bers formatted in parentheses and you would prefer this formatting to be as a negative
sign, this needs to be corrected and consistent.
4. Correct inconsistencies across data, in general: If the source database did not enforce
certain rules around data entry, it is possible that there are inconsistencies across the
data—for example, if there is a state field, Arkansas could be formatted as “AR,” “Ark,”
“Ar.,” and so on. These will need to be replaced with a common value before you begin
your analysis if you are interested in grouping data geographically.
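Several of these fixes can be written once as a query and reused each time new data arrive. The sketch below shows, with assumed column names on a hypothetical Raw_Extract table, how items 2 through 4 might look in SQL; Excel users would reach for functions such as TRIM, CLEAN, SUBSTITUTE, and VALUE to the same effect.

-- Cleaning sketch (hypothetical columns; exact functions vary by dialect):
SELECT TRIM(Customer_Name) AS customer_name,            -- strip stray spaces
       CAST(Account_Code AS INTEGER) AS account_code,   -- drop leading zeroes
       REPLACE(REPLACE(REPLACE(Amount_Text, '(', '-'),
               ')', ''), ',', '') AS amount_signed,     -- (1,422.53) becomes -1422.53
       CASE WHEN State IN ('AR', 'Ark', 'Ar.') THEN 'AR'
            ELSE State END AS state_code                -- standardize inconsistent values
FROM Raw_Extract;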


Lab Connection
Lab 2-2 and Lab 2-3 walk through how to prepare data for analysis and
resolve common data quality issues.

A Note about Data Quality


As you prepare your data for analysis, you should pay close attention to the quality of the
underlying data. Incorrect or invalid data can skew your results and lead to inaccurate con-
clusions. Low-quality data will often contain numerous errors, obsolete or incorrect data,
or invalid data.
To evaluate a dataset and its underlying data quality, here are five main data quality issues to consider when you evaluate data for the first time (a short SQL sketch follows the list):
1. Dates: The most common problems revolve around the date format because there are
so many different ways a date can be presented. For example, look at the different ways
you can show July 6, 2024: 6-Jul-2024; 6.7.2024; 45479 (in Excel); 07/06/2024 (in the
United States); 06/07/2024 (in Europe); and the list goes on. You need to format the
date to match the acceptable format for your tool. The ISO 8601 standard indicates you
should format dates in the year-month-day format (2024-07-06), and most professional
query tools accept this format. If you use Excel to transform dates to this format, high-
light your dates and go to Home > Number > Format Cells and choose Custom. Then
type in YYYY-MM-DD and click OK.

[Figure: the Format Cells window in Excel. Source: Microsoft Excel]

2. Numbers: Numbers can be misinterpreted, particularly if they are manually entered. For
example, 1 or I; 0 or O; 3 or E; 7 or seven. Watch for invalid number formats when you
start sorting and analyzing your data, and then go back and correct them. Additionally,
accounting artifacts such as dollar signs, commas, and parentheses are pervasive in
spreadsheet data (e.g., $12,345.22 or (1,422.53)). As you clean the data, remove any extra
accounting characters so numbers appear in their raw form (e.g., 12345.22 or -1422.53).
3. International characters and encoding: When you work with data that span multiple
countries, it is likely that you will come across special characters, such as accent marks
(á or À), umlauts (Ü), invisible computer characters (TAB, RETURN, line break, null),
or special characters that are used in query and scripting languages (*, #, “, ’). In many
cases, these can be corrected with a find and replace or contained in quote marks so
they are ignored by the query language. Additionally, while most modern computer
programs use Unicode as the text encoding standard, older databases may generate
data in the ASCII format. If your tool fails to populate your dataset accurately,
international characters and symbols are a likely cause.
4. Languages and measures: Similar to international characters, data elements may contain
a variety of words or measures that have the same meaning. For example, cheese or
fromage; ketchup or catsup; pounds or lbs; $ or €; Arkansas or AR. In order to properly
analyze the comparable data, you’ll need to translate them into a common format by
choosing one word as the standard and replacing the equivalent words. Also make sure
the measure doesn’t change the meaning. The total value in U.S. dollars is not the same
thing as the total value in euros. Make sure you’re comparing apples to apples or euros
to euros.
5. Human error: Whenever there is manual input into the data, there is a high probability
that data will be bad simply because they were mistyped or entered into the wrong
place. There’s no hard and fast rule for dealing with input errors other than being
­vigilant and making corrections (e.g., find and replace) when they occur.
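As a final concrete example, the sketch below standardizes a date to the ISO 8601 year-month-day format and strips accounting artifacts from a number stored as text. The Raw_Extract table and its columns are hypothetical, and date-conversion syntax varies by database, so treat this as a pattern rather than exact code for your system.

-- Data quality sketch (hypothetical columns; date handling varies by dialect):
SELECT CAST(Invoice_Date AS DATE) AS invoice_date,  -- stored as a true date,
                                                    -- displayed as 2024-07-06
       CAST(REPLACE(REPLACE(Amount_Text, '$', ''), ',', '')
            AS DECIMAL(12, 2)) AS amount_usd        -- $12,345.22 becomes 12345.22
FROM Raw_Extract;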

Load
Step 5: Loading the Data for Data Analysis
If the extraction and transformation steps have been done well by the time you reach this
step, the loading part of the ETL process should be the simplest step. It is so simple, in
fact, that if your goal is to do your analysis in Excel and you have already transformed
and cleaned your data in Excel, you are finished. There should be no additional loading
necessary.
However, it is possible that Excel is not the last step for analysis. The data analysis tech-
nique you plan to implement, the subject matter of the business questions you intend to
answer, and the way in which you wish to communicate results will all drive the choice of
which tool you use to perform your analysis.
Throughout the text, you will be introduced to a variety of different tools to use for
analyzing data, including Excel, Power BI, Tableau Prep, and Tableau Desktop. As
these tools are introduced to you, you will learn how to load data into them.

ETL or ELT?
If loading the data into Excel is indeed the last step, are you actually “extracting, transform-
ing, and loading,” or is it “extracting, loading, and transforming”?
The term ETL has been in popular use since the 1970s, and even though methods for extracting and transforming data have become easier to use, more accessible, and more robust, the term has stuck. Increasingly, however, the procedure is shifting toward ELT. Particularly with tools such as Microsoft's Power BI suite, all of the loading and transforming can be done within Excel: data are loaded directly into Excel from the database and then transformed there. The most common method for mastering the data that we use throughout this textbook is more in line with ELT than ETL; however, even when the order changes from ETL to ELT, it is still more common to refer to the procedure as ETL.

PROGRESS CHECK
6. Describe two different methods for obtaining data for analysis.
7. What are four common data quality issues that must be fixed before analysis can
take place?

ETHICAL CONSIDERATIONS OF DATA COLLECTION AND USE

LO 2-4
Describe the ethical considerations of data collection and data use.

Mastering the data goes beyond just ETL processes. Mastering the data also includes having some assurance that the data collection is not only secure, but also that the ethics of data collection and data use have been considered.
In the past, the scope for digital risk was limited to cybersecurity threats to make sure the data were secure; however, increasingly the concern is the risk of lacking ethical data practices. Indeed, the concern regarding data gleaned from traditional and nontraditional sources is whether they are used in an ethical manner and for their intended purpose.
Potential ethical issues include an individual’s right to privacy and whether assurance is
offered that certain data are not misused. For example, is the individual about whom data
has been collected able to limit who has access to her personal information, and how those
data are used or shared? If an individual’s credit card is submitted for an e-commerce trans-
action, does the customer have assurance that the credit card number will not be misused?
To address these and other concerns, the Institute of Business Ethics suggests that com-
panies consider the following six questions to allow a business to create value from data use
and analysis, and still protect the privacy of stakeholders7:
1. How does the company use data, and to what extent are they integrated into firm strategy? What is the purpose of the data? Are they accurate or reliable? Will they benefit the customer or the employee?
2. Does the company send a privacy notice to individuals when their personal data are collected? Is the request to use the data clear to the user? Do they agree to the terms and conditions of use of their personal data?
3. Does the company assess the risks linked to the specific type of data the company uses? Have the risks of data use or data breach of potentially sensitive data been considered?
4. Does the company have safeguards in place to mitigate the risks of data misuse? Are preventive controls on data access in place and are they effective? Are penalties established and enforced for data misuse?
5. Does the company have the appropriate tools to manage the risks of data misuse? Is the
feedback from these tools evaluated and measured? Does internal audit regularly evaluate
these tools?
6. Does our company conduct appropriate due diligence when sharing with or acquiring data
from third parties? Do third-party data providers follow similar ethical standards in the
acquisition and transmission of the data?

7 S. White, "6 Ethical Questions about Big Data," Financial Management, https://www.fm-magazine.com/news/2016/jun/ethical-questions-about-big-data.html (accessed December 2020).


The user of the data must continue to recognize the potential risks associated with data
collection and data use, and work to mitigate those risks in a responsible way.

PROGRESS CHECK
8. A firm purchases data from a third party about customer preferences for laundry
detergent. How would you recommend that this firm conduct appropriate due
diligence about whether the third-party data provider follows ethical data practices? An audit? A questionnaire? What questions should be asked?

Summary
■ The first step in the IMPACT cycle is to identify the questions that you intend to answer
through your data analysis project. Once a data analysis problem or question has been
identified, the next step in the IMPACT cycle is mastering the data, which includes
obtaining the data needed and preparing it for analysis. We often call the processes
associated with mastering the data ETL, which stands for extract, transform, and load.
(LO 2-2, 2-3)
■ In order to obtain the right data, it is important to have a firm grasp of what data are
available to you and how that information is stored. (LO 2-2)
◦ Data are often stored in a relational database, which helps to ensure that an organiza-
tion’s data are complete and to avoid redundancy. Relational databases are made up
of tables with rows of data that represent records. Each record is uniquely identified
with a primary key. Tables are related to other tables by using the primary key from
one table as a foreign key in another table.
■ Extract: To obtain the data, you will either have access to extract the data yourself or you
will need to request the data from a database administrator or the information systems
team. If the latter is the case, you will complete a data request form, indicating exactly
which data you need and why. (LO 2-3)
■ Transform: Once you have the data, they will need to be validated for completeness and
integrity—that is, you will need to ensure that all of the data you need were extracted and
that all data are correct. Sometimes when data are extracted some formatting or some-
times even entire records will get lost, resulting in inaccuracies. Correcting the errors and
cleaning the data is an integral step in mastering the data. (LO 2-3)
■ Load: Finally, after the data have been cleaned, there may be one last step of mastering
the data, which is to load them into the tool that will be used for analysis. Often, the
cleaning and correcting of data occur in Excel, and the analysis will also be done in
Excel. In this case, there is no need to load the data elsewhere. However, if you intend
to do more rigorous statistical analysis than Excel provides, or if you intend to do more
robust data visualization than can be done in Excel, it may be necessary to load the data
into another tool following the transformation process. (LO 2-3)
■ Mastering the data goes beyond just the ETL processes. Those who collect and use data
also have the responsibility of being good stewards, providing some assurance that the
data collection is not only secure, but also that the ethics of data collection and data use
have been considered. (LO 2-4)

Key Words
accounting information system (54) A system that records, processes, reports, and communicates the
results of business transactions to provide financial and nonfinancial information for decision-making purposes.
composite primary key (58) A special case of a primary key that exists in linking tables. The compos-
ite primary key is made up of the two primary keys in the table that it is linking.
customer relationship management (CRM) system (54) An information system for managing all
interactions between the company and its current and potential customers.
data dictionary (59) Centralized repository of descriptions for all of the data attributes of the dataset.
data request form (62) A method for obtaining data if you do not have access to obtain the data
directly yourself.
descriptive attributes (58) Attributes that exist in relational databases that are neither primary nor
foreign keys. These attributes provide business information, but are not required to build a database. An
example would be “Company Name” or “Employee Address.”
Enterprise Resource Planning (ERP) (54) Also known as Enterprise Systems, a category of business
management software that integrates applications from throughout the business (such as manufacturing,
accounting, finance, human resources, etc.) into one system.
ETL (60) The extract, transform, and load process that is integral to mastering the data.
flat file (57) A means of storing data in one place, such as in an Excel spreadsheet, as opposed to stor-
ing the data in multiple tables, such as in a relational database.
foreign key (58) An attribute that exists in relational databases in order to carry out the relationship
between two tables. This does not serve as the “unique identifier” for each record in a table. These must be
identified when mastering the data from a relational database in order to extract the data correctly from
more than one table.
human resource management (HRM) system (54) An information system for managing all inter-
actions between the company and its current and potential employees.
mastering the data (54) The second step in the IMPACT cycle; it involves identifying and obtaining the
data needed for solving the data analysis problem, as well as cleaning and preparing the data for analysis.
primary key (57) An attribute that is required to exist in each table of a relational database and serves
as the “unique identifier” for each record in a table.
relational database (56) A means of storing data in order to ensure that the data are complete, not
redundant, and to help enforce business rules. Relational databases also aid in communication and integra-
tion of business processes across an organization.
supply chain management (SCM) system (54) An information system that helps manage all the
company’s interactions with suppliers.

ANSWERS TO PROGRESS CHECKS


1. The unique identifier of the Supplier table is [Supplier ID], and the unique identifier of the
Purchase Order table is [PO Number]. The Purchase Order table contains the foreign key.
2. The foreign key attributes in the Purchase Order table that do not relate to any tables
in the view are EmployeeID and CashDisbursementID. These attributes probably relate
to the Employee table (so that we can tell which employee was responsible for each Purchase Order) and the Cash Disbursement table (so that we can tell if the Purchase
Orders have been paid for yet, and if so, on which check). The Employee table would be a
complete listing of each employee, as well as containing the details about each employee
(for example, phone number, address, etc.). The Cash Disbursement table would be a listing of the payments the company has made.

3. [Diagram: relational model for the purchasing process, reconstructed from the figure]
• Materials table (PK: Item_Number); one Material relates to many Purchase Order Details records
• Purchase Order table (PK: PO_Number; FKs: Supplier ID, EmployeeID, CashDisbursementID)
• Supplier table (PK: Supplier ID); one Supplier relates to many Purchase Orders
• Purchase Order Details table (composite PK made up of FK: Item_Number and FK: PO_Number, linking Materials and Purchase Order)
• Employees table (PK: EmployeeID); one Employee relates to many Purchase Orders
• CashDisbursement table (PK: Check Number); one Cash Disbursement relates to many Purchase Orders
4. The purpose of the primary key is to uniquely identify each record in a table. The purpose
of a foreign key is to create a relationship between two tables. The purpose of a descriptive attribute is to provide meaningful information about each record in a table. Descriptive attributes aren't required for a database to run, but they are necessary for people to
gain business information about the data stored in their databases.
5. Data dictionaries provide descriptions of the function (e.g., primary key or foreign key
when applicable), data type, and field names associated with each column (attribute) of
a database. Data dictionaries are especially important when databases contain several
different tables and many different attributes in order to help analysts identify the information they need to perform their analysis.
6. Depending on the level of security afforded to a business analyst, she can either obtain
data directly from the database herself or she can request the data. When obtaining data
herself, the analyst must have access to the raw data in the database and a firm knowledge
of SQL and data extraction techniques. When requesting the data, the analyst doesn’t need
the same level of extraction skills, but she still needs to be familiar with the data enough in
order to identify which tables and attributes contain the information she requires.
7. Four common issues that must be fixed are removing headings or subtotals, cleaning
leading zeroes or nonprintable characters, formatting negative numbers, and correcting
inconsistencies across the data.
8. Firms can ask to see the terms and conditions of their third-party data supplier, and ask
questions to come to an understanding regarding if and how privacy practices are main­
tained. They also can evaluate what preventive controls on data access are in place and
assess whether they are followed. Generally, an audit does not need to be performed, but
requesting a questionnaire be filled out would be appropriate.

Multiple Choice Questions



1. (LO 2-3) Mastering the data can also be described via the ETL process. The ETL process stands for:
a. extract, total, and load data.
b. enter, transform, and load data.
c. extract, transform, and load data.
d. enter, total, and load data.

2. (LO 2-3) Which of the following describes part of the goal of the ETL process?
a. Identify which approach to Data Analytics should be used.
b. Load the data into a relational database for storage.
c. Communicate the results and insights found through the analysis.
d. Identify and obtain the data needed for solving the problem.
3. (LO 2-2) The advantages of storing data in a relational database include which of the
following?
a. Help in enforcing business rules
b. Increased information redundancy
c. Integrating business processes
d. All of the answers are correct
e. a and b
f. b and c
g. a and c
4. (LO 2-3) The purpose of transforming data is:
a. to validate the data for completeness and integrity.
b. to load the data into the appropriate tool for analysis.
c. to obtain the data from the appropriate source.
d. to identify which data are necessary to complete the analysis.
5. (LO 2-2) Which attribute is required to exist in each table of a relational database and
serves as the “unique identifier” for each record in a table?
a. Foreign key
b. Unique identifier
c. Primary key
d. Key attribute
6. (LO 2-2) The metadata that describe each attribute in a database are which of the following?
a. Composite primary key
b. Data dictionary
c. Descriptive attributes
d. Flat file
7. (LO 2-3) As mentioned in the chapter, which of the following is not a common way that
data will need to be cleaned after extraction and validation?
a. Remove headings and subtotals.
b. Format negative numbers.
c. Clean up trailing zeroes.
d. Correct inconsistencies across data.
8. (LO 2-2) Why is Supplier ID considered to be a primary key for a Supplier table?
a. It contains a unique identifier for each supplier.
b. It is a 10-digit number.
c. It can either be for a vendor or miscellaneous provider.
d. It is used to identify different supplier categories.
9. (LO 2-2) What are attributes that exist in a relational database that are neither primary
nor foreign keys?
a. Nondescript attributes
b. Descriptive attributes
c. Composite keys
d. Relational table attributes

10. (LO 2-4) Which of the following questions are not suggested by the Institute of Business
Ethics to allow a business to create value from data use and analysis, and still protect
the privacy of stakeholders?
a. How does the company use data, and to what extent are they integrated into firm
strategy?
b. Does the company send a privacy notice to individuals when their personal data are
collected?
c. Does the data used by the company include personally identifiable information?
d. Does the company have the appropriate tools to manage the risks of data misuse?

Discussion and Analysis



1. (LO 2-2) The advantages of a relational database include limiting the amount of redundant data that are stored in a database. Why is this an important advantage? What can
go wrong when redundant data are stored?
2. (LO 2-2) The advantages of a relational database include integrating business processes. Why is it preferable to integrate business processes in one information system,
rather than store different business process data in separate, isolated databases?
3. (LO 2-2) Even though it is preferable to store data in a relational database, storing data
across separate tables can make data analysis cumbersome. Describe three reasons it
is worth the trouble to store data in a relational database.
4. (LO 2-2) Among the advantages of using a relational database is enforcing business
rules. Based on your understanding of how the structure of a relational database helps
prevent data redundancy and other advantages, how does the primary key/foreign
key relationship structure help enforce a business rule that indicates that a company
shouldn’t process any purchase orders from suppliers who don’t exist in the database?
5. (LO 2-2) What is the purpose of a data dictionary? Identify four different attributes that
could be stored in a data dictionary, and describe the purpose of each.
6. (LO 2-3) In the ETL process, the first step is extracting the data. When you are obtaining
the data yourself, what are the steps to identifying the data that you need to extract?
7. (LO 2-3) In the ETL process, if the analyst does not have the security permissions to
access the data directly, then he or she will need to fill out a data request form. While
this doesn’t necessarily require the analyst to know extraction techniques, why does
the analyst still need to understand the raw data very well in order to complete the data
request?
8. (LO 2-3) In the ETL process, when an analyst is completing the data request form, there
are a number of fields that the analyst is required to complete. Why do you think it is
important for the analyst to indicate the frequency of the report? How do you think that
would affect what the database administrator does in the extraction?
9. (LO 2-3) Regarding the data request form, why do you think it is important to the database administrator to know the purpose of the request? What would be the importance
of the “To be used in” and “Intended audience” fields?
10. (LO 2-3) In the ETL process, one important step to process when transforming the data
is to work with null, n/a, and zero values in the dataset. If you have a field of quantitative
data (e.g., number of years each individual in the table has held a full-time job), what
would be the effect of the following?
a. Transforming null and n/a values into blanks
b. Transforming null and n/a values into zeroes
c. Deleting records that have null and n/a values from your dataset
(Hint: Think about the impact on different aggregate functions, such as COUNT and
AVERAGE.)

11. (LO 2-4) What is the theme of each of the six questions proposed by the Institute of
Business Ethics? Which one addresses the purpose of the data? Which one addresses
how the risks associated with data use and collection are mitigated? How could these
two specific objectives be achieved at the same time?

Problems

1. (LO 2-2) Match the relational database function to the appropriate relational database
term:
• Composite primary key
• Descriptive attribute
• Foreign key
• Primary key
• Relational database

Relational Database Function Relational Database Term


1. Serves as a unique identifier in a database table.
2. Creates a relationship between two tables.
3. Two foreign keys from the tables that it is linking combine to
make up a unique identifier.
4. Describes each record with characteristics with actual business
information.
5. A means of storing data to ensure data are complete and not redundant, and to help enforce business rules.

2. (LO 2-3) Identify the order sequence in the ETL process as part of mastering the data
(i.e., 1 is first; 5 is last).

Steps of the ETL Process Sequence Order (1 to 5)


1. Validate the data for completeness and integrity.
2. Sanitize the data.
3. Obtain the data.
4. Load the data in preparation for data analysis.
5. Determine the purpose and scope of the data request.

3. (LO 2-3) Identify which ETL tasks would be considered “Validating” the data, and which
would be considered “Cleaning” the data.

ETL Task Validating or Cleaning


1. Compare the number of records that were extracted to the
number of records in the source database.
2. Remove headings or subtotals.
3. Remove leading zeroes and nonprintable characters.
4. Compare descriptive statistics for numeric fields.
5. Format negative numbers.
6. Compare string limits for text fields.
7. Correct inconsistencies across data, in general.

4. (LO 2-3) Match each ETL task to the stage of the ETL process:
• Determine purpose
• Obtain
• Validate
• Clean
• Load

ETL Task Stage of ETL Process


1. Use SQL to extract data from the source database.
2. Remove headings or subtotals.
3. Choose which database and specific data will be needed to
address the accounting question.
4. Compare the number of records extracted to the number of
records in the source database.
5. Make sure all state values are formatted as two capital letters. Fix inconsistencies.
6. Input the data into the analysis tool.

5. (LO 2-4) For each of the six questions suggested by the Institute of Business Ethics to
evaluate data privacy, categorize each question into one of these three types:
A. Evaluate the company’s purpose of the data
B. Evaluate the company’s use or misuse of the data
C. Evaluate the due diligence of the company’s data vendors in preventing misuse of
the data

Institute of Business Ethics Questions regarding Data Use and Privacy    Category A, B, or C?
1. Does the company assess the risks linked to the specific type
of data the company uses?
2. Does the company send a privacy notice to individuals when their personal data are collected?
3. How does the company use data, and to what extent are they integrated into firm strategy?
4. Does our company conduct appropriate due diligence when sharing with or acquiring data from third parties?
5. Does the company have the appropriate tools to manage the
risks of misuse?
6. Does the company have safeguards in place to mitigate these
risks of misuse?

6. (LO 2-2) Which of the following are useful, established characteristics of using a rela­
tional database?

Characteristic    Useful, Established Characteristic of Relational Databases? Yes/No
1. Completeness
2. Reliable
3. No redundancy
4. Communication and integration of business processes
5. Less costly to purchase
6. Less effort to maintain
7. Business rules are enforced

7. (LO 2-3) As part of mastering the data, analysts must make certain trade-offs when they consider which data to use. Consider these three different scenarios:
a. Analysis: What are the trade-offs of using data that are highly relevant to the question, but have a lot of missing data?
b. Analysis: What are the trade-offs an analyst should consider between data that are
very expensive to acquire and analyze, but will most directly address the question at
hand? How would you assess whether they are worth the extra cost?
c. Analysis: What are the trade-offs between extracting needed data by yourself, or
asking a data scientist to get access to the data?
8. (LO 2-4) The Institute of Business Ethics proposes that a company protect the privacy of
stakeholders by considering these questions of its third-party data providers:
• Does our company conduct appropriate due diligence when sharing with or acquiring data from third parties?
• Do third-party data providers follow similar ethical standards in the acquisition and transmission of the data?
a. Analysis: What type of due diligence with regard to a third party sharing and acquiring data would be appropriate for the company (or company accountant or data scientist) to perform? An audit? A questionnaire? Standards written into a contract?
b. Analysis: How would you assess whether the third-party data provider follows ethical standards in the acquisition and transmission of the data?

LABS

Lab 2-1 Request Data from IT—Sláinte


Case Summary: Sláinte is a fictional brewery that has recently gone through big changes.
Sláinte sells six different products. The brewery has only recently expanded its business to
distributing from one state to nine states, and now its business has begun stabilizing after
the expansion. With that stability comes a need for better analysis. You have been hired by
Sláinte to help management better understand the company’s sales data and provide input
for its strategic decisions. In this lab, you will identify appropriate questions and develop
a hypothesis for each question, generate a data request, and evaluate the data you receive.
Data: Lab 2-1 Data Request Form.zip - 10KB Zip / 13KB Word

Lab 2-1 Part 1 Identify the Questions and Generate a


Data Request
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 2-1 [Your name] [Your email address].docx.
One of the biggest challenges you face with data analysis is getting the right data. You
may have the best questions in the world, but if there are no data available to support your
hypothesis, you will have difficulty providing value. Additionally, there are instances in
which the IT workers may be reluctant to share data with you. They may send incomplete data or the wrong data, or they may ignore your request entirely. Be persistent, and you may have to look for creative ways to find insight with an incomplete picture.
One of Sláinte’s first priorities is to identify its areas of success as well as areas of poten-
tial improvement. Your manager has asked you to focus specifically on sales data at this
point. This includes data related to sales orders, products, and customers.
Answer the Lab 2-1 Part 1 Analysis Questions and then complete a data request form for
those data you have identified for your analysis.
1. Open the Data Request Form.
2. Enter your contact information.
3. In the description field, identify the tables and fields that you’d like to analyze, along
with the time periods (e.g., past month, past year, etc.).
4. Indicate what the information will be used for in the appropriate box (internal analysis).
5. Select a frequency. In this case, this is a “One-off request.”
6. Choose a format (spreadsheet).
7. Enter a request date (today) and a required date (one week from today).
8. Take a screenshot (label it 2-1A) of your completed form.

Lab 2-1 Part 1 Analysis Questions (LO 2-2)


AQ1. Given that you are new and trying to get a grasp on Sláinte’s operations,
list three questions related to sales that would help you begin your analysis.
For example, how many products were sold in each state?
AQ2. Now hypothesize the answers to each of the questions. Remember, your answers
don’t have to be correct at this point. They will help you understand what type
of data you are looking for. For example: 500 in Missouri, 6,000 in Pennsylvania,
4,000 in New York, and so on.
AQ3. Finally, for each question, identify the specific tables and fields that are needed
to answer your questions. Use the data dictionary and ER Diagram provided in Appendix J for guidance on what tables and attributes are available. For example, to answer the question about state sales, you would need the Customer_State attribute that is located in the Customer master table as well as the
Sales_Order_Quantity_Sold attribute in the Sales table. If you had access to
store or distribution center location data, you may also look for a State field
there, as well.

Lab 2-1 Part 2 Evaluate the Data Extract


After a few days, Rachel, an IT worker, responds to your request. She gives you the follow-
ing tables and attributes, shown in Lab Exhibit 2-1A:

LAB EXHIBIT 2-1A Sales_Orders Table


Attribute Description of Attribute
Sales_Order_ID (PK) Unique identifier for each sales order
Sales_Order_Date The date of the sales order, regardless of the date the order is entered
Shipping_Cost Shipping cost for the order

Sales_Order_Lines Table
Attribute Description of Attribute
Sales_Order_ID (FK) Unique identifier for each sales order
Sales_Order_Quantity_Sold Sales order line quantity
Product_Sale_Price Sales order line price per unit

Finished_Goods_Products Table
Attribute Description of Attribute
Product_Code (PK) Unique identifier for each product
Product_Description Product description (plain English) to indicate the name or other identifying
characteristics of the product
Product_Sale_Price Price per unit of the associated product

You may notice that while there are a few attributes that may be useful in your sales analy-
sis, the list may be incomplete and be missing several values. This is normal with data requests.

Lab 2-1 Part 2 Objective Questions (LO 2-2)


OQ1. Which tables and attributes are missing from the data extract that would be nec-
essary to answer the question “How many products were sold in each state?”
OQ2. What new question can you answer using the data extract?

Lab 2-1 Part 2 Analysis Questions (LO 2-2)


AQ1. Evaluate your original questions and responses from Part 1. Can you still answer
the original questions that you identified in step 1 with the data provided?
AQ2. What additional tables and attributes would you need to answer your questions?

Lab 2-1 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 2-2 Prepare Data for Analysis—Sláinte
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Sláinte is a fictional brewery that has recently gone through big changes.
Sláinte sells six different products. The brewery has only recently expanded its business to
distributing from one state to nine states, and now its business has begun stabilizing after
the expansion. Sláinte has brought you in to help determine potential areas for sales growth
in the next year. Additionally, management has noticed that the company's margins aren't
as high as they had budgeted and would like you to help identify some areas where they
could improve their pricing, marketing, or strategy. Specifically, they would like to know
how many of each product were sold, the product’s actual name (not just the product code),
and the months in which different products were sold.
Data: Lab 2-2 Slainte Dataset.zip - 83KB Zip / 90KB Excel

Lab 2-2 Example Output


By the end of this lab, you will create a PivotTable that will let you explore sales data. While
your results will include different data values, your work should look similar to this:
[Figure LAB 2-2M: Example of PivotTable in Microsoft Excel for November and December. Source: Microsoft Excel]
[Figure LAB 2-2T: Example of PivotTable in Tableau Desktop for November and December. Source: Tableau Software, Inc.]

Lab 2-2 Part 1 Prepare a Data Model


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 2-2 [Your name] [Your email address].docx.
Efficient relational databases contain normalized data. That is, each table contains only
data that are relevant to the object, and tables’ relationships are defined with primary key/
foreign key pairs as mentioned earlier in the chapter.
With Data Analytics, we often need to pull data from where they are stored into a separate
tool, such as Microsoft Power BI, Excel, or Tableau. Each of these tools provides the oppor-
tunity to either connect directly to tables in the source data or to “denormalize” the data.
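For reference, "denormalizing" simply means flattening the related tables into one wide table. A hedged SQL sketch of what these tools assemble behind the scenes, using the three Sláinte tables in this lab, might look like the following; the join logic mirrors the primary key/foreign key pairs you will confirm in the steps below.

-- One row per sales order line, enriched with order and product details.
SELECT so.Sales_Order_ID,
       so.Sales_Order_Date,
       p.Product_Description,
       sol.Sales_Order_Quantity_Sold,
       sol.Product_Sale_Price
FROM Sales_Order_Lines AS sol
INNER JOIN Sales_Order AS so
    ON sol.Sales_Order_ID = so.Sales_Order_ID
INNER JOIN Finished_Goods_Products AS p
    ON sol.Product_Code = p.Product_Code;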


In this lab, you will learn how to connect to data in Microsoft Power BI or Excel using
the Internal Data Model and how to connect to data and build relationships among tables
in Tableau. This will prepare you for future labs that require you to transform data, as well
as aid in understanding of primary and foreign key relationships.

Microsoft | Excel + Power Query

1. Create a new blank spreadsheet in Excel.


2. From the Data tab on the ribbon, click Get Data > From File > From Workbook. Note: In older versions of Excel, click the New Query button.
3. Locate the Lab 2-2 Slainte Dataset.xlsx file on your computer, and click
Import.
4. In the Navigator, check Select multiple items, then check the following tables
to import:
a. Finished_Goods_Products
b. Sales_Order
c. Sales_Order_Lines
5. Click Edit or Transform Data to open Power Query Editor.
6. Click through the table queries and attributes and correct the following issues:
a. Finished_Goods_Products:
1. Change the data type for Product_Sale_Price to Currency (click the column header, then click Transform > Data Type > Currency). If prompted, choose Replace current conversion step.

b. Sales_Order_Lines: Change the data type for Product_Sale_Price to Currency.
c. Sales_Order: Change the data type for Invoice_Order_Total and
Shipping_Cost to Currency.
7. Take a screenshot (label it 2-2MA) of the Power Query Editor window
with your changes.
8. At this point, we are ready to connect the data to our Excel sheet. We
will only create a connection so we can pull it in for specific analyses.
Click the Home tab and choose the arrow below Close & Load > Close &
Load To. . .
9. Choose Only Create Connection and Add this data to the Data Model and click
OK. The three queries will appear in a tab on the right side of your sheet.
10. Save your workbook as Lab 2-2 Slainte Model.xlsx, and continue to Part 2.

Tableau | Desktop

1. Open Tableau Desktop.


2. Choose Connect > To a File > Microsoft Excel.
3. Locate the Lab 2-2 Slainte Dataset.xlsx and click Open.
4. Double-click or drag the following tables to the data pane on the top-right.
a. Finished_Goods_Products
b. Sales_Order_Lines: If prompted to Edit Relationship, match Product
Code on the left with Product Code on the right and close the window.
c. Sales_Order: If prompted to Edit Relationship, match Sales Order ID on
the left with Sales Order ID on the right and close the window.
5. Take a screenshot (label it 2-2TA).
6. Save your workbook as Lab 2-2 Slainte Model.twb, and continue to Part 2.

Lab 2-2 Part 1 Objective Questions (LO 2-3)


OQ1. How many tables did you just load?
OQ2. How many rows were loaded for the Sales_Order query?
OQ3. How many rows were loaded for the Finished_Goods_Products query?

Lab 2-2 Part 1 Analysis Questions (LO 2-3)


AQ1. Have you used the Microsoft or Tableau tools before this class?
AQ2. Compare and Contrast: If you completed this part with multiple tools, which tool
options do you think will be most useful for preparing future data for analysis?

Lab 2-2 Part 2 Validate the Data
Now that the data have been prepared and organized, you’re ready for some basic analysis.
Given the sales data, management has asked you to prepare a report showing the total
number of each item sold each month between January and April 2020. This means that we
should create a PivotTable with a column for each month, a row for each product, and the
sum of the quantity sold where the two intersect.

Microsoft | Excel + Power Query

1. Open the Lab 2-2 Slainte Model.xlsx you created in Part 1.


2. Click the Insert tab on the ribbon and choose PivotTable.
3. In the Create PivotTable window, click Use this workbook’s Data Model and
click OK. A PivotTable Fields pane appears on the right of your worksheet.
• Note: If at any point while working with your PivotTable, your PivotTable
Fields list disappears, you can make it reappear by ensuring that your
active cell is within the PivotTable itself. If the Field List still doesn’t reap-
pear, navigate to the Analyze tab in the Ribbon, and select Field List.
4. Click the > next to each table to show the available fields. If you don’t see your
three tables, click the All option directly below the PivotTable Fields pane title.
5. Drag Sales_Order.Sales_Order_Date to the Columns pane. Note: When you
add a date, Excel will automatically try to group the data by Year, Quarter,
and so on.
a. Remove Sales_Order_Date (Quarter) from the Columns pane.
6. Drag Finished_Goods_Products.Product_Description to the Rows pane.
7. Drag Sales_Order_Lines.Sales_Order_Quantity_Sold to the Values pane.
Note: At this point, a warning will appear asking you to create relationships.
8. Click Auto-Detect. . . to automatically create relationships in the data model.
a. Click Manage Relationships. . . to verify that the primary key–foreign key
pairs are correct:
1. Sales_Order_Lines (Product_Code) = Finished_Goods_Products (Product_Code)
2. Sales_Order_Lines (Sales_Order_ID) = Sales_Order (Sales_Order_ID)
b. Take a screenshot (label it 2-2MB) of your Manage Relationships
window.
c. Click Close.
9. In the PivotTable, drill down to show the monthly data:
a. Click the + next to 2020.
b. If you see individual sales dates, right-click Jan and choose Expand/
Collapse > Collapse Entire Field.
10. Clean up your PivotTable. Rename labels and the title of the report to some-
thing more useful, like “Sales by Month”.
11. Take a screenshot (label it 2-2MC).
12. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 2-2 Slainte Pivot.xlsx.

Tableau | Desktop

1. Open the Lab 2-2 Slainte Model.twb you created in Part 1.


2. Click on Sheet 1.
3. Drag Sales_Order.Sales Order Date to the Columns shelf. Note: When you
add a date, Tableau will automatically try to group the data by Year, Quarter,
and so on.
a. In the Columns pane, drill down on the date to show the quarters and
months [click the + next to YEAR(Sales Order Date) to show the
­Quarters, etc.].
b. Click QUARTER(Sales Order Date) and choose Remove.
4. Drag Finished_Goods_Products.Product Description to the Rows shelf.
5. Drag Sales_Order_Lines.Sales Order Quantity Sold to the Text button in the
Marks shelf.
6. To show the totals, click the Analytics tab next to the Data pane and double-
click Totals.
7. Clean up your sheet. Right-click the Sheet 1 tab at the bottom of the screen
and choose Rename. Name the tab something more useful, like “Sales by
Month” and press Enter.
8. Take a screenshot (label it 2-2TB).
9. When you are finished answering the lab questions you may close Tableau
Desktop. Save your file as Lab 2-2 Slainte Pivot.twb.

Lab 2-2 Part 2 Objective Questions (LO 2-3)


OQ1. What was the total sales volume for Imperial Stout in January 2020?
OQ2. What was the total sales volume for all products in January 2020?
OQ3. Which product is experiencing the worst sales performance in January 2020?

Lab 2-2 Part 2 Analysis Questions (LO 2-3)


Now that you’ve completed a basic analysis to answer management’s question, take a
moment to think about how you could improve the report and anticipate questions your
manager might have.
AQ1. If the owner of Sláinte wishes to identify which product sold the most, how
would you make this report more useful?
AQ2. If you wanted to provide more detail, what other attributes would be useful to add as
additional rows or columns to your report, or what other reports would you create?
AQ3. Write a brief paragraph about how you would interpret the results of your analy-
sis in plain English. For example, which data points stand out?
AQ4. In Chapter 4, we’ll discuss some visualization techniques. Describe a way you
could present these data as a chart or graph.

Lab 2-2 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 2-3 Resolve Common Data Problems—LendingClub
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: LendingClub is a peer-to-peer marketplace where borrowers and inves-
tors are matched together. The goal of LendingClub is to reduce the costs associated with
these banking transactions and make borrowing less expensive and investment more engag-
ing. LendingClub provides data on loans that have been approved and rejected since 2007,
including the assigned interest rate and type of loan. This provides several opportunities for
data analysis. There are several issues with this dataset that you have been asked to resolve
before you can process the data. This will require you to perform some cleaning, reformat-
ting, and other transformation techniques.
Data: Lab 2-3 Lending Club Approve Stats.zip - 120MB Zip / 120MB Excel

Lab 2-3 Example Output


By the end of this lab, you will clean data to prepare them for analysis. While your results
will include different data values, your work should look similar to this:

[Figure LAB 2-3M: Example of Cleaned Data in Microsoft Excel. Source: Microsoft Excel]
[Figure LAB 2-3T: Example of Cleaned Data in Tableau Prep. Source: Tableau Software, Inc.]

Lab 2-3 Part 1 Identify Relevant Attributes


Before you begin the lab, you should create a new blank Word document where you will record your screenshots and save it as Lab 2-3 [Your name] [Your email address].docx.
You’ve already identified some analysis questions for LendingClub in Chapter 1. Here,
you’ll focus on data quality. Think about some of the common issues with data you receive
from other people. For example, is the date field in the proper format? Do number fields
contain text or vice versa?
The LendingClub collects different sets of data, including LoanStats for approved
loans and RejectStats for rejected loans. There are significantly more data available for
LoanStats. There are 145 different attributes. To save some time, we’ve identified 20 of the
most interesting in Lab Exhibit 2-3A.

LAB EXHIBIT 2-3A
Source: LendingClub
Attribute Description
loan_amnt Requested loan amount
term Length of the loan in months
int_rate Interest rate of the loan
grade Quality of the loan: e.g. A, B, C
emp_length Employment length

home_ownership Whether the borrower rents or owns a home
annual_inc Annual income
issue_d Date of loan issue
loan_status Fully paid or charged off
title Loan purpose
zip_code The first three digits of the applicant’s zip code
addr_state State
dti Debt-to-income ratio
delinq_2y Late payments within the past 2 years
earliest_cr_line Oldest credit account
open_acc Number of open credit accounts
revol_bal Total balance of all credit accounts
revol_util Percentage of available credit in use
total_acc Total number of credit accounts
application_type Individual or joint application

Lab 2-3 Part 1 Objective Questions (LO 2-3)


OQ1. Which attributes would you expect to contain date values?
OQ2. Which attributes would you expect to contain text values?
OQ3. Which attributes would you expect to contain numerical values?
OQ4. Which attribute most directly impacts a borrower’s cost of capital?

Lab 2-3 Part 1 Analysis Questions (LO 2-3)


AQ1. What do you expect will be major data quality issues with LendingClub’s data?
AQ2. Given this list of attributes, what types of questions do you think you could
answer regarding approved loans? (If you worked through Lab 1-2, what con-
cerns do you have with the data’s ability to predict answers to the questions you
identified in Chapter 1)?

Lab 2-3 Part 2 Transform and Clean the Data


Let’s identify some issues with the LendingClub data that will make analysis difficult:
• There are many attributes that contain no data and may not be necessary.
• The int_rate values may be recorded as percentages (##.##%), but analysis will require
decimals (#.####).
• The term values include the word months, which should be removed for numerical analysis.
• The emp_length values include “n/a”, “<”, “+”, “year”, and “years”—all of which
should be removed for numerical analysis.
• Dates cause issues in general because different systems use different date formats
(e.g., 1/9/2009, Jan-2009, 9/1/2009 for European dates, etc.), so typically some conversion is necessary.
Note: When you use Power Query or a Tableau Prep flow, you create a set of steps that
will be used to transform the data. When you receive new data, you can run those through
those same steps (or flows) without having to recreate them each time.

Microsoft | Excel + Power Query

1. Open a new blank workbook in Excel.


2. In the Data ribbon, click Get Data > From File > From Workbook.
3. Locate the Lab 2-3 Lending Club Approve Stats.xlsx file on your com-
puter and click Import (this is a large file, so it may take a few minutes to
load).
4. Choose LoanStats3c and click Transform Data or Edit. Notice that all of the
column headers are incorrect.
First we have to fix the column headers and remove unwanted data.
5. In the Transform tab, click Use First Row as Headers to assign the correct
column titles.
6. Right-click the headers of any attribute that is not in the following list, and
click Remove. Hint: Once you get to initial_list_status, click that column
header then scroll to the right until you reach the end of the columns and
Shift + Click the last column (settlement_term). Then right-click and remove
columns.
a. loan_amnt
b. term
c. int_rate
d. grade
e. emp_length
f. home_ownership
g. annual_inc
h. issue_d
i. loan_status
j. title
k. zip_code
l. addr_state
m. dti
n. delinq_2y
o. earliest_cr_line
p. open_acc
q. revol_bal
r. revol_util
s. total_acc
7. Take a screenshot (label it 2-3MA) of your reduced columns.
Next, remove text values from numerical values and replace values so we
can do calculations and summarize the data. These extraneous text values
include months, <1, n/a, +, and years:

8. Select the term column.
a. In the Transform tab, click Replace Values.
1. In the Value to Find box, type “ months” with a space as the first
character (do not include the quotation marks).
2. Leave the Replace With box blank.
3. Click OK.
9. Select the emp_length column.
a. In the Transform tab, click Replace Values.
1. In the Value to Find box, type " years" with a space as the first character.
2. Leave the Replace With box blank.
3. Click OK.
b. In the Transform tab, click Replace Values.
1. In the Value to Find box, type " year" with a space as the first character.
2. Leave the Replace With box blank.
3. Click OK.
c. In the Transform tab, click Replace Values.
1. In the Value to Find box, type “<1” with a space between the two
characters.
2. In the Replace With box, type “0”.
3. Click OK.
d. In the Transform tab, click Replace Values.
1. In the Value to Find box, type “n/a”.
2. In the Replace With box, type “0”.
3. Click OK.
e. In the Transform tab, click Extract > Text Before Delimiter.
1. In the Delimiter box, type "+".
2. Click OK.
f. In the Transform tab, click Extract > Text Before Delimiter.
1. In the Delimiter box, type " " (a single space).
2. Click OK.
g. In the Transform tab, click Data Type > Whole Number.
10. Take a screenshot (label it 2-3MB) of your cleaned data file, showing the
term and emp_length columns.
11. Click the Home tab in the ribbon and then click Close & Load. It will take a
minute to clean the entire data file.
12. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 2-3 Lending Club Transform.xlsx.

Tableau | Prep
Lab Note: Tableau Prep takes extra time to process large datasets.
1. Open Tableau Prep Builder.
2. Click Connect to Data > To a File > Microsoft Excel.
3. Locate the Lab 2-3 Lending Club Approve Stats.xlsx file on your computer
and click Open.
4. Drag LoanStats3c to your flow. Notice that all of the Field Names are
incorrect.
First we have to fix the column headers and remove unwanted data.
5. Check Use Data Interpreter in the pane on the left to automatically fix the
Field Names.
6. Uncheck the box next to any attribute that is NOT in the following list to
remove it from our analysis. Hint: Once you get to initial_list_status, all of
the remaining fields can be removed.
a. loan_amnt
b. term
c. int_rate
d. grade
e. emp_length
f. home_ownership
g. annual_inc
h. issue_d
i. loan_status
j. title
k. zip_code
l. addr_state
m. dti
n. delinq_2y
o. earliest_cr_line
p. open_acc
q. revol_bal
r. revol_util
s. total_acc
7. Take a screenshot (label it 2-3TA) of your corrected and reduced list of
Field Names.
Next, remove text values from numerical values and replace values so we
can do calculations and summarize the data. These extraneous text values
include months, <1, n/a, +, and years:

8. Click the + next to LoanStats3c in the flow and choose Add Clean Step. It
may take a minute or two to load.
9. An Input step will appear in the top half of the workspace, and the details of
that step are in the bottom of the workspace in the Input Pane. Every flow
requires at least one Input step at the beginning of the flow.
10. In the Input Pane, you can further limit which fields you bring into Tableau
Prep, as well as see details about each field, including:
a. Type: this indicates the data type of each field (for example, numeric,
date, or short text).
b. Linked Keys: this indicates whether or not the field is a primary or a
foreign key.
c. Sample Values: provides a few example values from that field so you can
see how the data are formatted.
11. In the term pane:
a. Right-click the header or click the three dots and choose Clean > Remove
Letters.
b. Click the Data Type (Abc) button in the top-left corner and change the
data type to Number (Whole).
12. In the emp_length pane:
a. Right-click the header or click the three dots and choose Group Values >
Manual Selection.
1. Double-click <1 year in the list and type “0” to replace those values
with 0.
2. Double-click n/a in the list and type “0” to replace those values
with 0.
3. While you are in the Group Values window, you could quickly replace
all of the year values with single numbers (e.g., 10+ years becomes
“10”) or you can move to the next step to remove extra characters.
4. Click Done.
b. If you didn’t remove the “years” text in the previous step, right-click the
emp_length header or click the three dots and choose Clean > Remove
Letters and then Clean > Remove All Spaces.
c. Finally, click the Data Type (Abc) button in the top-left corner and
change the data type to Number (Whole).
13. In the flow pane, right-click Clean 1 and choose Rename and name the step
“Remove text”.
14. Take a screenshot (label it 2-3TB) of your cleaned data file, showing the
term and emp_length columns.
15. Click the + next to your Remove text task and choose Output.
16. In the Output pane, click Browse:
a. Navigate to your preferred location to save the file.
b. Name your file Lab 2-3 Lending Club Transform.hyper.
c. Click Accept.

17. Click Run Flow. When it is finished processing, click Done.
18. When you are finished answering the lab questions you may close Tableau
Prep. Save your file as Lab 2-3 Lending Club Transform.tfl.

Lab 2-3 Part 2 Objective Questions (LO 2-3)


OQ1. How many records or rows appear in your cleaned dataset?
OQ2. How many attributes or columns appear in your cleaned dataset?

Lab 2-3 Part 2 Analysis Questions (LO 2-3)


AQ1. Why do you think it is important to remove text values from your data before
you conduct your analysis?
AQ2. What do you think would happen in your analysis if you didn’t remove the text
values?
AQ3. Did you run into any major issues when you attempted to clean the data?
How did you resolve those?
AQ4. What are some steps you could take to clean the data and resolve the difficulties
you identified?

Lab 2-3 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 2-4 Generate Summary Statistics—LendingClub


Lab Note: The tools presented in this lab periodically change. Updated instructions, if
applicable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: When you’re working with a new or unknown set of data, validating the
data is very important. When you make a data request, the IT manager who fills the request
should also provide some summary statistics that include the total number of records and
mathematical sums to ensure nothing has been lost in the transmission. This lab will help
you calculate summary statistics in Power BI and Tableau Desktop.
Data: Lab 2-4 Lending Club Transform.zip - 29MB Zip / 26MB Excel / 6MB Tableau

Lab 2-4 Example Output


By the end of this lab, you will explore summary statistics. While your results will include
different data values, your work should look similar to this:

Microsoft | Power BI Desktop

LAB 2-4M Example of Data Distributions in Microsoft Power Query

Tableau | Desktop



LAB 2-4T Example of Data Distributions in Tableau Desktop

Lab 2-4 Calculate Summary Statistics


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 2-4 [Your name] [Your email address].docx.

In this part we are interested in understanding more about the loan amounts, interest
rates, and annual income by looking at their summary statistics. This process can be used
for data validation and later for outlier detection.
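
The same validation can also be scripted. As a minimal sketch (assuming the cleaned data
have been loaded into a hypothetical table named LoanStats3c), the summary statistics for
the loan amounts of Pennsylvania borrowers would be:

-- count, range, center, and spread of approved PA loan amounts
SELECT COUNT(*)              AS loan_count,
       MIN(loan_amnt)        AS min_loan,
       MAX(loan_amnt)        AS max_loan,
       AVG(1.0 * loan_amnt)  AS avg_loan,   -- 1.0 * avoids integer averaging
       STDEV(loan_amnt)      AS stdev_loan
FROM LoanStats3c
WHERE addr_state = 'PA';

Comparing counts and sums like these to the totals supplied with a data request is the
essence of validation: if they match, nothing was lost in transmission.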

Microsoft | Power BI Desktop

Lab Note: These instructions can also be performed in the Power Query included
with Excel 365.

1. Open a new workbook in Power BI Desktop.


2. Click the Home tab in the ribbon and choose Get Data > Excel.
3. Navigate to your Lab 2-4 Lending Club Transform.xlsx file and click Open.
4. Check LoanStats3c and click Transform Data or Edit.
5. Click View in the ribbon, then check Column Distribution. A small frequency
distribution graph will appear at the top of each column.
6. Click the loan_amnt column.
7. In the View tab, check Column Profile. You now see summary stats and a
frequency distribution for the selected column.
8. Note: To show the profile for the entire data set instead of the top 1,000 values,
go to the bottom of the Power Query Editor window and click the title
Column profiling based on top 1000 rows and change it to Column profiling
based on entire data set.
9. Take a screenshot (label it 2-4MA) of the column statistics and value
distribution.
10. Click the int_rate column and the annual_inc column, noting the count,
min, max, average, and standard deviation of each.
11. Click the drop-down next to the addr_state column and uncheck (Select
All).
12. Check PA and click OK to filter the loans.
13. In Power BI, click Home > Close & Apply. Note: You can always return to
Power Query by clicking the Transform button in the Home tab.
14. To show summary statistics in Power BI, go to the visualizations pane and
click Multi-row card.
a. Drag loan_amnt to the Fields box. Click the drop-down menu next to
loan_amnt and choose Sum.
b. Drag loan_amnt to the Fields box below the existing field. Click the
drop-down menu next to the new loan_amnt and choose Average.
c. Drag loan_amnt to the Fields box below the existing field. Click the
drop-down menu next to the new loan_amnt and choose Count.
d. Drag loan_amnt to the Fields box below the existing field. Click the
drop-down menu next to the new loan_amnt and choose Max.
15. Add two new Multi-row cards showing the same values (Sum, Average,
Count, Max) for int_rate and annual_inc.

16. Take a screenshot (label it 2-4MB) of the column statistics and value
distribution.
17. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 2-4 Lending Club Summary.pbix.

Tableau | Desktop

1. Open Tableau Desktop.


2. Choose Connect > To a File > More...
3. Locate the Lab 2-4 Lending Club Transform.hyper and click Open.
4. Click the Sheet 1 tab.
5. Drag Addr State to the Filters shelf.
6. Click None, then check the box next to PA and click OK.
7. Click the drop-down on the Addr State filter and choose Apply to
Worksheets > All Using This Data Source.
8. Drag Loan Amnt to the Rows shelf.
9. To show each unique loan, you have to disable aggregate measures. From the
menu bar, click Analysis > Aggregate Measures to remove the check mark.
10. To show the summary statistics, go to the menu bar and click Worksheet > Show
Summary. A Summary card appears on the right side of the screen with the
Count, Sum, Average, Minimum, Maximum, and Median values.
11. Take a screenshot (label it 2-4TA).
12. Create two new sheets and repeat steps 8–10 for Int Rate and Annual Inc, noting
the count, sum, average, minimum, maximum, and median of each.
13. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 2-4 Lending Club Summary.twb.

Lab 2-4 Objective Questions (LO 2-3)


OQ1. What is the maximum loan amount that was approved for borrowers from PA?
OQ2. What is the average interest rate assigned to a loan to an approved borrower
from PA?
OQ3. What is the average annual income of an approved borrower from PA?

Lab 2-4 Analysis Questions (LO 2-3)


AQ1. Compare the loan amounts to the validation given by LendingClub for borrowers
from PA: Funded loans: $123,262.53; Number of approved loans: 8,427. Do the
numbers in your analysis match the numbers provided by LendingClub? What
explains the discrepancy, if any?
AQ2. Does the Numerical Count provide a more useful or accurate value for validating
your data? Why or why not?
AQ3. Compare and contrast: Why do Power Query and Tableau Desktop return
­different values for their summary statistics?

AQ4. Compare and contrast: What are some of the summary statistics measures that
are unique to Power Query? To Tableau Desktop?

Lab 2-4 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 2-5 Validate and Transform Data—College Scorecard


Lab Note: The tools presented in this lab periodically change. Updated instructions, if
applicable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Your college admissions department is interested in determining the
likelihood that a new student will complete their 4-year program. They have tasked you with
analyzing data from the U.S. Department of Education to identify some variables that may
be predictive of the completion rate. The data used in this lab are a subset of the College
Scorecard dataset that is provided by the U.S. Department of Education. These data provide
federal financial aid and earnings information, insights into the performance of schools
eligible to receive federal financial aid, and the outcomes of students at those schools.
Data: Lab 2-5 College Scorecard Dataset.zip - 0.5MB Zip / 1.4MB Txt

Lab 2-5 Example Output


By the end of this lab, you will have validated and transformed the College Scorecard data.
While your results will include different data values, your work should look similar to this:

Microsoft | Excel + Power Query

LAB 2-5M Example of Cleaned College Scorecard Data in Microsoft Excel

Tableau | Prep + Desktop



LAB 2-5T Example of Cleaned College Scorecard Data in Tableau Prep

Lab 2-5 Load and Clean Data


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 2-5 [Your name] [Your email address].docx.
Working with raw data can present interesting challenges, especially when it comes to
identifying attributes and data types. In this lab you will learn how to transform the raw data
into data models that are ready for analysis.

Microsoft | Excel + Power Query

1. Open a new blank spreadsheet in Excel.


2. From the Data tab in the ribbon, click Get Data > From File > From Text/CSV.
3. Navigate to your Lab 2-5 College Scorecard Dataset.txt file and click Open.
4. Verify that the data loaded correctly into tables and rows and then click
Transform Data or Edit.
5. Click through each of the 30 columns and from the Transform tab in the
ribbon, click Data Type > Whole Number or Data Type > Decimal Number
where appropriate. If prompted, click Replace Current. Because the original
text file replaced empty values with "NULL", Power Query erroneously
detected many of the columns as Text. Hint: Hold the Ctrl key and click to
select multiple columns.

6. Take a screenshot (label it 2-5MA) of your columns with the proper data
types.
7. From the Home tab, click Close & Load.
8. To ensure that you captured all of the data through the extraction from the
txt file, we need to validate them:
a. In the Queries & Connections pane, verify that there are 7,703 rows loaded.
b. Compare the attribute names (column headers) to the attributes listed
in the data dictionary (found in Appendix K of the textbook). There
should be 30 columns (the last column in Excel should be AD).
c. Click Column H for the SAT_AVG attribute. In the summary statistics at
the bottom of your worksheet, the overall average SAT score should be
1,059.07.
9. Take a screenshot (label it 2-5MB) of your data table in Excel.
10. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 2-5 College Scorecard Transform.xlsx. Your data are
now ready for the test plan. This lab will continue in Lab 3-3.

Tableau | Prep + Desktop

1. Open a new flow in Tableau Prep Builder.


2. Click Connect to Data > To a File > Text file.
3. Navigate to your Lab 2-5 College Scorecard Dataset.txt file and click Open.
4. Verify that the data types for each of the 30 attributes are detected as a Number,
with the exception of INSTNM, CITY, and STABBR.
5. Take a screenshot (label it 2-5TA).
6. In the flow, click the + next to your Lab 2-5 College Scorecard Dataset and
choose Add > Clean Step.
7. Review the data and click the lightbulb icon in the CITY and STABBR attri-
butes to change the data roles to City and State/Province, respectively.
8. Click the + next to your Clean 1 task and choose Output.
9. In the Output pane, click Browse:
a. Navigate to your preferred location to save the file.
b. Name your file Lab 2-5 College Scorecard Transform.hyper.
c. Click Accept.
10. Click Run Flow. When it is finished processing, click Done.
11. Close Tableau Prep Builder. Save your file as Lab 2-5 College Scorecard
Transform.tfl.
12. Open Tableau Desktop.
13. Choose Connect > To a File > More...
14. Locate the Lab 2-5 College Scorecard Transform.hyper and click Open.
15. Click the Sheet 1 tab.


16. From the menu bar, click Analysis > Aggregate Measures to remove the check
mark. To show each unique entry, you have to disable aggregate measures.
17. To show the summary statistics, go to the menu bar and click Worksheet >
Show Summary. A Summary card appears on the right side of the screen
with the Count, Sum, Average, Minimum, Maximum, and Median values.
18. Drag Unitid to the Rows shelf and note the summary statistics.
19. Take a screenshot (label it 2-5TB) of the Unitid stats in your worksheet.
20. Create two new sheets and repeat steps 16–18 for Sat Avg and C150 4, noting
the count, sum, average, minimum, maximum, and median of each.
21. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 2-5 College Scorecard Transform.twb. Your
data are now ready for the test plan. This lab will continue in Lab 3-3.
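
Both checksums from this lab can also be scripted. As an illustrative sketch only (assuming
the text file has been loaded into a hypothetical table named CollegeScorecard), note how
the handling of the literal text 'NULL' matters for the average:

-- row-count checksum: should return 7,703
SELECT COUNT(*) AS row_count
FROM CollegeScorecard;

-- average SAT checksum: should return approximately 1,059.07;
-- TRY_CAST turns the literal text 'NULL' into a true NULL, which AVG ignores
SELECT AVG(TRY_CAST(SAT_AVG AS DECIMAL(8,2))) AS avg_sat
FROM CollegeScorecard;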

Lab 2-5 Objective Questions (LO 2-3)


OQ1. How many schools report average SAT scores?
OQ2. What is the average completion rate (C150 4) of all the schools?
OQ3. How many schools report data to the U.S. Department of Education?

Lab 2-5 Analysis Questions (LO 2-3)


AQ1. In the checksums, you validated that the average SAT score for all of the records
is 1,059.07. When we work with the data more rigorously, several tests will
require us to transform NULL or blank values. If you were to transform the
NULL SAT values into 0, what would happen to the average (would it stay the
same, decrease, or increase)?
AQ2. How would that change to the average affect the way you would interpret the data?
AQ3. What would happen if we excluded all schools that don’t report an average
SAT score?

Lab 2-5 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 2-6 Comprehensive Case: Build Relationships among Database Tables—Dillard's

Lab Note: The tools presented in this lab periodically change. Updated instructions, if
applicable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: You are a brand-new analyst and you just got assigned to work on the
Dillard's account. You have reviewed the ER Diagram (available in Additional Student
Resources), but you still aren't sure what all of the different tables and fields represent.
Before diving into problem solving or even transforming the data to prepare them for
analysis, it is important to gain an understanding of what data are available to you. One of the
steps in doing so is connecting to the database and analyzing the way the tables relate.
Data: Dillard's sales data are available only on the University of Arkansas Remote
Desktop (waltonlab.uark.edu). See your instructor for login credentials.



Lab 2-6 Example Output
By the end of this lab, you will explore how to define relationships between tables from
Dillard’s sales data. While your results will include different data values, your work should
look similar to this:

Microsoft | Power BI Desktop

LAB 2-6M Example of Dillard’s Data Model in Microsoft Power BI

Tableau | Desktop



LAB 2-6T Example of Dillard’s Data Model in Tableau Desktop

Lab 2-6 Build Relationships between Tables
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 2-6 [Your name] [Your email address].docx.
Before you can analyze the data, you must first define the relationships that show how
the different tables are connected. Most tools will automatically detect primary key–foreign
key relationships, but you should always double-check to make sure your data model is
accurate.
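
A quick way to test whether a primary key–foreign key relationship actually holds is an
orphan check. As a hedged sketch against the Dillard's database (using the table and field
names from the ER Diagram), any rows returned here would signal TRANSACT records
whose SKU has no match in the SKU table:

-- orphan check: foreign keys in TRANSACT with no matching primary key in SKU
SELECT DISTINCT T.SKU
FROM TRANSACT AS T
LEFT JOIN SKU AS S
    ON T.SKU = S.SKU
WHERE S.SKU IS NULL;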

Microsoft | Power BI Desktop

1. Open Power BI Desktop.


2. In the Home ribbon, click Get Data > SQL Server.
3. Enter the following and click OK (keep in mind that SQL Server is not just
one database; it is a collection of databases, so it is critical to indicate the
server path and the specific database):
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Data Connectivity: DirectQuery
4. If prompted to enter credentials, you can keep the default to “Use my current
credentials” and click Connect.
5. If prompted with an Encryption Support warning, click OK to move past it.
6. Take a screenshot (label it 2-6MA) of the navigator window.
Learn about Power BI!
There are two ways to connect to data, either Import or DirectQuery. There
are pros and cons for each, and it will always depend on a few factors, includ-
ing the size of the dataset and the type of analysis you intend to do.
Import: Will pull in all data at once. This can take a long time, but once they
are imported, your analysis can be more efficient if you know that you plan
to use each piece of data that you import. This is also beneficial for some of
the analyses you will learn about in future chapters, such as clustering.
DirectQuery: Only creates a connection to the data. This is more efficient
if you are exploring all of the tables in a large database and are comfortable
working with only a sample of data. Note: Unless directed otherwise, you
should always use DirectQuery with Dillard's data to prevent the remote desktop
from running out of storage space.
7. Place a check mark next to each of the following tables and click Load:
a. Customer, Department, SKU, SKU_Store, Store, Transact
8. Click the Model button (the icon with three connected boxes) in the toolbar
on the left to view the tables and relationships and note the following:
a. All the tables that you selected should appear in the Modeling tab with
table names, attributes, and relationships.

b. When you hover over any of the relationships, the keys that are common
between the two tables highlight.
1. Something important to consider is that in the raw data, the primary
key is typically the first attribute listed. In this Power BI modeling
window, the attributes have been re-ordered to appear in alphabetical
order. For example, SKU is the primary key of the SKU table, and it
exists in the Transact table as a foreign key.
9. Take a screenshot (label it 2-6MB) of the All tables sheet.
10. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 2-6 Dillard’s Diagram.pbix.
Note: While it may seem easier and faster to rely on the automatically created
data model in Power BI, you should review the table relationships to make sure
the appropriate keys match.

Tableau | Desktop

1. Open Tableau Desktop.


2. Go to Connect > To a Server > Microsoft SQL Server.
3. Enter the following and click Sign In:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
4. Take a screenshot (label it 2-6TA) of the blank Data Source tab.
In Tableau, you connect to individual tables and build relationships one at a
time. We will build each relationship, but we need to start with one table.
5. Double-click the TRANSACT table from the list on the left to add it to the
top pane.
a. The field names will appear in the data grid section on the bottom of the
screen, but the data themselves will not automatically load. If you click
Update Now, you can get a preview of the data held in the Transact table.
You can do some light data transformation at this point, but if your
data transformation needs are heavy, it would be better to perform that
transformation in Tableau Prep before bringing the data into Tableau
Desktop.
6. Double-click the CUSTOMER table to add it to the data model in the top pane.
a. In the Edit Relationship window that pops up, confirm that the appro-
priate keys are identified (Cust ID and Cust ID) and close the window.
7. Double-click each of the remaining tables that relate directly to the Transact
table from the list on the left:
a. SKU, STORE, DEPARTMENT
b. Note that DEPARTMENT will join with the SKU table, not
TRANSACT.

8. Finally, double-click the SKU_STORE table from the list on the left.
a. The SKU_Store table is related to both the SKU and the Store tables,
but Tableau will likely default to connecting it to the Transact table,
resulting in a broken relationship.
b. To fix the relationship,
1. Close the Edit Relationships window without making changes.
2. Right-click SKU_STORE in the top pane and choose Move to >
SKU.
3. Verify the related keys and close the Edit Relationships window.
4. Note: It is not necessary to also relate the SKU_Store table to the
Store table in Tableau; that is only a database requirement.
9. Take a screenshot (label it 2-6TB) of the Data Source tab.
10. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 2-6 Dillard’s Diagram.twb.

Lab 2-6 Objective Questions (LO 2-2)


OQ1. How many tables relate directly to the TRANSACT table?
OQ2. Which table does the DEPARTMENT table relate to?
OQ3. What is the name of the key that relates the TRANSACT and CUSTOMER
tables?

Lab 2-6 Analysis Questions (LO 2-2)


AQ1. How would a view of the entire database or certain tables out of that database
allow us to get a feel for the data?
AQ2. What types of data would you guess that Dillard’s, a retail store, gathers that
might be useful beyond the scope of the sales data available on the remote
desktop? How could Dillard’s suppliers use these data to predict future
purchases?
AQ3. Compare and Contrast: Compare the methods for connecting to data in Tableau
versus Power BI. Which is more intuitive?
AQ4. Compare and Contrast: Compare the methods for viewing (and creating) rela-
tionships in Tableau versus Power BI. Which is easier to work with? Which pro-
vides more insight and flexibility?

Lab 2-6 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 2-7 Comprehensive Case: Preview Data from Tables—Dillard's

Lab Note: The tools presented in this lab periodically change. Updated instructions, if
applicable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: You are a brand-new analyst and you just got assigned to work on the
Dillard's account. After analyzing the ER Diagram to gain a bird's-eye view of all the
different tables and fields in the database, you are ready to further explore the data in each
table and how the fields are formatted. In particular, you will connect to the Dillard's
database using Tableau Prep or Microsoft Power BI, and you will explore the data types,
the primary and foreign keys, and previews of individual tables.
In Lab 2-6, the Tableau track had you focus on Tableau Desktop. In this lab, you
will use Tableau Prep instead. Tableau Desktop showcases the table relationships more
quickly, but Tableau Prep makes it easier to preview and clean the data prior to analysis.
Data: Dillard's sales data are available only on the University of Arkansas Remote
Desktop (waltonlab.uark.edu). See your instructor for login credentials.

Lab 2-7 Example Output


By the end of this lab, you will explore Dillard’s data and generate summary statistics. While
your results will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop

LAB 2-7M Example of Summary Statistics in Microsoft Power BI

Tableau | Prep



LAB 2-7T Example of Summary Statistics in Tableau Prep

Lab 2-7 Part 1 Preview Dillard's Attributes and Data Types
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 2-7 [Your name] [Your email address].docx.
In this part of the lab you will load the data and explore the available attributes and verify
that the data types are correctly assigned.

Microsoft | Power BI Desktop

1. Create a new project in Power BI Desktop.


2. In the Home ribbon, click Get Data > SQL Server database.
3. Enter the following and click OK (keep in mind that SQL Server is not just
one database; it is a collection of databases, so it is critical to indicate the
server path and the specific database):
a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
c. Data Connectivity: DirectQuery
4. If prompted to enter credentials, keep the default of “Use my current creden-
tials” and click Connect.
5. If prompted with an Encryption Support warning, click OK to move past it.

6. In the Navigator window, click the TRANSACT table.
a. It may take a few moments for the data to load, but once they do, you
will see a preview of the data to the right. Scroll through the data to get
a feel for the data stored in the Transact table.
b. Unlike Tableau Prep, we cannot limit which attributes we pull in at this
point in the process. However, the preview pane to the right shows the
first 27 records in the table that you have selected. This gives you an idea
of the different data types and examples of the data stored in the table.
c. Scroll to the right in the preview pane to see several fields that do not
have preview data; instead, each record is marked with the term Value.
These fields indicate the tables that are related to the Transact
table. In this instance, they are CUSTOMER, SKU (SKU), and
STORE (STORE).
7. Click the CUSTOMER table to preview that table’s data.
8. Take a screenshot (label it 2-7MA).
9. Answer the lab questions and continue to Part 2.

Tableau | Prep

1. Create a new flow in Tableau Prep.


2. Click Connect to Data.
3. Choose Microsoft SQL Server in the Connect list.
4. Enter the following and click Sign In:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
5. Double-click TRANSACT to add the table to your flow.
a. An Input step will appear in the top half of the workspace, and the
details of that step are in the bottom of the workspace in the Input Pane.
Every flow requires at least one Input step at the beginning of the flow.
b. In the Input Pane, you can further limit which fields you bring into
Tableau Prep, as well as see details about each field, including:
1. Type: this indicates the data type of each field (for example, numeric,
date, or short text).
2. Linked Keys: this indicates whether or not the field is a primary or a
foreign key. In the Transact table, we can see that the Transaction_ID
is the primary key, and that there are three foreign keys in this table:
Store, Cust_ID, and SKU.
3. Sample Values: provides a few example values from that field so you
can see how the data are formatted.
6. Double-click the CUSTOMER table to add a new Input step to your flow.
7. Take a screenshot (label it 2-7TA).
8. Answer the lab questions and continue to Part 2.

Lab 2-7 Part 1 Objective Questions (LO 2-2, 2-3)
OQ1. What is the primary key for the CUSTOMER table?
OQ2. What is the primary key for the SKU table?
OQ3. Which tables are related to the Customer table? (Hint: Do not forget the foreign
keys that you discovered in the Transact table.)

Lab 2-7 Part 2 Explore Dillard’s Data More In-Depth


In this part of the lab, you will explore the summary statistics to understand the properties
of the different attributes, such as the count and average.
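
Because the Power BI and Tableau Prep profiles are based on only a sample of rows, you
can also profile the full table directly in SQL. A minimal sketch (run against the same
WCOB_DILLARDS database) that summarizes TRAN_AMT for each transaction type:

-- count, range, and average of transaction amounts by transaction type
SELECT TRAN_TYPE,
       COUNT(*)      AS tran_count,
       MIN(TRAN_AMT) AS min_amt,
       MAX(TRAN_AMT) AS max_amt,
       AVG(TRAN_AMT) AS avg_amt
FROM TRANSACT
GROUP BY TRAN_TYPE;

The signs of min_amt and max_amt for the P and R rows foreshadow the answers to the
objective questions at the end of this part.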

Microsoft | Power BI Desktop

1. Place a check mark in the TRANSACT and CUSTOMER tables in the Navi-
gator window.
2. Click Transform Data.
a. This will open a new window for the Power Query Editor (this is the
same interface that you will encounter in Excel’s Get & Transform).
b. On the left side of the Power Query Editor, you can click through the
different queries to see previews of each table’s data. Similar to Tableau
Prep, you are provided only a sample of the dataset.
c. Click the Transact query to preview the data from the Transact table.
d. Scroll the main view to the right to see more of the attributes.
3. Power Query does not default to providing data profiling information the
way Tableau Prep’s Clean step does, but we can activate those options.
4. Click the View tab and place check marks in the Column Distribution and
Column Profile boxes.
a. Column distribution: Provides thumbnails of each column’s distribution
above the first row of data. However, it is limited to only the thumbnail—
you cannot hover over bars in the distribution charts to gain additional
details or filter the data.
b. Column profile: When you select a column, it will provide a more
detailed glimpse into the distribution of that particular column. You can
click a bar in the distribution to filter the records based on that
criterion. This will also cause the Column distribution thumbnails to adjust.
c. Again—caution! The distributions and profiling are based on the top
1,000 rows from the table you have connected to.
5. Some of the attributes are straightforward in what they represent, but others
aren’t as clear. For instance, you may be curious about what TRAN_TYPE
represents.
6. Filter the purchases and returns:
a. Click the drop-down button next to Tran_Type and filter for just the
records with P values or click the bar associated with the P values in the

Column profile. Scroll over and look at the results in the Tran_Amt field
and note whether they are positive or negative.
b. Now adjust the filter so that you see only R Tran_Types. Note the values
in the Tran_Amt field again.
7. Take a screenshot (label it 2-7MB).
8. When you are finished answering the lab questions, you may close the Power
Query Editor and Power BI Desktop. Save your file as Lab 2-7 Dillard’s
Data.pbix.

Tableau | Prep

1. Add a new Clean step extending from the TRANSACT table (click the +
icon next to TRANSACT and choose Clean Step from the menu). A phantom
step for View and Clean may already exist. If so, just click that step to
add it:
a. The Clean step provides many different options for preparing your data,
which we will get to in future labs. In this lab, you will use it as a means
for familiarizing yourself with the dataset.
b. Beneath the Flow Pane, you can see two new panes: the Profile Pane and
the Data Grid.
1. The Data Grid provides a more robust sample of data values than you
were able to see in the Input Pane from the Input step.
2. The Profile Pane provides summary visualizations of each attribute
in the table. Note: When datasets are large, these summary values are
calculated only from the first several thousand records in the original
table, so be cautious about using these visualizations to drive insights!
In this instance, we can see a good example of this being merely a
sample by looking at the TRAN_DATE visual summary. It shows
only dates from 12/30/2013 to 01/27/2014, but we know the dataset
has transactions through 2016.
c. Some of the attributes are straightforward in what they represent, but
others aren’t as clear. For instance, you may be curious about what
TRAN_TYPE represents. Look at the data visualization provided for
TRAN_TYPE in the Profile Pane and click P. This will filter the results
in the Data Grid.
1. Look at the results in the TRAN_AMT field and note whether they
are positive or negative (you can do so by looking at the data grid or
by looking at the filtered visualization for TRAN_AMT).
2. Adjust the filter so that you see only R transaction types. Note the
values in the Tran_Amt field again.
2. Take a screenshot (label it 2-7TB).
3. When you are finished answering the lab questions, you may close Tableau
Prep. Save your file as Lab 2-7 Dillard’s Data.tfl.

Lab 2-7 Part 2 Objective Questions (LO 2-2, 2-3)
OQ1. What do you notice about the TRAN_AMT for transactions with
TRAN_TYPE “P”?
OQ2. What do you notice about the TRAN_AMT for transactions with
TRAN_TYPE “R”?
OQ3. What do “P” type transactions and “R” type transactions represent?

Lab 2-7 Part 2 Analysis Questions (LO 2-2, 2-3)


AQ1. Compare and Contrast: Compare the methods for previewing data in the tables
in Tableau Prep versus Microsoft Power BI. Which method is easier to interact
with?
AQ2. Compare and Contrast: Compare the methods for identifying data types in each
table in Tableau Prep versus Microsoft Power BI. Which method is easier to
interact with?
AQ3. Compare and Contrast: Compare viewing the data distribution and filtering data
in Tableau Prep’s Clean step to Microsoft Power BI’s Data Profiling options.
Which method is easier to interact with?

Lab 2-7 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 2-8 Comprehensive Case: Preview a Subset of Data in Excel, Tableau Using a SQL Query—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if
applicable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: You are a brand-new analyst and you just got assigned to work on the
Dillard’s account. So far you have analyzed the ER Diagram to gain a bird’s-eye view of all
the different tables and fields in the database, and you have explored the data in each table
to gain a glimpse of sample values from each field and how they are all formatted. You also
gained a little insight into the distribution of sample values across each field, but at this
point you are ready to dig into the data a bit more. In the previous comprehensive labs, you
connected to full tables in Tableau or Power BI to explore the data. In this lab, instead of
connecting to full tables, we will write a SQL query to pull only a subset of data into Tableau
or Excel. This tactic is more effective when the database is very large and you can derive
insights from a sample of the data. We will analyze 5 days’ worth of transaction data from
September 2016. In this lab we will look at the distribution of transactions across different
states in order to get to know our data a little better.
Data: Dillard's sales data are available only on the University of Arkansas Remote
Desktop (waltonlab.uark.edu). See your instructor for login credentials.

Lab 2-8 Example Output


By the end of this lab, you will query a subset of Dillard's transaction data and summarize
transaction amounts across states. While your results will include different data values,
your work should look similar to this:

Microsoft | Excel + Power Query

LAB 2-8M Example of a PivotTable and PivotChart in Microsoft Excel

Tableau | Desktop



LAB 2-8T Example of Summary Chart in Tableau Desktop

Lab 2-8 Part 1 Connect to the Data with a SQL Query


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 2-8 [Your name] [Your email address].docx.

Microsoft | Excel + Power Query

1. From Microsoft Excel, click the Data tab on the ribbon.


2. Click Get Data > From Database > From SQL Server Database.
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Expand Advanced Options and input the following query:
SELECT TRANSACT.*, STATE
FROM TRANSACT
INNER JOIN STORE
ON TRANSACT.STORE = STORE.STORE
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905'
d. Click Connect. Click OK if prompted about encryption.
e. Click Edit.
3. Take a screenshot (label it 2-8MA).
4. Click Close & Load > Close & Load To.
5. Choose Only Create Connection and check the box next to Add this data to
the Data Model, then click OK.

Tableau | Desktop

1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is, click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT TRANSACT.*, STATE
FROM TRANSACT
INNER JOIN STORE
ON TRANSACT.STORE = STORE.STORE
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905'
e. Click Preview Results... to test your query on a sample data set.
f. If everything looks good, close the preview and click OK.
3. Take a screenshot (label it 2-8TA).
4. Click Sheet 1.

Lab 2-8 Part 2 View the Distribution of Transaction
Amounts across States
In addition to data from the Transact table, our query also pulled in the attribute State from the
Store table. We can use this attribute to identify the sum of transaction amounts across states.
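
If you prefer, the whole Part 2 analysis can be pushed into the query itself. The sketch
below extends the Part 1 query with grouping and sorting; it is an alternative to the
PivotTable or worksheet approaches, not a required step:

-- average transaction amount by state, largest first
SELECT STATE,
       AVG(TRAN_AMT) AS avg_tran_amt
FROM TRANSACT
INNER JOIN STORE
    ON TRANSACT.STORE = STORE.STORE
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905'
GROUP BY STATE
ORDER BY avg_tran_amt DESC;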

Microsoft | Excel + Power Query

1. We will perform this analysis using a PivotTable. Return to the worksheet in


your Excel workbook titled Sheet1.
2. From the Insert tab in the ribbon, click PivotTable.
3. Check Use this workbook’s Data Model and click OK.
4. Expand Query1 and place check marks next to TRAN_AMT and STATE.
5. The TRAN_AMT default aggregation will likely be SUM. Change it by
right-clicking one of the TRAN_AMT values in the PivotTable, selecting
Summarize Values By > Average.
6. To make this output easier to interpret, you can sort the data so that you see
the states that have the highest average transaction amount first. To do so,
have your active cell anywhere in the Average of TRAN_AMT column, right-
click the cell, select Sort, then select Sort Largest to Smallest.
7. To view a visualization of these results, click the PivotTable Analyze tab in the
ribbon and click PivotChart.
8. The default will be a column chart, which is great for visualizing these data.
Click OK.
9. Take a screenshot (label it 2-8MB) of your PivotTable and PivotChart and
click OK.
10. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 2-8 Dillard’s Stats.xlsx.

Tableau | Desktop

1. Within the same Tableau workbook open a new sheet.


2. Add the TRAN_AMT field to the worksheet by double-clicking it.
3. Add the STATE field to the worksheet by double-clicking it.
4. This result shows you the sum of the transaction amounts across each
different state, but it will be more meaningful to compare average transaction
amounts. Click the drop-down arrow on the SUM(TRAN_AMT) pill in the
Rows shelf and change Measure (Sum) to Average.
5. It will be easier to analyze the different averages quickly if you sort the data.
Click the icon for Sort Descending.
6. Take a screenshot (label it 2-8TB).
7. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 2-8 Dillard’s Stats.twb.

Lab 2-8 Part 2 Objective Questions
OQ1. Which state has the highest average transaction amount?
OQ2. What is the average transaction amount for North Carolina? Round your answer to
the nearest dollar.

Lab 2-8 Part 2 Analysis Questions


AQ1. How does creating a query to connect to the data allow quicker and more
efficient access and analysis of the data than connecting to entire tables?
AQ2. Is 5 days of data sufficient to capture the statistical relationship among and
between different variables? What will Excel do if you have more than 1 million
rows? How might a query help?
If you have completed BOTH tracks,
AQ3. Compare and Contrast: Compare the methods for analyzing transactions across
states in Excel versus Tableau. Which tool was more intuitive for you to work
with? Which provides more interesting results?

Lab 2-8 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Chapter 3
Performing the Test Plan and
Analyzing the Results

A Look at This Chapter


Data Analytics involves the use of various models and techniques to understand the environment (descriptive analytics),
make comparisons (diagnostic analytics), predict the future (predictive analytics), and determine the best course of
action for the future (prescriptive analytics). In this chapter, we evaluate several different approaches and models
and identify when to use them and how to interpret the results. We also provide specific accounting-related examples
of when each of these specific data approaches and models is appropriate to address our particular question.

A Look Back
Chapter 2 provided a description of how data are prepared, scrubbed, and made ready to use to answer business
questions. We explained how to extract, transform, and load data and then how to validate and normalize the data.
In addition, we explained how data standards are used to facilitate the exchange of data between both senders and
receivers. We also emphasized the ethical importance of maintaining privacy in both the collection and use of data.

A Look Ahead
Chapter 4 will demonstrate various techniques that can be used to effectively communicate the results of your
analyses, emphasizing the use of visualizations. Additionally, we discuss how to refine your results and translate
your findings into useful information for decision makers.

Liang Zhao Zhang, a San Francisco–based janitor, made more
than $275,000 in 2015. The average janitor in the area earns just
$26,180 a year. Zhang, a Bay Area Rapid Transit (BART) janitor,
has a base pay of $57,945 and $162,050 in overtime pay. With
benefits, the total was $276,121. While some call his compensation
"outrageous and irresponsible," Zhang signed up for every
overtime slot that became available. To be sure, Zhang
worked more than 4,000 hours last year and received overtime
pay. Can BART predict who might take advantage of overtime
pay? Should it set a policy restricting overtime pay? Would it be
better for BART to hire more regular, full-time employees instead
of offering so much overtime to current employees?
Can Data Analytics help somehow to address these questions?
Using a profiling Data Analytics approach detailed in this chapter, BART could generate summary statistics of its
workers and their overtime pay to see the extent to which overtime is required.
Using regression and classification approaches to Data Analytics would help to classify which employees are most
likely to exceed normal bounds and why. BART, for example, has a policy of offering overtime by seniority. So do the
most senior employees sign up first and leave little overtime to others? Will a senior employee get paid more for
overtime than more junior-level employees? If so, is that the best policy for the company and its employees?

Source: http://www.cnbc.com/2016/11/04/how-one-bay-area-janitor-made-276000-last-year.xhtml.

OBJECTIVES
After reading this chapter, you should be able to:

LO 3-1 Understand and distinguish among the four types of Data Analytics in
performing the test plan.
LO 3-2 Explain several descriptive analytics approaches, including summary
statistics and data reduction, and how they summarize results.
LO 3-3 Explain the diagnostic approach to Data Analytics, including profiling
and clustering.
LO 3-4 Understand the techniques associated with predictive analytics,
including regression and classification.
LO 3-5 Describe the use of prescriptive analytics, including decision support
systems, machine learning, and artificial intelligence.


PERFORMING THE TEST PLAN

LO 3-1 Understand and distinguish among the four types of Data Analytics in performing the test plan.

The third step of the IMPACT cycle model, or the "P," is "performing the test plan." In this
step, different Data Analytics approaches help us understand what happened, why it happened,
what we can expect to happen in the future, and what we should do based on what we
expect will happen. These Data Analytics approaches or techniques help to address our business
questions and provide information to support accounting and management decisions.

Data Analytics approaches rely on a series of tasks and models that are used to understand
data and gain insight into the underlying cause and effect of business decisions. Many
accounting courses introduce students to basic models that describe the results of periodic
transactions (e.g., ratios, trends, and variance analysis). These simple calculations help
accountants fulfill their traditional role as historians summarizing the results from the past
to inform stakeholders of the status of the business.

While these simple techniques provide important information, their value is limited to
providing information in hindsight. The contributing value of Data Analytics increases as
the focus shifts from hindsight to foresight and from summarizing information to optimizing
business outcomes as we go from descriptive analytics to prescriptive analytics (as illustrated
in Exhibit 3-1). For example, lean accounting relies more heavily on data analysis to
accurately predict changes in budgets and forecasts to minimize disruption to the business.
These models that more accurately predict the future and prescribe a course of action come
at a cost of increasing complexity in terms of manipulating and calculating appropriate
data, and the implications of the results.
There are four main types of Data Analytics, shown in Exhibit 3-1:
• Descriptive analytics are procedures that summarize existing data to determine what has
happened in the past. Some examples of descriptive analytics include summary statistics
(e.g., count, min, max, average, median, standard deviation), distributions, and proportions.
• Diagnostic analytics are procedures that explore the current data to determine why
something has happened the way it has, typically comparing the data to a benchmark.
As an example, diagnostic analytics allow users to drill down in the data and see how
they compare to a budget, a competitor, or trend.
• Predictive analytics are procedures used to generate a model that can be used to
determine what is likely to happen in the future. Examples of predictive analytics include
regression analysis, forecasting, classification, and other predictive modeling.

EXHIBIT 3-1 Four Main Types of Data Analytics


• Prescriptive analytics are procedures that work to identify the best possible options
given constraints or changing conditions. These typically include developing more
advanced machine learning and artificial intelligence models to recommend a course of
action, or optimize, based on constraints and/or changing conditions.
The choice of Data Analytics model depends largely on the type of question that you’re
trying to answer and your access to the data needed to answer the question. Descriptive
and diagnostic analytics are typically paired when you would want to describe the past data
and then compare them to a benchmark to determine why the results are the way they are,
similar to the accounting concepts of planning and controlling. Likewise, predictive and
prescriptive analytics make good partners when you would want to predict an outcome and
then make a recommendation on how to follow up, similar to an auditor flagging a
transaction as high risk and then following a decision flowchart to determine whether to request
additional evidence or include it in audit findings.
As you move from one Data Analytics approach to the next, you trade hindsight and
information, which are traditionally accounting domain areas, for foresight and optimization.
Ultimately, the model you use comes down to the questions you are trying to answer. We
highlighted the Data Analytics approaches in Chapter 1. Here we categorize them into the
four main analytics types, summarized in Exhibit 3-2:
1. Descriptive analytics:
• Summary statistics describe a set of data in terms of their location (mean, median), range
(standard deviation, minimum, maximum), shape (quartile), and size (count).
• Data reduction or filtering is used to reduce the number of observations to focus on relevant
items (e.g., highest cost, highest risk, largest impact, etc.). It does this by taking a large set of
data (perhaps the population) and reducing it to a smaller set that has the vast majority of the
critical information of the larger set. For example, auditing may use data reduction to narrow
transactions based on relevance or size. While auditing has employed various random and
stratified sampling techniques over the years, Data Analytics suggests new ways to highlight
which transactions do not need the same level of vetting as other transactions.

EXHIBIT 3-2 Summary of Data Analytics Approaches (Type of Analytic: Example in Accounting)

Descriptive analytics: Understand what happened.
  Summary statistics: Calculate the average and median income, age range, and highest and lowest purchases of customers during the fourth quarter.
  Data reduction or filtering: Filter data to include only transactions within the current reporting period.
Diagnostic analytics: Understand why it happened.
  Profiling: Identify outlier transactions to determine exposure and risk. Compare individual segments to a benchmark.
  Clustering: Identify groups of store locations that are outperforming the rest.
  Similarity matching: Understand the underlying behavior of high-performing divisions.
  Co-occurrence grouping: Identify related party and intracompany transactions based on the individuals involved in a transaction.
Predictive analytics: Estimate a future value or category.
  Regression: Calculate the fixed and variable costs in a mixed cost equation or determine the number of days a shipment is likely to take.
  Classification: Determine whether or not certain demographics, such as age, zip code, income level, or gender, are likely to engage in fraudulent transactions.
  Link prediction: Predict participation of individuals based on their underlying common attributes, such as an incentive.
Prescriptive analytics: Make recommendations for a course of action.
  Decision support systems: Tax software takes input from a preparer and recommends whether to take a standard deduction or an itemized deduction.
  Artificial intelligence: Audit analysis monitors changing transactions and suggests follow-up when new abnormal patterns appear.


2. Diagnostic analytics:
• Profiling identifies the “typical” behavior of an individual, group, or population by compiling
summary statistics about the data (including mean, standard deviations, etc.) and comparing
individuals to the population. By understanding the typical behavior, we'll be able to identify
abnormal behavior more easily. Profiling might be used in accounting to identify transactions that
might warrant some additional investigation (e.g., outlier travel expenses or potential fraud).
• Clustering helps identify groups (or clusters) of individuals (such as customers) that share
common underlying characteristics—in other words, identifying groups of similar data
elements and the underlying drivers of those groups. For example, clustering might be used to
segment a customer into a small number of groups for additional analysis and risk assessment.
Likewise, transactions might also be put into clusters to understand underlying relationships.
• Similarity matching is a grouping technique used to identify similar individuals based on data
known about them. For example, companies identify seller and customer fraud based on
various characteristics known about each seller and customer to see if they were similar to
known fraud cases.
• Co-occurrence grouping discovers associations between individuals based on common events,
such as transactions they are involved in. Amazon might use this to sell another item to you
by knowing what items are “frequently bought together” or “Customers who bought this item
also bought . . .” as shown in Chapter 1.
3. Predictive analytics:
• Regression estimates or predicts the numerical value of a dependent variable based on the
slope and intercept of a line and the value of an independent variable. An R² value indicates
how closely the line fits the data used to calculate the regression (see the sketch after this
list). An example of regression analysis might be: given a balance of total accounts receivable
held by a firm, what is the appropriate level of the allowance for doubtful accounts for bad debts?
• Classification predicts a class or category for a new observation based on the manual
identification of classes from previous observations. Membership of a class may be binary in the
case of decision trees or indicate the distance from a decision boundary. Some examples of
classification include predicting which loans are likely to default, credit applications that are
expected to be approved, the classification of an operating or financing lease, or identification
of suspicious transactions. In each of these cases, prior data must be manually identified as
belonging to each class to build the predictive model.
• Link prediction predicts a relationship between two data items, such as members of a social
media platform. For example, if two individuals have mutual friends on social media and both
attended the same university, it is likely that they know each other, and the site may make a
recommendation for them to connect. Chapter 1 provides an example of this used in Facebook.
Link prediction in an accounting setting might work to use social media to look for relationships
between related parties that are not otherwise disclosed to identify related party transactions.
4. Prescriptive analytics:
• Decision support systems are rule-based systems that gather data and recommend actions based
on the input. Tax preparation software, investment advice tools, and auditing tools recommend
courses of actions based on data that are input as part of an interview or interrogation process.
• Machine learning and artificial intelligence are learning models or intelligent agents that adapt
to new external data to recommend a course of action. For example, an artificial intelligence
model may observe opinions given by an audit partner and adjust the model to reflect chang-
ing levels of risk appetite and regulation.

While these are all important and applicable data approaches, in the rest of the chapter
we limit our discussion to the more common models, including summary statistics, data
reduction, profiling, clustering, regression, classification, and artificial intelligence. You’ll


find that these data approaches are not mutually exclusive and that actual analysis may
involve parts of several approaches to appropriately address the accounting question.
Just as the different analytics types are important in the third step of the IMPACT
cycle model, “performing the test plan,” they are equally important in the fourth step “A”—
“address and refine results”—as analysts learn from the test approaches and refine them.

PROGRESS CHECK
1. Using Exhibit 3-2, identify the appropriate approach for the following questions:
a. Will a customer purchase item X if given incentive A?
b. Should we offer a customer a line of credit?
c. What quantity of products will the customer purchase?
2. What is the main difference between descriptive and diagnostic methods?

DESCRIPTIVE ANALYTICS

LO 3-2 Explain several descriptive analytics approaches, including summary statistics and data reduction, and how they summarize results.

Descriptive analytics help summarize what has happened in the past. For example, a financial accountant would sum all of the sales transactions within a period to calculate the value for Sales Revenue that appears on the income statement. An analyst would count the number of records in a data extract to ensure the data are complete before running a more complex analysis. An auditor would filter data to limit the scope to transactions that represent the highest risk. In all these cases, basic analysis provides an understanding of what has happened in the past to help decision makers achieve good results and correct poor results.
Here we look at two main approaches that are used by accountants today: summary statistics and data reduction.

Summary Statistics
Summary statistics describe the location, spread, shape, and dependence of a set of observations. These commonly include the count, sum, minimum, maximum, mean or average, standard deviation, median, quartiles, correlation, covariance, and frequency that describe a specific measurable value, shown in Exhibit 3-3.

EXHIBIT 3-3 Description of Summary Statistics

Sum (=SUM()): The total value of all numerical values
Mean (=AVERAGE()): The center value; sum of all observations divided by the number of observations
Median (=MEDIAN()): The middle value that divides the top half of the data from the bottom half
Minimum (=MIN()): The smallest value
Maximum (=MAX()): The largest value
Count (=COUNT()): The number of observations
Frequency (=FREQUENCY()): The number of observations in each of a series of numerical or categorical buckets
Standard deviation (=STDEV()): The variability or spread of the data from the mean; a larger standard deviation means a wider spread away from the mean
Quartile (=QUARTILE()): The value that divides a quarter of the data from the rest; indicates skewness of the data
Correlation coefficient (=CORREL()): How closely two datasets are correlated or predictive of each other


The use of summary statistics helps the user understand what the data look like. For
example, the sum function can be used to determine account balances. The mean and
median can be used to aggregate transactions by employee, location, or division. The stan-
dard deviation and frequency help to identify normal behavior and trends in the data.
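These statistics can also be generated outside of Excel. Below is a minimal sketch in Python using the pandas library; the file transactions.csv and the column amount are hypothetical placeholders, not data from the chapter.

import pandas as pd

df = pd.read_csv("transactions.csv")        # hypothetical transaction extract
amount = df["amount"]                       # hypothetical numeric column

print(amount.sum())                         # like =SUM()
print(amount.mean())                        # like =AVERAGE()
print(amount.median())                      # like =MEDIAN()
print(amount.min(), amount.max())           # like =MIN() and =MAX()
print(amount.count())                       # like =COUNT()
print(amount.std())                         # like =STDEV()
print(amount.quantile([0.25, 0.50, 0.75]))  # like =QUARTILE()
print(amount.describe())                    # several summary statistics at once

The describe() call reproduces most of Exhibit 3-3 in a single step, which is handy when you want a quick first look at an extract.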

Lab Connection
Lab 2-4 and Lab 3-4 have you generate multiple summary statistics at once.

Data Reduction
As you recall, the data reduction approach attempts to reduce the amount of detailed infor-
mation considered to focus on the most critical, interesting, or abnormal items (e.g., highest
cost, highest risk, largest impact, etc.). It does this by filtering through a large set of data
(perhaps the total population) and reducing it to a smaller set that has the vast majority
of the critical information of the larger set. The data reduction approach is done primarily
using structured data—that is, data that are stored in a database or spreadsheet and are read-
ily searchable.
Data reduction involves the following steps (using an example of an employee creating a
fictitious vendor and submitting fake invoices):
1. Identify the attribute you would like to reduce or focus on. For example, an employee may
commit fraud by creating a fictitious vendor and submitting fake invoices. Rather than
evaluate every employee, an auditor may be interested only in employee records that
have addresses that match vendor addresses.
2. Filter the results. This could be as simple as using filters in Excel, or using the WHERE clause in a SQL query. It may also involve a more complicated calculation. For example, employees who create fictitious vendors will often use addresses that are similar to, but not exactly the same as, their own address to foil basic SQL queries. Here the auditor should use a tool that allows fuzzy matching, which uses probability to identify likely similar addresses (a minimal sketch of this idea appears after this list).
3. Interpret the results. Once you have eliminated irrelevant data, take a moment to see
if the results make sense. Calculate the summary statistics. Have you eliminated any
obvious entries? Looking at the list of matching employees, the auditor might tweak the
probability in the fuzzy match to be more or less precise to narrow or broaden the num-
ber of employees who appear.
4. Follow up on results. At this point, you will continue to build a model or use the results
as a targeted sample for follow-up. The auditor should review company policy and fol-
low up with each employee who appears in the reduced list as it represents risk.
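Here is the minimal sketch of fuzzy matching promised in step 2, using Python's built-in difflib module. The file names, the name and address columns, and the 0.8 cutoff are illustrative assumptions; dedicated audit tools provide more robust matching.

from difflib import SequenceMatcher
import pandas as pd

employees = pd.read_csv("employees.csv")   # hypothetical: columns name, address
vendors = pd.read_csv("vendors.csv")       # hypothetical: columns name, address

def similarity(a, b):
    # Returns a ratio between 0 (no match) and 1 (identical strings)
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

threshold = 0.8  # tweak to narrow or broaden the match list (step 3)
for _, emp in employees.iterrows():
    for _, ven in vendors.iterrows():
        score = similarity(emp["address"], ven["address"])
        if score >= threshold:
            # Each pairing on this list warrants follow-up (step 4)
            print(emp["name"], "|", ven["name"], "|", round(score, 2))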

Example of Data Reduction in Internal and External Auditing


While auditing has employed various random and stratified sampling over the years, Data
Analytics suggests new ways to highlight transactions that do not need the same level of vet-
ting or further analysis as other transactions. One example might be to filter the travel and
entertainment (T&E) transactions to find specific values, including whole-dollar amounts
of T&E expenses. Whole-dollar amounts have a greater likelihood of being made up or
fraudulent (as illustrated in Exhibit 3-4).
Auditors may filter data to consider only those transactions being paid to specific ven-
dors, such as mobile payment processors. Because anyone can create a payment account
using processors such as Square Payments, there is a higher potential for the existence of a


EXHIBIT 3-4
Use Filters to Reduce
Data

fictitious or employee-created vendor. The data reduction approach allows us to focus more
time and effort on those vendors and transactions that might require additional analysis to
make sure they are legitimate.
Another example of the data reduction approach is gap detection, where we look for
missing numbers in a sequence, such as payments made by check. Finding out why certain
check numbers were skipped and not recorded requires additional consideration such as
interviewing business process owners or those that oversee/supervise that process to deter-
mine if there are valid reasons for the missing numbers.
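Gap detection is straightforward to script. A minimal sketch in Python; the file check_register.csv and the column check_number are hypothetical placeholders.

import pandas as pd

checks = pd.read_csv("check_register.csv")     # hypothetical payment data
recorded = set(checks["check_number"])

# Any number in the expected sequence that was never recorded is a gap to investigate
expected = range(min(recorded), max(recorded) + 1)
gaps = sorted(set(expected) - recorded)
print(gaps)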
Data reduction may also be used to filter all the transactions between known related
party transactions. Focusing specifically on related party transactions allows the auditor to
focus on those transactions that might potentially be sensitive and/or risky.
Finally, data reduction might be used to compare the addresses of vendors and employ-
ees to ensure that employees are not siphoning funds to themselves. Use of fuzzy match
looks for correspondences between portions, or segments, of the text of each potential
match, shown in Exhibit 3-5. Once potential matches between vendors and employees are
found, additional analysis must be conducted to figure out if funds have been, or potentially
could be, siphoned.

EXHIBIT 3-5 A Fuzzy Matching Shows a Likely Match of an Employee and Vendor

Lab Connection
Lab 3-1 has you reduce data by looking for similar values with a fuzzy match.

Examples of Data Reduction in Other Accounting Areas


Data reduction approaches are also used in operational audit settings. For example, filtering
the data to find cases where there are duplicate invoice payments might be an efficient way
to find errors or fraud. Once duplicate invoice payments are found, additional work can be
done to identify the reasons this has occurred. It may also be a way to reduce costs when


duplicate payments are found and new internal controls are considered to mitigate dupli-
cate payments from occurring in the future.
Data reduction approaches may also be useful in a financial statement analysis setting,
perhaps performed by financial analysts, pension fund managers, or individual investors.
Among other uses, XBRL (eXtensible Business Reporting Language) is used to facilitate the
exchange of financial reporting information between the company and the Securities and
Exchange Commission (SEC). The SEC then makes it available to all interested parties,
including suppliers, competitors, investors, and financial analysts. XBRL requires that the
data be tagged according to the XBRL taxonomy. Using these tagged data in common-sized
financial statements, analysts develop models to access all relevant financial or nonfinan-
cial data to summarize and predict earnings, solvency, liquidity, and profitability. We’ll
explore XBRL further in Chapter 8.

PROGRESS CHECK
3. Describe how the data reduction approach could be used to evaluate employee
travel and entertainment expenses.
4. Explain how XBRL might be used by lenders to focus on specific areas of interest.

DIAGNOSTIC ANALYTICS

LO 3-3 Explain the diagnostic approach to Data Analytics, including profiling and clustering.

Diagnostic analytics provide insight into why things happened or how individual data values relate to the general population. Once you summarize data using descriptive techniques, you can drill down and discover the numbers that are driving an outcome. Benchmarks give context to the data by giving analysts a reference point (or line) to compare the data to. For example, the arithmetic mean of a dataset gives you context for a specific value. These benchmarks may be based on past activity or a comparison with a major competitor or an entire industry.
Two common methods of diagnostic analytics include profiling and cluster analysis. In
both of these cases the analysis provides insight into where a specific value lies relative to
the rest of the sample or population. The farther the distance from the rest of the observa-
tions, the more interesting the individual value becomes. These outliers could represent
risks or opportunities to learn more about the business process driving the behavior. When
performing profiling, it is often helpful to standardize data into Z-scores if you are com-
paring datasets that are set on different scales (for example, dollars and euros, height and
weight). In the next section, before we dive into profiling examples, we will present a brief
discussion on Z-score transformation and working with methods of visualizing data distri-
bution, including a curve and a box plot.
Hypothesis testing is a third method of diagnostic analysis used to determine if there
are significant differences between groups. Running hypothesis tests allows you to system-
atically compare the descriptive statistics of two groups or variables (e.g., averages and
data distributions) to determine if the difference is significantly greater than the expected
variation.
Diagnostic analytics are especially important for internal and external auditors, given their charge to look for errors, anomalies, outliers, and especially fraud in the financial statements. Preliminary accounting research finds that company investments in diagnostic analytics software are associated with the following benefits:
• reduced external audit fees,
• reduced audit delay,


• fewer material weaknesses, and
• fewer restatements.1

Standardizing Data for Comparison (Z-score)


A standard score or Z-score is a statistical concept that assigns a value to a number based
on how many standard deviations it stands from the mean, shown in Exhibit 3-6. To convert
observations to a Z-score in Excel, take the following steps:
1. Calculate the average: =AVERAGE([range]).
2. Calculate the standard deviation: =STDEVPA([range]).
3. Add a new column called “Z-score” next to your number range.
4. Calculate the Z-score: =STANDARDIZE([value],[mean],[standard deviation]).
A. Alternatively: =([value]–[mean])/[standard deviation].
By setting the mean to 0, each data point’s value now represents the number of standard deviations from the mean. In a normal distribution, nearly all values (99.7 percent) should fall within plus-or-minus three standard deviations. Hence, if a data point’s Z-score is greater than 3 (or less than −3), it is likely an outlier that warrants scrutiny.
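The same standardization can be scripted in one pass. A minimal sketch in Python, paralleling the Excel steps above; the column amount is a hypothetical placeholder.

import pandas as pd

df = pd.read_csv("transactions.csv")             # hypothetical data
mean = df["amount"].mean()                       # step 1: the average
std_dev = df["amount"].std(ddof=0)               # step 2: population standard deviation, like =STDEVPA()
df["z_score"] = (df["amount"] - mean) / std_dev  # steps 3 and 4: standardize each value

outliers = df[df["z_score"].abs() > 3]           # more than 3 standard deviations from the mean
print(outliers)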

EXHIBIT 3-6 The Z-score Shows the Relative Position of a Point of Interest to the Population. The Z table will provide the percentage of data points that fall to the left of our point of interest (the shaded area under the curve). In this case, Z = 1.2, and the area to the left of Z is 0.88, or 88 percent of all theoretical data points. Source: http://www.dmaictools.com/wp-content/uploads/2012/02/z-definition.jpg

Profiling
Profiling involves gaining an understanding of a typical behavior of an individual, group,
or population (or sample). Profiling is done primarily using structured data—data that are
stored in a database or spreadsheet and are readily searchable. Using these data, analysts
can use common summary statistics to describe the individual, group, or population, includ-
ing knowing its mean, standard deviation, sum, and so on. Profiling is generally performed
on data that are readily available, so the data have already been gathered and are ready for
further analysis.
Profiling is used to discover patterns of behavior. In Exhibit 3-7, for example, the higher
the Z-score (farther away from the mean), the more likely a customer will have a delayed
shipment (blue circle). As shown in Exhibit 3-7, a Z-score of 3 represents three standard
deviations away from the mean. We use profiling to explore the attributes of the products or
vendors that may be experiencing shipping delays.

1 Lim, J.-H., J. Park, G. F. Peters, and V. J. Richardson, “Examining the Potential Benefits of Audit Data Analytics,” University of Arkansas working paper.


EXHIBIT 3-7 Z-scores Provide an Example of Profiling That Helps Identify Outliers (in This Case, Categories with Unusually
High Average Days to Ship)

A box plot will provide similar insight. Instead of focusing on the mean and standard
deviation, a box plot highlights the median and quartiles, similar to Exhibit 3-8. Box plots are
used to visually represent how data are dispersed in terms of the interquartile range (IQR).
The IQR is a way to analyze the shape of your dataset that focuses on the median. To find

EXHIBIT 3-8 Box Plots Provide an Example of Profiling That Helps Identify Outliers (in This Case, Categories with
Unusually High Average Days to Ship)


the interquartile range, the dataset must first be divided into four parts (quartiles), and the
middle two quartiles that surround the median are the IQR. The IQR is considered more
helpful than a simple range measure (maximum observation − minimum observation) as a
measure of dispersion when you are analyzing a sample instead of a population because it
turns the focus toward the most common values and minimizes the risk of the range being
too heavily influenced by outliers.
Creating quartiles and identifying the IQR is done in the following steps (of course,
Tableau and Excel also have ways to automatically create the quartiles and box plots so that
you do not have to perform these steps manually):
1. Rank order your data first (the same as you would do to find the median or to find
the range).
2. Quartile 1: the lowest 25 percent of observations
3. Quartile 2: the next 25 percent of observations—its cutoff is the median
4. Quartile 3: begins at the median and extends to the third 25 percent of observations
5. Quartile 4: the highest 25 percent of observations
6. The interquartile range includes Quartile 2 and Quartile 3.
Box plots are sometimes called box and whisker plots because they typically are made up
of one box and two “whiskers.” The box represents the IQR and the two whiskers represent
the ends of Quartiles 1 and 4. If your data have extreme outliers, these will be indicated as
dots that extend beyond the “whiskers.”
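The quartile arithmetic can also be scripted rather than performed manually. A minimal sketch in Python; the column days_to_ship is a hypothetical placeholder, and the 1.5 × IQR fences reflect the common convention for plotting outlier dots.

import pandas as pd

df = pd.read_csv("shipments.csv")          # hypothetical data
q1 = df["days_to_ship"].quantile(0.25)     # boundary between Quartiles 1 and 2
q3 = df["days_to_ship"].quantile(0.75)     # boundary between Quartiles 3 and 4
iqr = q3 - q1                              # the interquartile range

lower_fence = q1 - 1.5 * iqr               # points beyond the fences appear as dots
upper_fence = q3 + 1.5 * iqr               # beyond the whiskers on a box plot
outliers = df[(df["days_to_ship"] < lower_fence) | (df["days_to_ship"] > upper_fence)]
print(outliers)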
Data profiling can be as simple as calculating summary statistics on transactional data,
such as the average number of days to ship a product, the typical amount we pay for a prod-
uct, or the number of hours an employee is expected to work. On the other hand, profiling
can be used to develop complex models to predict potential fraud. For example, you might
create a profile for each employee in a company that may include a combination of salary,
hours worked, and travel and entertainment purchasing behavior. Sudden deviations from an
employee’s past behavior may represent risk and warrant follow-up by the internal auditors.
Similar to evaluating behavior, data profiling is typically used to assess data quality and
internal controls. For example, data profiling may identify customers with incomplete or
erroneous master data or mistyped transactions.
Data profiling typically involves the following steps:
1. Identify the objects or activity you want to profile. What data do you want to evaluate?
Sales transactions? Customer data? Credit limits? Imagine a manager wants to track
sales volume for each store in a retail chain. They might evaluate total sales dollars,
asset turnover, use of promotions and discounts, and/or employee incentives.
2. Determine the types of profiling you want to perform. What is your goal? Do you want to
set a benchmark for minimum activity, such as monthly sales? Have you set a budget
that you wish to follow? Are you trying to reduce fraud risk? In the retail store scenario,
the manager would likely want to compare each store to the others to identify which
ones are underperforming or overperforming.
3. Set boundaries or thresholds for the activity. This is a benchmark that may be manually
set, such as a budgeted value, or automatically set, such as a statistical mean, quartile, or
percentile. The retail chain manager may define underperforming stores as those whose
sales activity falls below the 20th percentile of the group and overperforming stores as
those whose sales activity is above the 80th percentile. These thresholds are automati-
cally calculated based on the total activity of the stores, so the benchmark is dynamic.
4. Interpret the results and monitor the activity and/or generate a list of exceptions. Here is
where dashboards come into play. Management can use digital dashboards to quickly
see multiple sets of profiled data and make decisions that would affect behavior. As
you evaluate the results, try to understand what a deviation from the defined boundary


represents. Is it a risk? Is it fraud? Is it just something to keep an eye on? To evaluate


the stores, the retail chain manager may review a summary of the sales indicators and
quickly identify under- and overperforming stores. The manager is likely to be more
concerned with underperforming stores because they represent major challenges for
the chain. Overperforming stores may provide insight into marketing efforts or the cus-
tomer base.
5. Follow up on exceptions. Once a deviation has been identified, management should have
a plan to take a course of action to validate, correct, or identify the causes of the abnor-
mal behavior. When the retail chain manager notices a store that is underperforming
compared to its peers, they may follow up with the individual store manager to under-
stand their concerns or offer a local promotion to stimulate sales.
As with most analyses, data profiles should be updated periodically to reflect changes in
firm activity and identify activity that may be more relevant to decision making.
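To make steps 3 and 4 concrete, the dynamic percentile benchmarks from the retail chain scenario can be computed directly. A minimal sketch in Python; the file stores.csv and the columns store_id and monthly_sales are hypothetical placeholders.

import pandas as pd

stores = pd.read_csv("stores.csv")               # hypothetical store-level sales data
low = stores["monthly_sales"].quantile(0.20)     # 20th percentile: dynamic lower benchmark
high = stores["monthly_sales"].quantile(0.80)    # 80th percentile: dynamic upper benchmark

under = stores[stores["monthly_sales"] < low]    # underperformers to follow up on (step 5)
over = stores[stores["monthly_sales"] > high]    # overperformers to learn from
print(under[["store_id", "monthly_sales"]])
print(over[["store_id", "monthly_sales"]])

Because the thresholds are recomputed from the data each period, the benchmark moves with total store activity, just as described in step 3.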

Lab Connection
Lab 3-5 has you profile online and in-person sales and make comparisons of
sales behavior.

Data Analytics at Work

Big Four Invest Billions in Tech, Reshaping Their Identities as Professional Services Firms with a Technology Core
Bloomberg reports that three of the four biggest accounting firms are investing $9 billion
in artificial intelligence and Data Analytics products. That’s right, the Big Four account-
ing firms are making transformative investments into technology, rewriting their DNA.
To be sure, they will still be professional services firms, but with a technology core.
While many of the routine jobs will go, such as invoice processing, many more jobs
will emphasize checking a company’s transactions in real time using Data Analytics
to spot trends and anomalies (diagnostic analytics). With all the data at their disposal,
accountants will become business advisers, rather than bean counters.
Learning Data Analytics skills will prepare you for the accounting profession both
today and tomorrow!
Source: Bloomberg, “Big Four Invests Billions in Tech, Reshaping Their Identities,” Bloomberg,
January 2, 2020, https://news.bloomberglaw.com/us-law-week/big-four-invest-billions-in-tech-
reshaping-their-identities (accessed January 29, 2021).

Example of Profiling in Management Accounting


Advanced Environmental Recycling Technologies (ticker symbol AERT) makes wood-plastic
composite for decking that doesn’t rot and keeps its form, color, and shape indefinitely. It
has developed a recipe and knows the standards of how much wood, plastic, and coloring
goes into each foot of decking. AERT has developed standard costs and constantly calculates
the means and standard deviations of the use of wood, plastic, coloring, and labor for each
foot of decking. As the company profiles each production batch, it knows that when signifi-
cant variances from the standard cost occur, those variances need to be investigated further.


Management accounting relies heavily on diagnostic analytics in the planning and con-
trolling process. By comparing the actual results of activity to the budgeted expectation,
management determines the processes and procedures that resulted in favorable and unfa-
vorable activity. For example, in a manufacturing company like AERT, variance analysis
compares the actual cost, price, and volume of various activities with standard equivalents,
shown in Exhibit 3-9. The unfavorable variances appear in orange as the actual cost exceeds
the budgeted cost or are to the left of the budget reference line. Favorable variances appear
to the right of the budget reference line in blue. Sales exceed the budgeted sales. As sales
volume increases, the costs (negative values) also increase, leading to an unfavorable vari-
ance in orange.

Example of Profiling in an Internal Audit


Profiling might also be used by internal auditors to evaluate travel and entertainment (T&E)
expenses. In some organizations, total annual T&E expenses are second only to payroll and
so represent a major expense for the organization. By profiling the T&E expenses, we can
understand the average amount and range of expenditures and then compare and contrast
with prior period’s mean and range to help identify changing trends and potential risk areas
for auditing and tax purposes. This will help indicate areas where there is a lack of controls, changes in procedures, individuals more willing to spend excessively on certain types of T&E expenses, and so on, which might be associated with higher risk.
The use of profiling in internal audits might unearth employees’ misuse of company funds,
like in the case of Tom Coughlin, an executive at Walmart, who misused “company funds

EXHIBIT 3-9 Variance Analysis Is an Example of Data Profiling


to pay for CDs, beer, an all-terrain vehicle, a customized dog kennel, even a computer as his
son’s graduation gift—all the while describing the purchases as routine business expenses.”2

Example of Profiling in Auditing


Profiling is also useful in continuous auditing. If we consider the dollar amount of each
transaction, we can develop a Z-score by knowing the mean and standard deviation. Using
our statistics knowledge and assuming a normal distribution, any transaction that has a
Z-score of 3 or above would represent abnormal transactions that might be associated with
higher risk. We can investigate further by checking whether those transactions had appropriate approvals and authorization.
An analysis of Benford’s law could also be used to assess a set of transactions. Benford’s
law is an observation about the frequency of leading digits in many real-life sets of numeri-
cal data. The law states that in many naturally occurring collections of numbers, the signifi-
cant leading digit is likely to be small. If the distribution of transactions for an account like
“sales revenue” is substantially different than Benford’s law would predict, then we would
investigate the sales revenue account further and see if we can explain why there are dif-
ferences from Benford’s law. Exhibit 3-10 shows an illustration of Benford’s law using the
first digit of employee transactions. An abnormal frequency of transactions beginning with
the number four may indicate that employees are attempting to circumvent internal con-
trols, such as an approval limit. While the number one also exceeds the expected value, we
would expect a larger volume of smaller numbers. We will discuss additional applications of
Benford’s law in Chapter 6.
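Benford’s law states the expected frequency of leading digit d as log10(1 + 1/d), so about 30.1 percent of amounts should begin with a 1 and only about 4.6 percent with a 9. Below is a minimal sketch comparing actual first digits to that expectation; the column amount is a hypothetical placeholder.

import math
import pandas as pd

df = pd.read_csv("transactions.csv")                        # hypothetical transaction amounts
amounts = df.loc[df["amount"] > 0, "amount"]
first_digit = amounts.astype(str).str.lstrip("0.").str[0]   # leading nonzero digit

observed = first_digit.value_counts(normalize=True).sort_index()
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)                        # Benford's expected frequency
    print(d, f"expected {expected:.3f}", f"observed {observed.get(str(d), 0):.3f}")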

EXHIBIT 3-10
Benford’s Law Applied
to Large Numerical
Datasets (including
Employee Transactions)

Cluster Analysis
The clustering data approach works to identify groups of similar data elements and the
underlying relationships of those groups. More specifically, clustering techniques are used
to group data/observations into a specific number of clusters or groups so that all the data
within any cluster are similar, while data across clusters are different. Cluster analysis works
by calculating the minimum distance between each observation and the center of each clus-
ter, shown in Exhibit 3-11.
2 M. Barbaro, “Wal-Mart Official Misused Company Funds,” Washington Post, July 15, 2005, http://www.washingtonpost.com/wp-dyn/content/article/2005/07/14/AR2005071402055.html (accessed August 2, 2017).


EXHIBIT 3-11 Clustering Is Used to Find Three Natural Groupings of Vendors Based on Purchase Activity (volume plotted against distance)

When you are exploring the data for these patterns and don’t have a specific question,
you would use an unsupervised approach. For example, consider the question: “Do our ven-
dors form natural groups based on similar attributes?” In this case, there isn’t a specific target
because you don’t yet know what similarities our vendors have. You may use clustering to
evaluate the vendor attributes and see which ones are closely related. You could also use
co-occurrence grouping to match vendors by geographic region; data reduction to simplify
vendors into obvious categories, such as wholesale or retail or based on overall volume of
orders; or profiling to evaluate products or vendors with similar on-time delivery behavior,
shown in Exhibit 3-7. In any of these cases, the data drive the analysis, and you evaluate the output to see if it matches your intuition. These exploratory exercises may help to define better questions, but are generally less useful for making decisions.
As an example, Walmart may want to understand the types of customers who shop at its
stores. Because Walmart has good reason to believe there are different market segments of
people, it may consider changing the design of the store or the types of products to accom-
modate the different types of customers, emphasizing the ones that are most profitable to
Walmart. To learn about the different types of customers, managers may ask whether cus-
tomers agree with the following statements using a scale of 1–7 (on a Likert scale):
• Enjoy: I enjoy shopping.
• Budget: I try to avoid shopping because it is bad for the budget.
• Eating: I like to combine my shopping with eating out.
• Coupons: I use coupons when I shop.
• Quality: I care more about the quality of the products than I do about the price.
• Apathy: I don’t care about shopping.
• Comparing: You can save a lot of money by comparing prices between various stores.
Additionally, they would ask about numerical customer behavior:
• Income: The household income of the respondent (in dollars).
• Shopping at Walmart: How many times a month do you visit Walmart?
Accountants may analyze the data and plot the responses to see if there are correlations
within the data on a scatter plot. The visual plot of the relationship between responses to
the various questions may help cluster the various customers into different clusters and help
Walmart cater to specific customer clusters better through superior insights.
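If those survey responses were collected into a table, a clustering algorithm such as K-means could search for the natural groupings. Below is a minimal sketch using Python's scikit-learn library; the file survey.csv, the column names, and the choice of three clusters are illustrative assumptions.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

survey = pd.read_csv("survey.csv")   # hypothetical responses
columns = ["Enjoy", "Budget", "Eating", "Coupons", "Quality", "Apathy",
           "Comparing", "Income", "Visits"]

# Standardize first so Income (in dollars) does not dominate the 1-7 Likert scales
scaled = StandardScaler().fit_transform(survey[columns])

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(scaled)
survey["cluster"] = kmeans.labels_
print(survey.groupby("cluster")[columns].mean())   # profile each customer cluster

The group means printed at the end describe each cluster's typical attitudes and behavior, which is the insight Walmart would use to cater to specific segments.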

Lab Connection
Lab 3-2 has you use cluster analysis to identify natural groupings of loan data.


Example of the Clustering Approach in Auditing


The clustering data approach may also be used in an auditing setting. Imagine a group
insurance setting where fraudulent claims associated with payment were previously found
by internal auditors through happenstance and/or through hotline tips. Based on current
internal audit tests, payments are the major concern of the business unit. Specifically,
the types of related risks identified are duplicate payments, fictitious names, improper/
incorrect information entered into the systems, and suspicious payment amounts.
Clustering is useful for anomaly detection in payments to insurance beneficiaries, sup-
pliers, and so on. By identifying transactions with similar characteristics, transactions are
grouped together into clusters. Those clusters that consist of few transactions or small pop-
ulations are then flagged for investigation by the auditors as they represent groups of outli-
ers. Examples of these flagged clusters include transactions with large payment amounts
and/or a long delay in processing the payment.
The dimensions used in clustering may be simple correlations between variables, such
as payment amount and time to pay, or more complex combinations of variables, such as
ratios or weighted equations. As they explore the data, auditors develop attributes that they
think will be relevant through intuition or data exploration. Exhibit 3-12 illustrates cluster-
ing of insurance payments based on the following attributes:
1. Payment amount: The value of the transaction payment.
2. Days to pay: The number of days from the original recorded transaction to the
payment date.

EXHIBIT 3-12 Cluster Analysis of Insurance Payments


The data are normalized to reduce the distortion of the data and other outliers are
removed. They are then plotted with the number of days to pay on the y-axis and the pay-
ment amount on the x-axis. Of the eight clusters identified, three clusters highlight potential
anomalies that may require further investigation as part of an internal or external audit.
• Cluster 6 payments (purple) have a long duration between the processing and payment dates.
• Cluster 7 payments (pink) have high payment amounts.
• Cluster 8 payments (brown) have high payment amounts and a long duration between
the processing date and the payment date.
With this insight, auditors may assess the risk associated with these payments and under-
stand transaction behavior relative to acceptable behavior defined in internal controls.

Hypothesis Testing for Differences in Groups


One way of uncovering causal relationships is to form hypotheses of what you expect will
or will not occur. A common test to identify if a difference in means is significant is to do a
two-sample t-test for equal means. A two-sample t-test for equal means is used to determine
if the difference between the means of two different populations is significant or not.
We can use the profiling example from Exhibits 3-6 and 3-7 to illustrate an opportunity
to run a two-sample t-test. After observing the distributions, a few categories stand out as
seeming to have quicker average shipping time. After digging into the averages, we can find
out that the subcategory of Copiers has the lowest average shipping time. We can drill down
into this observation to discover if Copiers take significantly less time to ship on average
than the other categories. If that is the case (and the difference isn’t just due to chance),
then perhaps there are efficiencies that can be gained in the other subcategories.
To run the hypothesis test, we first need to frame our hypotheses.
Usually hypotheses are paired in twos: the null hypothesis and the alternative hypothe-
sis. The first is the base case, often called the null hypothesis, and assumes the hypothesized
relationship does not exist, that there is no significant difference between two samples or
populations. In this case, the null hypothesis would be stated as follows:
Null Hypothesis: H0: The average shipping time for the Copiers category is not signifi-
cantly different than the average shipping time for all other categories.
The alternative hypothesis would be the case that the analyst believes to be true. That is, an alternative hypothesis is the opposite of the null hypothesis, or a potential result that the analyst may expect.
Alternative Hypothesis: HA: The average shipping time for the Copiers category is significantly less than the average shipping time for all other categories.
For the null hypothesis to hold, we would recognize that even though there is a differ-
ence in the average shipping times, the difference is not significant—that is, there is no evi-
dence that the differences in average shipping times are not due simply to chance. Evidence
for the alternate hypothesis occurs when the null hypothesis does not hold and is rejected at
some level of statistical significance. In other words, before we can reject or fail to reject the
null hypothesis, we need to do a statistical test of the data with shipping times for Copiers
and shipping times for all other categories, and then interpret the results of that statistical
test to make a conclusion.

Statistical Significance
When we are working with a sample of data instead of the entire population, testing our
hypotheses is more complicated than simply comparing two means. If we discover that
the average shipping time for Copiers is higher than the average shipping time for all other


categories, simply seeing that there is a difference between the two means is not enough—
we need to determine if that difference is big enough to not have been due to chance, or
put in statistical words, we need to determine if the difference is significant. When we are
making decisions about the entire population based on only a subset of data, it is possible
that even with the best sampling methods, we might have collected a sample that is not
perfectly representative of the population as a whole. Because of that possibility, we have to
take into account some room for error. When we work with data, we are not in control of
many measures that factor into the statistical tests that we run—we can’t change the mean
or the standard deviation, for example. And depending on how much control we have over
retrieving our own data, we may not even have much control over how large of a sample we
collect. Of course, if we have control, then the larger the sample, the better. Larger samples
have a better chance of being more representative of the population as a whole. We do have
control over one thing, though, and that is our significance level.
Keep in mind that when we run our hypothesis tests, we will come to the conclusion to
either “reject the null hypothesis” or “fail to reject the null hypothesis.” There is always a
risk when making inferences about any population based on a sample that the data don’t
accurately reflect the population, and that would result in us making a wrong decision. That
wrong decision isn’t the same thing as making a mistake in a calculation or getting a ques-
tion wrong on an exam—this type of error is one we won’t even know that we made. That’s
the risk associated with making inferences about a population based on a sample. What we
do to protect ourselves (other than doing everything we can to collect a large and represen-
tative sample) is determine which wrong result presents us with the least risk. Would we
prefer to erroneously assume that Copier shipping times are significantly low, when actu-
ally the shipping times for other categories are lower? Or would it be safer to erroneously
assume Copier shipping times are not significantly low, when actually they are? There is no
cut-and-dried answer to this. It always depends on the scenario.
The significance level, also referred to as alpha, reflects which erroneous decision we are
more comfortable with. The lower alpha is, the more difficult it is to reject the null hypoth-
esis, which minimizes the risk of erroneously rejecting the null hypothesis. Common alphas
are 1 percent, 5 percent, and 10 percent. In the next section, we will take a closer look at
what alpha means and how we use it to determine whether we will reject or fail to reject the
null hypothesis.

The p-value
We describe findings as statistically significant by interpreting the p-value of the statisti-
cal test. The p-value is compared to the alpha threshold. A result is statistically significant
when the p-value is less than alpha, which signifies a difference between the two means was
detected: that the null hypothesis can be rejected. The p-value is the result of a calcula-
tion that involves summary measures from your sample. It is completely dependent upon
the sample you are analyzing, and nothing else. Alpha is the only measure in your control.
If p-value > alpha: Fail to reject the null hypothesis (data do not present a significant
result).
If p-value ≤ alpha: Reject the null hypothesis (data present a significant result).
Now let’s consider risk again. Consider the following screenshot of a t-test, in which the p-value is highlighted in yellow (Exhibit 3-13):
If our alpha is 5 percent (0.05), we would have to fail to reject the null because the p-value
of 0.073104 is greater than alpha. But what if we set our alpha at 10 percent instead of
5 percent? All of a sudden, our p-value of 0.073104 is less than alpha (0.10). It is ­critical to
make the decision of what your significance level will be (1 percent, 5 percent, 10 ­percent)
prior to running your statistical test. The p-value shouldn’t dictate which alpha you select,
as tempting as that may be!
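The same test can be run outside of Excel. Below is a minimal sketch in Python using the scipy library; the file shipments.csv and the columns category and shipping_days are hypothetical placeholders, so the output will not match Exhibit 3-13 exactly.

import pandas as pd
from scipy import stats

df = pd.read_csv("shipments.csv")    # hypothetical order-level data
copiers = df.loc[df["category"] == "Copiers", "shipping_days"]
others = df.loc[df["category"] != "Copiers", "shipping_days"]

alpha = 0.05   # choose the significance level before running the test

# One-sided alternative: Copiers ship in significantly less time than other categories
t_stat, p_value = stats.ttest_ind(copiers, others, equal_var=False, alternative="less")
print(f"t = {t_stat:.4f}, p-value = {p_value:.6f}")

if p_value <= alpha:
    print("Reject the null hypothesis (significant result)")
else:
    print("Fail to reject the null hypothesis")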


EXHIBIT 3-13 t-test Assessing for Significant Differences in Average Shipping Times across Categories

PROGRESS CHECK
5. How does Benford’s law provide an expectation of any set of naturally occurring
collections of numbers?
6. Identify a reason the sales amount of any single product may or may not follow
Benford’s law.
7. Name three clusters of customers who might shop at Walmart.
8. In Exhibit 3-12, Cluster 6 of the group insurance payments is highlighted because those payments have a long duration between the processing and payment dates. Why would that cluster be of interest to internal auditors?

PREDICTIVE ANALYTICS

LO 3-4 Understand the techniques associated with predictive analytics, including regression and classification.

Before we discuss predictive analytics, we need to bring you up to speed on some data-specific terms:
• A target is an expected attribute or value that we want to evaluate. For example, if we are trying to predict whether a transaction is fraudulent, the target might be a specific “fraud score.” If we’re trying to predict an interest rate, the target would be “interest rate.” The target is usually referred to as the dependent variable in a regression analysis.
• A class is a manually assigned category applied to a record based on an event. For example, if the credit department has rejected a credit line for a customer, the credit department assigns the class “Rejected” to the customer’s master record. Likewise, if the internal auditors have confirmed that fraud has occurred, they would assign the class “Fraud” to that transaction.
the internal auditors have confirmed that fraud has occurred, they would assign the
class “Fraud” to that transaction.
On the other hand, we may ask questions with specific outcomes, such as: “Will a new
vendor ship a large order on time?” When you are performing an analysis that uses historical
data to predict a future outcome, you will use a supervised approach. You might use regression
to predict a specific value to answer a question such as, “How many days do we predict it
will take a new vendor to ship an order?” Again, the prediction is based on the activity we
have observed from other vendors, shown in Exhibit 3-14. We use historical data to create
the new model. Using a classification model, you can predict whether a new vendor belongs
to one class or another based on the behavior of the others, shown in Exhibit 3-15. Causal
modeling, similarity matching, and link prediction are additional supervised approaches


EXHIBIT 3-14 Regression (days to ship plotted against volume)

EXHIBIT 3-15 Classification (volume plotted against distance)

where you attempt to identify causation (which can be expensive), identify a series of char-
acteristics that predict a model, or attempt to identify other relationships, respectively.
Predictive analytics facilitate making forecasts of accounting outcomes, including these
examples:
1. Helping management accountants predict future performance, including future sales,
earnings, and cash flows. This will help management accountants set budgets and plan
production, and estimate available funds for loan repayments, dividend payments, and
operations.
2. Helping company accountants predict which customers will be able to pay what they
owe the company. This will help accountants estimate the appropriate allowance for
doubtful accounts.
3. Helping auditors predict which financial statements need to be restated.
4. Helping investors and lenders predict which companies are likely to go bankrupt, or
unable to continue as a going concern.
5. Helping investors and financial analysts predict future sales, earnings, and cash flows,
critical to stock valuation.

Regression
Regressions allow the accountant to develop models to predict expected outcomes. These
expected outcomes might be to predict the number of days to ship products relative to the
volume of orders placed by the customer, shown in Exhibit 3-14.


Regression is a supervised method used to predict specific values. In this case, the num-
ber of days to ship is dependent on the number of items in the order. Therefore, we can use
regression to predict the number of days it takes Vendor A to ship based on the volume in
the order. (Vendor A is represented by the gold star in Exhibits 3-14 and 3-15.)
Regression analysis involves the following process:
1. Identify the variables that might predict an outcome (or target or dependent variable).
The inputs, or explanatory variables, are called independent variables, where the
output is a dependent variable. You will probably remember the formula for a lin-
ear equation from algebra or other math classes, y = mx + b. When we run a linear
regression with only one explanatory variable, we use the same equation, with m
representing the slope of the explanatory variable, and b representing the y-intercept.
Because linear regression models can have more than one explanatory variable,
though, the formula is written just slightly different as y = b0 + b1x for a simple (one
explanatory variable) regression, with b0 representing the y-intercept and b1 repre-
senting the slope of the explanatory variable. In a multiple regression model, each
explanatory variable receives their own slope: y = b0 + b1x1 + b2x2 . . . and on until
all of the explanatory variables and their respective slopes have been accounted for.
When you run a regression in Excel, Tableau, or other statistical software, the values
for the intercept and the slopes of each explanatory variable will be provided in a
regression output. To create a predictive model, you simply plug in the values pro-
vided in the regression output along with the values for the particular scenario you
are estimating or predicting.
A. Dummy variables: Variables in linear regression must be numerical, but sometimes
we need to include categorical variables, for instance whether consumers are male
or female, or if they are from Arkansas or New York. When that is the case, we have
to transform our categorical variables into numbers (we can’t add a word to our
formula!), and in particular into binary numbers called dummy variables. Dummy
variables can take on the values of 0 or 1, when 0 represents the absence of some-
thing and 1 represents the presence of something. You will see examples of dummy
variables in Comprehensive Lab 3-6.
2. Determine the functional form of the relationship. Is it a linear relationship where each
input plots to another, or is the relationship nonlinear? While most accounting ques-
tions utilize a linear relationship, it is possible to consider a nonlinear relationship.
3. Identify the parameters of the model. What are the relative weights of each indepen-
dent variable on the dependent variable? These are the coefficients on each of the
independent variables. Statistical t-tests assess one regression coefficient at a time
to determine if the weight is statistically different from 0 (or no weight at all). Par-
ticularly in multiple regression, it can be useful to assess the p-value for each variable.
You interpret the p-value for each variable the same way you assess the p-value in a
t-test: If the p-value is less than your alpha (typically 0.05), then you reject the null
hypothesis. In regression, that implies that the explanatory variable is statistically
significant.
4. Evaluate the goodness of fit. Calculate the adjusted R2 value to determine whether
the data are close to the line or not. In general, the better the fit (e.g., R2 > 0.8),
the more accurate the prediction will be. The adjusted R2 is a value between 0 and
1. An adjusted R2 value of 0 represents no ability to explain the dependent variable,
and an adjusted R2 value of 1 represents perfect ability to explain the dependent
­variable. Another statistic is the Model F-test. The F-test of the overall significance of
the hypothesized model (that has one or more independent variables) compared to a
model with no independent variables tells us statistically if our model is better
than chance. A short worked sketch of these steps follows this list.
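Here is that sketch, using Python's statsmodels library to predict days to ship from order volume; the file and column names are hypothetical placeholders.

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("vendor_shipments.csv")   # hypothetical vendor shipment history

# Step 1: identify the variables; a categorical predictor could be converted
# to dummy variables with pd.get_dummies() before this point.
X = sm.add_constant(df[["volume"]])        # adds b0, the intercept term
y = df["days_to_ship"]

model = sm.OLS(y, X).fit()                 # step 2: assume a linear functional form
print(model.params)                        # step 3: estimated intercept and slope
print(model.pvalues)                       # step 3: p-value for each coefficient
print(model.rsquared_adj, model.fvalue)    # step 4: adjusted R2 and the Model F-test

print(model.predict([[1, 500]]))           # predicted days to ship for a 500-unit order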


Lab Connection
Lab 3-3 and Lab 3-6 have you calculate linear regression to predict comple-
tion rates and sales types.

Example of the Regression Approach in Cost Accounting


The following discussion primarily identifies the structure of the model—that is, the relation-
ship between the dependent variable and the plausible independent variables—that might be
useful in cost accounting:
Dependent variable = f (Independent variables)
Let’s imagine a regression analysis is run to find the appropriate cost drivers for a company
that makes deliveries using these proposed dependent and independent variables:
Dependent variable: Total daily overhead costs for a company.
Independent variables: Potential overhead cost drivers include:
Deliveries: The number of deliveries made that day.
Miles: The number of miles to make all deliveries that day.
Time (minutes): The delivery time it takes to make all deliveries that day.
Weight: Combined weight of all deliveries made that day.
In the following sections, we provide additional examples of the implementation of
regression analysis in the labs.
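Under those assumptions, the cost-driver question becomes a multiple regression with four explanatory variables. A minimal sketch; the file daily_overhead.csv and its columns are hypothetical placeholders.

import pandas as pd
import statsmodels.api as sm

daily = pd.read_csv("daily_overhead.csv")  # hypothetical: one row per delivery day
X = sm.add_constant(daily[["deliveries", "miles", "time_minutes", "weight"]])
y = daily["overhead_cost"]

model = sm.OLS(y, X).fit()
print(model.summary())   # slope and p-value for each driver, adjusted R2, F-test
# Drivers whose p-values fall below alpha are the statistically supported cost drivers.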

Example of the Regression Approach in Managerial Accounting


Accounting firms experience a great amount of employee turnover (between 15
and 25 percent each year).3 Understanding and predicting employee turnover is a particu-
larly important determination for accounting firms. Each year, they must predict how many
new employees might be needed to accommodate growth, to supply needed areas of exper-
tise, and to replace employees who have left. Accounting firms might predict employee
turnover by predicting the following regression model in this way:
Employee turnover = f (Current professional salaries, Health of the economy [GDP],
Salaries offered by other accounting firms or by corporate accounting, etc.)
Using such a model, accounting firms could then begin to collect the necessary data to
test their model and predict the level of employee turnover.

Example of the Regression Approach in Auditing


One of the key tasks of auditors of a bank is to consider the amount of the allowance for
loan losses or for nonbanks to consider the allowance for doubtful accounts (i.e., those
receivables that may never be collected). These allowances are often subject to manipula-
tion to help manage earnings.4 The Financial Accounting Standards Board (FASB) issued

3 http://www.cpafma.org/articles/inside-public-accounting-releases-2015-national-benchmarking-report/ (accessed November 9, 2016).
4 A. S. Ahmed, C. Takeda, and S. Thomas, “Bank Loan Loss Provisions: A Reexamination of Capital Management, Earnings Management and Signaling Effects,” Journal of Accounting and Economics 28, no. 1 (1999), pp. 1–25.


Accounting Standards Update 2016-13, which requires that banks provide an estimate of
expected credit losses (ECLs) by considering historical collection rates, current informa-
tion, and reasonable and supportable forecasts, including estimates of prepayments.5 Using
these historical and industry data, auditors may work to test a model to establish a loan loss
reserve in this way:
Allowance for loan losses amount = f (Current aged loans, Loan type,
Customer loan history, Collections success)

Example of the Regression Approach in Accounting or Business


For example, in Chapter 1, we worked to understand why LendingClub rejected certain loan
applications. As we considered all of the possible explanations, we found that there were at
least three possible indicators that a loan might be rejected, including the debt-to-income
ratios, length of employment, and credit (risk) scores, suggesting a model where:
Loan rejection = f (Debt-to-income ratio, Length of employment, Credit [risk] score)
Another example of the regression approach might be the approval of individual credit
card transactions. Assume you go on a trip; in the morning you are in Pittsburgh and by
the very next day, you are in Shanghai. Will your credit card transaction in Shanghai auto-
matically be rejected? Credit card companies establish models to predict fraud and decide
whether to accept or reject a proposed credit card transaction. A potential model may be
the following:
Transaction approval = f (Location of current transaction, Location of last transaction,
Amount of current transaction, Prior history of travel of credit card holder, etc.)

Time Series Analysis


Time series analysis is a predictive analytics technique used to predict future values based
on past values of the same variable. This is a popular technique for estimating future sales,
earnings, and cash flows based on past values of the same variable. Investors will want to
predict future earnings and cash flows for use in valuing the stock. Management will work
to predict future sales for use in planning future production. Lenders will work to predict
future cash flows to evaluate if their loan will be repaid.
While there are many variants of time series analysis, all use prior performance to esti-
mate future performance. For example:
Performance(t+1) = f (Performance(t−n), . . . , Performance(t))
where performance will usually be sales, earnings, or cash flows when used in an accounting
context.
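One of the simplest variants is a trend regression on the time index. Below is a minimal sketch, assuming a hypothetical quarterly sales file with a period counter (1, 2, 3, . . .) and a sales column.

import pandas as pd
import statsmodels.api as sm

sales = pd.read_csv("quarterly_sales.csv")    # hypothetical columns: period, sales
X = sm.add_constant(sales[["period"]])
model = sm.OLS(sales["sales"], X).fit()       # fit sales as a linear function of time

next_period = sales["period"].max() + 1
print(model.predict([[1, next_period]]))      # Performance(t+1) estimated from prior periods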

Classification
The goal of classification is to predict whether an individual we know very little about will
belong to one class or another. For example, will a customer have their balance written off?
The key here is that we are predicting whether the write-off will occur or not (in other words,
there are two classes: “Write-Off” and “Good”). This is in contrast with a regression that
attempts to predict many possible values of the dependent variable, rather than just a few
classes as used in classification.

5 http://www.pwc.com/us/en/cfodirect/publications/in-brief/fasb-new-impairment-guidance-financial-instruments.html (accessed November 9, 2016).


Classification is a supervised method that can be used to predict the class of a new
observation. In this case, blue circles represent “on-time” vendors. Green squares represent
“delayed” vendors. The gold star represents a new vendor with no history.
Classification is a little more involved as we are now dealing with machine learning and
complex probabilistic models. Here are the general steps (a minimal sketch follows the list):
1. Identify the classes you wish to predict.
2. Manually classify an existing set of records.
3. Select a set of classification models.
4. Divide your data into training and testing sets.
5. Generate your model.
6. Interpret the results and select the “best” model.
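The sketch promised above walks through steps 2 to 6 using scikit-learn's decision tree; the customer file, the predictor columns, and the class labels are hypothetical placeholders.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("customers.csv")      # step 2: manually classified history
X = df[["balance", "days_past_due"]]   # hypothetical predictor columns
y = df["class"]                        # step 1: the classes "Write-Off" and "Good"

# Step 4: divide the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Steps 3 and 5: select a model and generate it (max_depth limits branching, a form of pruning)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Step 6: interpret the results; compare test-set predictions to the known classes
print(accuracy_score(y_test, model.predict(X_test)))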

Classification Terminology
First, a bit of terminology to prepare us for our discussion.
Training data are existing data that have been manually evaluated and assigned a class.
We know that some customer accounts have been written off, so those accounts are assigned
the class “Write-Off.” We will train our model to learn what it is that those customers have
in common so we can predict whether a new customer will default or not.
Test data are existing data used to evaluate the model. The classification algorithm will
try to predict the class of the test data and then compare its prediction to the previously
assigned class. This comparison is used to evaluate the accuracy of the model or the prob-
ability that the model will assign the correct class.
Decision trees are used to divide data into smaller groups, and decision boundaries mark
the split between one class and another.
Exhibit 3-16 provides an illustration of both decision trees and decision boundaries. Decision trees split the data at each branch into two or more groups. In this example, the first branch divides the vendor data by geographic distance and inserts a decision boundary through the middle of the data. Branches 2 and 3 split each of the two new groups by vendor volume. Note that the decision boundaries in the graph on the right are different for each grouping.
Pruning removes branches from a decision tree to avoid overfitting the model. In other
words, pruning reduces the number of times we split the groups of data into smaller groups,
as shown in Exhibit 3-16. Pre-pruning occurs during the model generation. The model stops
creating new branches when the information usefulness of an additional branch is low.
Post-pruning evaluates the complete model and discards branches after the fact. Exhibit 3-17
provides an illustration of how pruning might work in a decision tree.
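A minimal sketch of how pruning is typically controlled in practice, using scikit-learn's real decision tree parameters on hypothetical vendor data: max_depth and min_samples_split act as pre-pruning, while ccp_alpha drives cost-complexity post-pruning. The parameter values are illustrative.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical vendor records of [volume, distance]; 0 = on-time, 1 = delayed.
X = [[15, 45], [20, 120], [8, 30], [40, 200], [12, 60],
     [35, 180], [18, 50], [25, 150], [10, 40], [30, 170]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# Pre-pruning: stop creating new branches during model generation.
pre = DecisionTreeClassifier(max_depth=2, min_samples_split=4).fit(X, y)

# Post-pruning: grow the full tree, then discard low-value branches via
# cost-complexity pruning (a larger alpha prunes more aggressively).
post = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(pre.get_depth(), post.get_depth())
```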
Linear classifiers are useful for ranking items rather than simply predicting class prob-
ability. These classifiers are used to identify a decision boundary. Exhibit 3-18 shows an
illustration of linear classifiers segregating the two classes.

EXHIBIT 3-16 Example of Decision Trees and Decision Boundaries (a three-branch decision tree shown alongside a Volume-versus-Distance plot with the corresponding decision boundaries)


EXHIBIT 3-17 Illustration of Pruning a Decision Tree

EXHIBIT 3-18 Illustration of Linear Classifiers (a Volume-versus-Distance plot; one point on the wrong side of the line is labeled as an error)

A linear discriminant uses an algebraic line to separate the two classes. In the example
noted here, the classification is a function of both volume and distance:
Class(x) = one class (e.g., on-time) if 1.0 × Volume − 1.5 × Distance + 50 > 0
Class(x) = the other class (e.g., delayed) if 1.0 × Volume − 1.5 × Distance + 50 ≤ 0
We don’t expect linear classifiers to perfectly segregate classes. For example, the green
square that appears below the line in Exhibit 3-18 would be incorrectly classified as a circle
and considered an error.
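Written as code, the discriminant above is just a weighted sum compared to zero. A minimal sketch follows; the weights (1.0, −1.5, 50) come from the example, and which class sits on which side of the boundary is illustrative.

```python
def classify(volume: float, distance: float) -> str:
    # Linear discriminant from the example: score above zero falls on one
    # side of the decision boundary, at or below zero on the other.
    score = 1.0 * volume - 1.5 * distance + 50
    return "class A" if score > 0 else "class B"

print(classify(volume=40, distance=20))   # score = 60.0  -> class A
print(classify(volume=15, distance=45))   # score = -2.5  -> class B
```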
Support vector machine is a discriminating classifier that is defined by a separating hyper-
plane that works first to find the widest margin (or biggest pipe) and then works to find the
middle line. Exhibits 3-19 and 3-20 provide an illustration of support vector machines and
how they work to find the best decision boundary.
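A minimal sketch of fitting a support vector machine with scikit-learn on the same hypothetical vendor data: kernel="linear" requests a separating hyperplane, and the fitted support vectors are the points that sit on the edges of the "pipe." All values are illustrative.

```python
from sklearn.svm import SVC

# Hypothetical vendor records of [volume, distance].
X = [[15, 45], [20, 120], [8, 30], [40, 200], [12, 60],
     [35, 180], [18, 50], [25, 150], [10, 40], [30, 170]]
y = ["on-time", "delayed", "on-time", "delayed", "on-time",
     "delayed", "on-time", "delayed", "on-time", "delayed"]

# C controls how heavily misclassified points (errors) are penalized.
svm = SVC(kernel="linear", C=1.0).fit(X, y)

# The support vectors are the observations on the edges of the pipe.
print(svm.support_vectors_)
print(svm.predict([[22, 90]]))
```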

EXHIBIT 3-19 Support Vector Machines: With support vector machines, first find the widest margin (biggest pipe); then find the middle line. (Two Volume-versus-Distance panels illustrate the margin and the middle line.)


EXHIBIT 3-20 Support Vector Machine Decision Boundaries: SVMs have two decision boundaries at the edges of the pipes. (A Volume-versus-Distance plot marks points as “OK” or “Error” relative to the two decision boundaries.)

Evaluating Classifiers
When classifiers wrongly classify an observation, they are penalized. The larger the penalty
(error), the less accurate the model is at predicting a future value, or classification.

Overfitting
Rarely will datasets be so clean that you have a clear decision boundary. You should always
be wary of classifiers that are too accurate. Exhibit 3-21 provides an illustration of over-
fitting and underfitting. You want a good amount of accuracy without being too perfect.
Notice how the error rate declines from 6 to 3 to 0. You want to be able to generalize your
results, and complete accuracy creates a complex model with little predictive value. For-
mally defined, overfitting is a modeling error when the derived model too closely fits a lim-
ited set of data points. In contrast, underfitting refers to a modeling error when the derived
model poorly fits a limited set of data points.
Exhibit 3-22 provides a good illustration of the trade-offs between the complexity of the
model and the accuracy of the classification. While you may be able to come up with a very
complex model with the training data, chances are it will not improve the accuracy of correctly
classifying the test data. There is, in some sense, a sweet spot, where the model is most accurate
without being so complex to thus allow classification of both the training as well as the test data.
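The trade-off can be demonstrated directly by comparing training and testing accuracy as model complexity grows. A minimal sketch on synthetic data (so exact numbers will vary): training accuracy keeps climbing with tree depth while test accuracy peaks near the sweet spot and then slips.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data with noise, so no model can be perfect.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.8, size=300) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

# Deeper trees are more complex models.
for depth in [1, 2, 4, 8, 16]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(tree.score(X_tr, y_tr), 2), round(tree.score(X_te, y_te), 2))
```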

EXHIBIT 3-21 Illustration of Underfitting and Overfitting the Data with a Predictive Model (three panels: Underfitting, Good, Overfitting)

EXHIBIT 3-22 Illustration of the Trade-Off between the Complexity of the Model and the Accuracy of the Classification (accuracy, from .5 to 1.0, plotted against complexity of model; the training-data curve keeps rising while the testing-data curve peaks at the “sweet spot”)


p-values versus Effect Size


While the p-value has its statistical merits, when samples are large it is also important to
consider effect size, which is the magnitude of the difference between groups.6 Sometimes
when running statistical analysis you will find relatively small differences in means (in a
t-test) or impacts on predictive models (in a regression model) yield extraordinarily small
p-values. This is normal in large samples because nearly any difference can be deemed sig-
nificant if the sample is large enough. Does this mean we shouldn’t use large samples? Of
course not, but it does mean we should take into account the true effect of the difference in
means. It’s important to take a step back from the analyses you run to consider the meaning-
ful (not just statistical) impact of the differences. When running a t-test, Cohen’s D can help
determine the effect size. The formula is a bit complicated, so fortunately we can rely on
an online calculator: https://www.socscistatistics.com/effectsize/default3.aspx. In a regression analysis, we can rely on the coefficient of determination (the adjusted R2 value you learned about) to help understand not just the statistical significance of our individual variables, but also whether the impact of the variables is truly meaningful in making predictions.
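For readers who prefer to compute it directly rather than use the online calculator, here is a minimal sketch of Cohen's d for two independent samples: the difference in means divided by the pooled standard deviation. The sample values are hypothetical.

```python
import numpy as np

def cohens_d(a, b):
    # Pooled standard deviation across the two samples.
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) \
                 / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

group1 = [102, 98, 110, 105, 99, 104]
group2 = [97, 95, 101, 96, 100, 94]
# Prints roughly 1.6; by Cohen's conventions, 0.8 or above is a large effect.
print(round(cohens_d(group1, group2), 2))
```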

PROGRESS CHECK
9. If we are trying to predict the extent of employee turnover, do you believe the
health of the economy, as measured using GDP, will be positively or negatively
associated with employee turnover?
10. If we are trying to predict whether a loan will be rejected, would you expect
credit score to be positively or negatively associated with loan rejection by a
bank such as LendingClub?

PRESCRIPTIVE ANALYTICS

LO 3-5
Describe the use of prescriptive analytics, including decision support systems, machine learning, and artificial intelligence.

Prescriptive analytics answer the question “What do we do next?” We have collected the data; analyzed and profiled the data; and in some cases, developed predictive models to estimate the proper class or target value. Once those analyses have been performed, the decision process can be aided by rules-based decision support systems, machine learning models, or existing artificial intelligence models that improve their future predictions. These analytics are the most complex and expensive because they rely on multiple variables and inputs, structured and unstructured data, and in some cases the ability to understand and interpret natural language commands as data-driven queries.

Decision Support Systems


Decision support systems are information systems that support decision-making activity
within a business by combining data and expertise to solve problems and perform calcula-
tions. They are designed to be interactive and adapt to the information collected by the user.
In the accounting domain, they are typically built around a series of rules or If . . . then . . .
branching statements that guide the user through the process to the result.
One of the best examples of decision support systems is the calculation of income tax
using off-the-shelf tax software. Tools like TurboTax guide a nontechnical user through a
series of interview questions and have them enter a numerical income value or answer a

6 G. M. Sullivan and R. Feinn, “Using Effect Size—or Why the P Value Is Not Enough,” Journal of Graduate Medical Education 4, no. 3 (2012), pp. 279–82, https://doi.org/10.4300/JGME-D-12-00156.1.


yes/no question. The answers to those questions determine what calculations to include,
which schedules to complete, and what the value of the tax return will be.
Decision support systems can help with application of accounting rules as well. For
example, when a company classifies a lease as a financing or operating lease, it must con-
sider whether the lease meets a number of criteria. Using a decision support system, a
controller could evaluate a new lease and answer five questions to determine the proper
classification, shown in Exhibit 3-23.
Under a previous version of the FASB lease standard, there would have been bright
lines to indicate hard rules to determine the lease (for example, “The lease term is greater
than or equal to 75 percent of the estimated economic life of the leased asset.”). Decision
support systems are easier to use when you have clear rules. Under the newer standard,
more judgment is needed to reach the most appropriate conclusion for the business. More
on this later.
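A minimal, hypothetical sketch of what such a rules-based system could look like in code. The five questions are paraphrased from the flowchart in Exhibit 3-23, and in practice each answer requires judgment under the newer standard rather than a simple yes/no.

```python
def classify_lease(transfers_ownership: bool,
                   purchase_option_reasonably_certain: bool,
                   major_part_of_economic_life: bool,
                   substantially_all_of_fair_value: bool,
                   specialized_asset: bool) -> str:
    # If any of the five criteria is met, classify as a finance lease;
    # otherwise, classify as an operating lease.
    if any([transfers_ownership,
            purchase_option_reasonably_certain,
            major_part_of_economic_life,
            substantially_all_of_fair_value,
            specialized_asset]):
        return "Finance lease"
    return "Operating lease"

print(classify_lease(False, False, True, False, False))  # Finance lease
```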
Auditors use decision support systems as part of their audit procedures. For example,
they indicate a series of parameters such as tolerable and expected error rates. A tool like
IDEA will calculate the appropriate sample size for evaluating source documents. Once the
procedure has been performed, that is, source documents are evaluated, the auditor will
then input the number or extent of exceptional items and the decision support system might
classify the audit risk as low, medium, or high for that area.

Machine Learning and Artificial Intelligence


We have discussed some machine learning techniques, including classification and cluster
analysis in the previous sections. What these all have in common is the use of algorithms
and statistical models to generate a previously unknown model that relies on patterns and
inferences. Both unsupervised exploratory analysis and supervised model generation pro-
vide insight and predictive foresight into the business and decisions made by the accoun-
tants and auditors. They can also model judgment and decision making to recommend a
class or action based on new, unknown data.

EXHIBIT 3-23
Lease Classification
Flowchart


Take lease classification, for instance. With the recent accounting standard, the language
has moved from bright lines (“75 percent of the useful life”) to judgment (“major part”).
While it may be tempting to rely on the accountant to manually make this decision for each
new lease, machine learning will do it more quickly and more accurately than the manual
classification. A company with a sufficiently large portfolio of previously classified leases
may use those leases as a training set for a machine learning model. Using the data attri-
butes from these leases (e.g., useful life, total payments, fair value, originating firm) and the
prior manual classification (e.g., financing, operating) of the company’s leases, the model
can evaluate a new lease and assign the appropriate classification. Post-classification verifi-
cation and correction in the case of an inappropriate outcome is then fed into the model to
improve the performance of the model.
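A minimal sketch of that training loop, assuming a hypothetical portfolio of previously classified leases with the attributes named above. The model choice (a random forest) and every value in the DataFrame are illustrative, not a prescribed method.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical previously classified leases (the training set).
leases = pd.DataFrame({
    "useful_life_years": [10, 3, 8, 2, 12, 4],
    "total_payments":    [90000, 15000, 70000, 8000, 120000, 20000],
    "fair_value":        [100000, 60000, 80000, 50000, 125000, 70000],
    "classification":    ["financing", "operating", "financing",
                          "operating", "financing", "operating"],
})

X = leases[["useful_life_years", "total_payments", "fair_value"]]
y = leases["classification"]

model = RandomForestClassifier(random_state=0).fit(X, y)

# Evaluate a new lease; post-classification corrections would be appended
# to the training data so the model improves over time.
new_lease = pd.DataFrame([[9, 85000, 95000]], columns=X.columns)
print(model.predict(new_lease))
```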
Artificial intelligence models work similarly in that they learn from the inputs and cor-
rections to improve decision making. For example, image classification allows auditors to
take aerial photography of inventory or fixed assets and automatically identify the objects
within the photo rather than having an auditor manually check each object. Classification
of closed-circuit footage enables automatic counting of foot traffic in a retail location for
managers. Modeling of past judgment decisions by audit partners makes it possible to deter-
mine whether an allowance or estimate falls within a normal range for a client and is accept-
able or should be qualified. Artificial intelligence models track sentiment in social media
and popular press posts to predict positive stock market returns for analysts.
For most applications of artificial intelligence models, the computational power required is such
that most companies will outsource the underlying system to companies like Microsoft,
Amazon, or Google rather than develop it themselves. These companies provide the datasets
to train and build the model, and the platforms provide the algorithms and code. When
public accounting firms outsource data, clients may be hesitant to allow their financial data
to be used in these platforms without additional assurance surrounding the privacy and
security of their data.

PROGRESS CHECK
11. How might you expect managers to use decision support systems when evaluat-
ing employee bonuses?
12. How do machine learning and artificial intelligence models improve their recom-
mendations over time?

Summary
■ In this chapter, we addressed the third and fourth steps of the IMPACT cycle model: the
“P” for “performing test plan” and “A” for “address and refine results.” That is, how are
we going to test or analyze the data to address a problem we are facing? (LO 3-1)
■ We identified descriptive analytics that help describe what happened with the data,
including summary statistics, data reduction, and filtering. (LO 3-2)
■ We provided examples of diagnostic analytics that help users identify relationships in
the data that uncover why certain events happen through profiling, clustering, similarity
matching, and co-occurrence grouping. (LO 3-3)

■ We introduced some specific models and terminology related to these tools, including
Benford’s law, test and training data, decision trees and boundaries, linear classifiers,
and support vector machines. We identified cases where models that overfit
existing data are not very accurate at predicting the future. (LO 3-4)
■ We explained examples of predictive analytics and introduced some data mining con-
cepts related to regression, classification, and link prediction that can help predict future
events or values. (LO 3-4)
■ We discussed prescriptive analytics, including decision support systems and artificial
intelligence and provided some examples of how these systems can make recommenda-
tions for future actions. (LO 3-5)

Key Words
alternative hypothesis (131) The opposite of the null hypothesis, or a potential result that the
­analyst may expect.
Benford’s law (128) The principle that in any large, randomly produced set of natural numbers, there
is an expected distribution of the first, or leading, digit with 1 being the most common, 2 the next most,
and down successively to the number 9.
causal modeling (133) A data approach similar to regression, but used to test for cause-and-effect
relationships between multiple variables.
classification (133) A data approach that attempts to assign each unit in a population into a few cat-
egories potentially to help with predictions.
clustering (128) A data approach that attempts to divide individuals (like customers) into groups (or
clusters) in a useful or meaningful way.
co-occurrence grouping (129) A data approach that attempts to discover associations between indi-
viduals based on transactions involving them.
data reduction (120) A data approach that attempts to reduce the amount of information that needs
to be considered to focus on the most critical items (e.g., highest cost, highest risk, largest impact, etc.).
decision boundaries (138) Technique used to mark the split between one class and another.
decision support system (141) An information system that supports decision-making activity within
a business by combining data and expertise to solve problems and perform calculations.
decision tree (138) Tool used to divide data into smaller groups.
descriptive analytics (116) Procedures that summarize existing data to determine what has happened
in the past. Some examples include summary statistics (e.g. Count, Min, Max, Average, Median), distribu-
tions, and proportions.
diagnostic analytics (116) Procedures that explore the current data to determine why something has
happened the way it has, typically comparing the data to a benchmark. As an example, these allow users to
drill down in the data and see how it compares to a budget, a competitor, or trend.
digital dashboard (125) An interactive report showing the most important metrics to help users
understand how a company or an organization is performing. Often created using Excel or Tableau.
dummy variables (135) A numerical value (0 or 1) to represent categorical data in statistical analysis;
values assigned a 1 indicate the presence of something and 0 represents the absence.
effect size (141) Used in addition to statistical significance in statistical testing; effect size demon-
strates the magnitude of the difference between groups.
interquartile range (IQR) (124) A measure of variability. To calculate the IQR, the data are first
divided into four parts (quartiles) and the middle two quartiles that surround the median are the IQR.
link prediction (133) A data approach that attempts to predict a relationship between two data items.

null hypothesis (131) An assumption that the hypothesized relationship does not exist, or that there
is no significant difference between two samples or populations.
overfitting (140) A modeling error when the derived model too closely fits a limited set of data points.
predictive analytics (116) Procedures used to generate a model that can be used to determine what is
likely to happen in the future. Examples include regression analysis, forecasting, classification, and other
predictive modeling.
prescriptive analytics (117) Procedures that work to identify the best possible options given con-
straints or changing conditions. These typically include developing more advanced machine learning and
artificial intelligence models to recommend a course of action, or optimizing, based on constraints and/or
changing conditions.
profiling (123) A data approach that attempts to characterize the “typical” behavior of an individual,
group, or population by generating summary statistics about the data (including mean, standard devia-
tions, etc.).
regression (133) A data approach that attempts to estimate or predict, for each unit, the numerical
value of some variable using some type of statistical model.
similarity matching (133) A data approach that attempts to identify similar individuals based on data
known about them.
structured data (123) Data that are organized and reside in a fixed field with a record or a file. Such data
are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms.
summary statistics (119) Describe the location, spread, shape, and dependence of a set of observations. These commonly include the count, sum, minimum, maximum, mean or average, standard deviation, median, quartiles, correlation, covariance, and frequency that describe a specific measurable value.
supervised approach/method (133) Approach used to learn more about the basic relationships
between independent and dependent variables that are hypothesized to exist.
support vector machine (139) A discriminating classifier that is defined by a separating hyperplane
that works first to find the widest margin (or biggest pipe).
test data (138) A set of data used to assess the degree and strength of a predicted relationship estab-
lished by the analysis of training data.
time series analysis (137) A predictive analytics technique used to predict future values based on
past values of the same variable.
training data (138) Existing data that have been manually evaluated and assigned a class, which
assists in classifying the test data.
underfitting (140) A modeling error when the derived model poorly fits a limited set of data points.
unsupervised approach/method (129) Approach used for data exploration looking for potential
patterns of interest.
XBRL (eXtensible Business Reporting Language) (122) A global standard for exchanging finan-
cial reporting information that uses XML.

ANSWERS TO PROGRESS CHECKS


1. a. Link prediction
b. Classification
c. Regression
2. While descriptive analytics focus on what happened, diagnostic analytics focus on why it
happened. Descriptive and diagnostic analytics are typically paired because you would want
to describe the past data and then compare them to a benchmark to determine why the
results are the way they are, similar to the accounting concepts of planning and controlling.

3. Data reduction may be used to filter out ordinary travel and entertainment expenses so an
auditor can focus on those that are potentially erroneous or fraudulent.
4. The XBRL tagging allows an analyst or decision maker to focus on one or a category of
expenses of most interest to a lender. For example, lenders might be most interested in
monitoring the amount of long-term debt, interest payments, and dividends paid to assess
if the borrower will be able to repay the loan. Using the capabilities of XBRL, lenders could
focus on just those individual accounts for further analysis.
5. For many real-life sets of numerical data, Benford’s law provides an expectation for the
leading digit of numbers in the dataset. Diagnostic analytics use Benford’s law as the
expectation to highlight differences a dataset might have, and potentially serve as an indi-
cator of fraud or errors.
6. A dollar store might sell everything for exactly $1.00. In that case, the use of Benford’s law
for any single product or even for every product would not follow Benford’s law!
7. Three clusters of customers who might consider Walmart could include thrifty shoppers
(looking for the lowest price), shoppers looking to shop for all of their household needs
(both grocery and non-grocery items) in one place, and those customers who live close to
the store (good location).
8. The longer time between the death and payment dates leads one to ask why it has taken so long for payment to occur and whether the interest required to be paid is large. Because
of these issues, there might be a possibility that the claim is fraudulent or at least deserves
a more thorough review to explain why there was such a long delay.
9. We certainly could let the data speak and address this question directly. In general, when
the health of the economy is stronger, there are fewer layoffs and fewer people out look-
ing for a job, which means less turnover. Additional analysis could determine whether the
turnover is voluntary or involuntary.
10. Chapter 1 illustrated that LendingClub collects the credit score data, and the initial analy-
sis there suggested the higher the credit score, the less likely to be rejected. Given this
evidence, we would predict a negative relationship between credit score and loans that
are rejected.
11. Decision support systems follow rules to determine the appropriate amount of a bonus. Following a set of rules, the system may evaluate management goals, such as a sales target or number of new accounts, to calculate and recommend the appropriate bonus compensation.
12. Machine learning and artificial intelligence models learn by incorporating new data and
through manual correction of data. For example, when a misclassified lease is corrected,
the accuracy of the recommended classification of future leases improves.

Multiple Choice Questions



1. (LO 3-4) __________ is a set of data used to assess the degree and strength of a predicted relationship.
a. Training data
b. Unstructured data
c. Structured data
d. Test data
2. (LO 3-4) These data are organized and reside in a fixed field with a record or a file. Such
data are generally contained in a relational database or spreadsheet and are readily
searchable by search algorithms. The term matching this definition is:
a. training data.
b. unstructured data.
c. structured data.
d. test data.
3. (LO 3-3) An observation about the frequency of leading digits in many real-life sets of
numerical data is called:
a. leading digits hypothesis.
b. Moore’s law.
c. Benford’s law.
d. clustering.
4. (LO 3-1) Which approach to Data Analytics attempts to predict a relationship between
two data items?
a. Similarity matching
b. Classification
c. Link prediction
d. Co-occurrence grouping
5. (LO 3-4) In general, the more complex the model, the greater the chance of:
a. overfitting the data.
b. underfitting the data.
c. pruning the data.
d. a more accurate prediction of the data.
6. (LO 3-4) In general, the simpler the model, the greater the chance of:
a. overfitting the data.
b. underfitting the data.
c. pruning the data.
d. the need to reduce the amount of data considered.
7. (LO 3-4) __________ is a discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin (or biggest pipe) and then works to find the middle line.
a. Linear classifier
b. Support vector machine
c. Decision tree
d. Multiple regression
8. (LO 3-3) Auditing financial statements, looking for errors, anomalies, and possible fraud,
is most consistent with which type of analytics?
a. Descriptive analytics
b. Diagnostic analytics
c. Predictive analytics
d. Prescriptive analytics
9. (LO 3-4) Models associated with regression and classification data approaches all have
these important parts except:
a. identifying which variables (we’ll call these independent variables) might help pre-
dict an outcome (we’ll call this the dependent variable).
b. the functional form of the relationship (linear, nonlinear, etc.).
c. the numeric parameters of the model (detailing the relative weights of each of the
variables associated with the prediction).
d. test data.
10. (LO 3-1) Which approach to Data Analytics attempts to assign each unit in a population
into a small set of classes where the unit belongs?
a. Classification
b. Regression
c. Similarity matching
d. Co-occurrence grouping

Discussion and Analysis

1. (LO 3-4) What is the difference between a target and a class?


2. (LO 3-3) What is the difference between a supervised and an unsupervised approach?
3. (LO 3-4) What is the difference between training datasets and test (or testing) datasets?
4. (LO 3-1) Using Exhibit 3-1 as a guide, what are two data approaches associated with
the descriptive analytics approach?
5. (LO 3-1) Using Exhibit 3-1 as a guide, what are four data approaches associated with
the diagnostic analytics approach?
6. (LO 3-3) How might the data reduction approach be used in auditing?
7. (LO 3-4) How might classification be used in approving or denying a potential fraudu-
lent credit card transaction?
8. (LO 3-3) How is similarity matching different from clustering?
9. (LO 3-3) How does fuzzy match work? Give an accounting situation where it might be
most useful.
10. (LO 3-3) Compare and contrast the profiling data approach and the development of
standard cost for a unit of production at a manufacturing company. Discuss how these
approaches are similar and how they are different.
11. (LO 3-4) Exhibits 3-14, 3-15, and 3-18 suggest that volume and distance are the best
predictors of “days to ship” for a wholesale company. What other variables would also
be useful in predicting the number of “days to ship”?

Problems

1. (LO 3-1) Match the test approach to the appropriate type of Data Analytics:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics

Test Approach Analytics Type

1. Clustering

2. Classification

3. Summary statistics

4. Decision support systems

5. Link prediction

6. Co-occurrence grouping

7. Machine learning and artificial intelligence

8. Similarity matching

9. Data reduction or filtering

10. Profiling

11. Regression

2. (LO 3-2) Identify the order sequence in the data reduction approach to descriptive ana-
lytics (i.e., 1 is first; 4 is last).

Steps of the Data Reduction Approach Sequence Order (1 to 4)


1. Filter the results
2. Identify the attribute you would like to focus on
3. Interpret the results
4. Follow up on results

3. (LO 3-1) Match the accounting question to the appropriate type of Data Analytics:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics

Accounting Question Analytics Type

1. What are the expected stock returns to our investment in Facebook stock?
2. What was the price and quantity variance associated with the production of chicken at Tyson?
3. What are the cash needs and projections over the next 3 months?
4. What were the total taxes paid in the past 5 years?
5. If we expect our Asian sales to increase, where should we produce them?
6. Should we ship by truck, rail, or air given the expected increase in fuel expenses?
7. Our refunds seem to be high. Are they fraudulent?
8. Which product sold the most last month?

4. (LO 3-4) Identify the order sequence in the classification approach to predictive analytics (i.e., 1 is first; 6 is last).

Steps of the Classification Approach Sequence Order (1 to 6)


1. Select a set of classification models.
2. Manually classify an existing set of records.
3. Divide your data into training and testing parts.
4. Interpret the results and select the “best” model.
5. Identify the classes you wish to predict.
6. Generate your model.

5. (LO 3-4) Match the classification definitions to the appropriate classification terminology:
• Training data
• Test data
• Decision trees
• Decision boundaries
• Support vector machine
• Overfitting
• Underfitting

Classification Definition Classification Terms

1. Technique used to mark the location of the split between one class or another.
2. A set of data used to assess the degree and strength of a predicted relationship established by the training data.
3. A modeling error when the derived model poorly fits a limited set of data points.
4. Existing data that have been manually evaluated and assigned a class.
5. A discriminating classifier that is defined by a separating hyperplane that works to find the widest margin (or biggest pipe).
6. Tool used to divide data into smaller groups.
7. A modeling error when the derived model too closely fits a limited set of data points.

6. (LO 3-3) Related party transactions involve people who have close ties to an organi-
zation, such as board members. Assume an accounting manager decides that fuzzy
matching would be a useful technique to find undisclosed related party transactions.
Using the fields below, identify the pairings between the related party table and the
vendor and customer tables that could independently identify a fuzzy match.

Vendor Fuzzy Match? RelatedParty Fuzzy Match? Customer


VendorState RelatedState CustomerState
VendorName RelatedName CustomerName
VendorZip RelatedZip CustomerZip
VendorAddDate RelatedAddDate CustomerAddDate
VendorAddress RelatedAddress CustomerAddress
VendorType RelatedPosition CustomerType
VendorCity RelatedCity CustomerCity

7. (LO 3-3) An auditor is trying to figure out if the inventory at an electronics store chain
is obsolete. From the following list, identify whether each attribute would be useful for
predicting inventory obsolescence or not.

Predictive Attributes Predictive?


1. Inventory location
2. Purchase date
3. Manufacturer
4. Identified as obsolete
5. Sales revenue
6. Inventory color
7. Inventory cost
8. Inventory description
9. Inventory size
10. Inventory type
11. Days since last purchase

8. (LO 3-4) An auditor is trying to figure out if the goodwill its client recognized when it
purchased another company has become impaired. What characteristics might be used
to help establish a model predicting goodwill impairment? Label each of the following
as either a Supervised data approach or Unsupervised data approach.

Data Approach Supervised/Unsupervised Data Approach?


1. Classification
2. Link prediction
3. Co-occurrence grouping
4. Causal modeling
5. Profiling
6. Regression

9. (LO 3-3) Analysis: How might clustering be used to describe customers who owe
money (accounts receivable)?
10. (LO 3-2) Analysis: Why would the use of data reduction be useful to highlight related
party transactions (e.g., CEO has their own separate company that the main company
does business with)?
11. (LO 3-2) An investor wants to do an analysis of the industry’s inventory turnover using
XBRL. Indicate the XBRL tags that would be used with an inventory turnover calculation.

Query XBRL Tag


1. Company Filter:
2. Date Filter:
3. Numerator:
4. Denominator:

12. (LO 3-2) Identify the behavior, error, or fraudulent scheme that could be detected when
you apply Benford’s law to the following accounts.

Account Behavior Detected Using Benford’s Law


1. Sales records
2. Purchases
3. Travel and entertainment
4. Vendor payments
5. Sales returns

Chapter 3 Appendix: Setting Up a Classification Analysis


To answer the question “Will a new vendor ship a large order on time?” using classification,
you should clearly identify your variables, define the scope of your data, and assign classes.
This is related to “master the data” in the IMPACT model.

Identify Your Variables


Because this question is related to vendors and order shipments, take a moment to think
about attributes that might be predictive. What attributes would you need to address the
following questions: Would the total number of order items potentially cause a delay? Are
certain types of items shipped more timely than others? How about the overall ­shipping
weight . . . does that impact the timeliness of shipments? Does the vendor’s physical

distance from a company’s warehouse matter? How about the age of vendor relationship or
number of vendor employees? What else?

Define the Scope


Because you are looking at vendor shipments, you would need—at the basic level—data
related to the original purchase order (order date, number of items), shipping data (ship-
ping date, weight), and vendor master data (location, age, size). This will help you narrow
down your data request and make it more likely that you’ll get the data you request by an
established deadline. As you’re preparing your data, you’ll want to join these tables so that
each record represents an order. You’ll also want to calculate any figures of merit, such as
the number of days (Ship date – Order date), volume (total number of items on the order or
physical size), or distance (Vendor address – Warehouse address) (see Table 3-A1).

TABLE 3-A1 Vendor Shipments

PO# | Total Items: Sum (Qty) | Order Date | Ship Date | Days to Ship: (Ship Date – Order Date) | Vendor | Distance (mi): (V Coordinates – WH Coordinates)* | Weight (lb)
123456 | 15 | 7/30/2020 | 8/2/2020 | 3 | ABC Company | 45 | 160

Distance Formula
You can use a distance formula in Excel to calculate the distance in miles or kilometers between the warehouse and the vendor. First, determine the latitude and longitude based on each address, then use the following formula. Note: Use 3959 as the leading constant for miles or 6371 for kilometers.

3959 * ACOS(SIN(RADIANS([Lat])) * SIN(RADIANS([Lat2])) + COS(RADIANS([Lat])) * COS(RADIANS([Lat2])) * COS(RADIANS([Long2]) - RADIANS([Long])))
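For readers working outside Excel, here is a minimal Python equivalent of the same spherical law of cosines calculation; the coordinates in the usage line are approximate and purely illustrative.

```python
import math

def distance_miles(lat, long, lat2, long2):
    """Spherical law of cosines, mirroring the Excel formula above."""
    radius = 3959  # use 6371 for kilometers
    return radius * math.acos(
        math.sin(math.radians(lat)) * math.sin(math.radians(lat2))
        + math.cos(math.radians(lat)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(long2) - math.radians(long))
    )

# Hypothetical warehouse vs. vendor coordinates; prints roughly 20 miles.
print(round(distance_miles(36.06, -94.16, 36.37, -94.21), 1))
```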

Assign Classes
Take a moment to define your classes. You are trying to predict whether a given order
shipment will either be “On-time” or “Delayed” based on the number of days it takes from
the order date to the shipping date. What does “on-time” mean? Let’s define on-time as an
order that ships in 5 days or less and a delayed order as one that ships later than 5 days.
You’ll use this rule to add the class as a new attribute to each of your historical records (see
Table 3-A2).
On-time = (Days to ship ≤ 5)
Delayed = (Days to ship > 5)
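A minimal sketch of applying this rule across historical records, assuming a hypothetical pandas DataFrame in which days to ship has already been computed, produces the class column shown in Table 3-A2.

```python
import numpy as np
import pandas as pd

# Hypothetical order records with the days-to-ship figure of merit.
orders = pd.DataFrame({"po": [123456, 123457], "days_to_ship": [3, 6]})

# Apply the class rule: 5 days or less is "On-time," otherwise "Delayed."
orders["class"] = np.where(orders["days_to_ship"] <= 5, "On-time", "Delayed")
print(orders)
```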

TABLE 3-A2 Shipment Class

PO# | Total Items: Sum (Qty) | Order Date | Ship Date | Days to Ship: (Ship Date – Order Date) | Vendor | Distance (mi): (V Coordinates – WH Coordinates)* | Weight (lb) | Class
123456 | 15 | 7/30/2020 | 8/2/2020 | 3 | ABC Company | 45 | 160 | On-time
123457 | 20 | 7/30/2020 | 8/5/2020 | 6 | XYZ Company | 120 | 800 | Delayed

LABS

Lab 3-1 Descriptive Analytics: Filter and Reduce Data—Sláinte


Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As an auditor of Sláinte, you are tasked to identify internal controls
weaknesses. To help you focus on areas with increased risk, you rely on data reduction to
focus your efforts and limit your audit scope. For example, you may want to look only at
transactions for a given year. In this lab, you will learn to use filters and matching using
vendor and employee records, a common auditor analysis. For one audit test, you have been
asked to see if there are any potential matches between vendors and employees that might
indicate risk of related-party transactions or fraud.
Data: Lab 3-1 Slainte Dataset.zip - 106KB Zip / 114KB Excel

Lab 3-1 Example Output


By the end of this lab, you will combine data to identify similar matches. While your results
will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop + Power Query

Microsoft Power BI Desktop


LAB 3-1M Example of Fuzzy Matching in Microsoft Power Query

Tableau | Prep

Tableau Software, Inc. All rights reserved.


LAB 3-1T Example of Fuzzy Matching in Tableau Prep

Lab 3-1 Match Suppliers and Employees with Similar Addresses
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 3-1 [Your name] [Your email address].docx.
When identifying matches by address, it is important to consider not only the street
address, but also the zip code. This is because many cities have common street names, such
as Second Avenue. Combining the address and zip code reduces the likelihood of false posi-
tives, or results that look like they match but are in completely different locations.

Microsoft | Power BI Desktop + Power Query

1. Open a new workbook in Power BI Desktop and connect to data:


a. From the Home tab in the ribbon, click Get Data > Excel.
b. Navigate to your Lab 3-1 Slainte Dataset.xlsx file and click Open.
c. In the Navigator window, check the boxes next to Employee_Listing and
Suppliers and click Transform Data.
2. Join the employee and suppliers tables with a fuzzy match to identify pos-
sible related parties:
a. Fuzzy matching supports only text columns. We will be matching the
Supplier_Address, Supplier_Zip, Employee_Street_Address, and
Employee_Zip columns, so ensure they are all data type text. If they are
not, adjust the data types. If prompted to Replace current or Add a new
step, select Replace current.

b. Click the Suppliers query then go to the Home tab in the ribbon and
click Merge Queries.
c. In the Merge window:
1. Choose Employee_Listing from the drop-down menu above the bot-
tom table.
2. From the Join Kind drop-down menu, choose Inner (only matching
rows).
3. Check the box next to Use fuzzy matching to perform the merge.
4. Click the arrow next to Fuzzy matching options and set the similarity
threshold to 0.5.
5. Now, hold the Ctrl key and select the Supplier_Address and
Supplier_Zip columns in the Suppliers table in the preview at the top
of the screen.
6. Finally, hold the Ctrl key and select the Employee_Street_Address
and Employee_Zip columns in the Employee_Listing table in the
preview at the bottom of the screen.
7. Click OK to merge the tables.
d. In the Power Query Editor, scroll to the right until you find
Employee_Listing and click the expand (two arrows) button to the right
of the column header.
e. Click OK to add all of the employee columns.
f. From the Home tab in the ribbon, click Choose Columns.
g. Uncheck (Select all columns) and check the following attributes and
click OK:
1. Supplier_Company_Name
2. Supplier_Address
3. Supplier_Zip
4. Employee_First_Name
5. Employee_Last_Name
6. Employee_Street_Address
7. Employee_Zip
h. Take a screenshot (label it 3-1MA).
3. You may notice that you have too many matches. Now adjust your fuzzy
match to show fewer, more likely matches:
a. In the Query Steps panel on the right, click the gear icon next to the
Merged Queries step to open the Merge panel with fuzzy match options.
b. Click the arrow next to Fuzzy matching options and change the Similar-
ity threshold value to 0.7.
c. Click OK to return to Power Query Editor.
d. In the Query Steps panel on the right, click the last step to expand and
remove the other columns: Removed Other Columns.
e. Take a screenshot (label it 3-1MB).
4. Answer the lab questions, then close Power Query and Power BI. Save your
Power BI workbook as 3-1 Slainte Fuzzy.pbix.

Tableau | Prep

1. Open Tableau Prep Builder and connect to your data:


a. Click Connect to Data > To a File > Microsoft Excel.
b. Locate the Lab 3-1 Slainte Dataset.xlsx file on your computer and click
Open.
2. Drag Employee_Listing to your flow.
3. Add a Clean step after Employee_Listing. To prepare your data for the fuzzy
match, there are two steps to take in the clean step: adjust the data type
for Employee_Zip to text/string and create a calculation to combine the
Employee_Street_Address field with the Employee_Zip field:
a. Click the # icon above the Employee_Zip and change the data type to String.
b. Click Create Calculated Field. . . and input the following:
1. Field Name: Address
2. Calculation: [Employee_Street_Address] + “ ” + [Employee_Zip]
4. Take a screenshot (label it 3-1TA) showing your combined address field.
5. Drag Suppliers to your flow.
6. Add a Clean step after Suppliers. Perform the same steps here, adjust the
Supplier_Zip data type to string and create the calculated field to combine
the address and zip. Note: Be sure to name the new field the exact same as
the way you named it in the Employee_Listing table—this is critical for the
Union step you perform next!
7. Drag one of the Clean steps on top of the other Clean step and you will see
options for either Join or Union. Drop the Clean step directly on top of the
Union option.
8. Click the Address attribute and click the More Options button (a button with
three dots).
9. In the Address attribute, click Group Values > Common Characters.
10. Click Sort to sort the values in the Address attribute so you can see the items
that were grouped by the fuzzy match (their common characters).
11. Click each address bar to examine the matched records and determine if they
are suspicious matches, or multiple grouped values.
12. Take a screenshot (label it 3-1TB).
13. Answer the lab questions, then close Tableau Prep. Save your flow as 3-1
Slainte Fuzzy.tfl.

Lab 3-1 Objective Questions (LO 3-2)


OQ1. Which supplier is a likely match for a Sláinte employee?

Lab 3-1 Analysis Questions (LO 3-2)


AQ1. Why is only one of the fuzzy matches in the first match a likely match and not
the remaining ones?
AQ2. What additional data would be useful to understand the nature of the matched
values?

AQ3. If you were the employee committing fraud, what would you try to do with the
data to evade detection?

Lab 3-1 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 3-2 Diagnostic Analytics: Identify Data Clusters—LendingClub
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As an analyst, you have been tasked with evaluating the characteristics that drive the interest rates attached to the various loans matched through LendingClub. This will provide insight into how your company may deal with assigning credit to customers.
Data: Lab 3-2 Lending Club Transform.zip - 29MB Zip / 26MB Excel / 6MB Tableau

Lab 3-2 Example Output


By the end of this lab, you will create a cluster analysis that will let you explore groups
with similar characteristics. While your results will include different data values, your work
should look similar to this:

Microsoft | Power BI Desktop

Microsoft Power BI Desktop


LAB 3-2M Example of Cluster Analysis in Microsoft Power BI Desktop

Tableau | Desktop

Tableau Software, Inc. All rights reserved.


LAB 3-2T Example of Cluster Analysis in Tableau Desktop

Lab 3-2 Cluster Analysis of Borrowers


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 3-2 [Your name] [Your email address].docx.
In this analysis, we want to understand the interest rates assigned to groups of loans. A
cluster analysis allows us to identify natural groupings of individual observations that we
can then use to compare with each other or assign some intuitive labels to.

Microsoft | Power BI Desktop

1. Create a new workbook in Power BI Desktop and connect to your data:


a. From the Home tab in the ribbon, click Get Data > Excel.
b. Navigate to Lab 3-2 Lending Club Transform.xlsx and click Open.
c. Check the box next to LoanStats3c and click Load.
2. Create a filter on the date:
a. In the Fields pane, expand the LoanStats3c table, then expand issue_d >
Date Hierarchy > Month.
b. Drag Month to the Filters on all pages section of the Filters pane and
check the box next to January.
3. Add a scatter chart to your report to show the groupings of interest rate by
average loan amount and average debt-to-income ratio:
a. Rename Page 1 tab to Clusters.
b. Click the Scatter Chart visualization to add it to your report.


c. In the Fields pane, drag the following values to the appropriate fields,
then click the drop-down menu next to each to set the summary measure
(e.g., Sum, Average, Count):
1. X Axis: dti > Average
2. Y Axis: loan_amnt > Average
3. Values: int_rate
4. Now show clusters in your data:
a. Click the three dots in top-right corner of your scatter chart visualization
and choose Automatically Find Clusters.
b. Enter the following parameters and click OK.
1. Name: Clusters by Interest Rate
2. Number of clusters: 6
c. Right-click on your scatter plot and choose Show as Table.
5. Finally, clean up your report. In the Visualizations pane, click the Format
visual (paintbrush) icon and add the following:
a. Visual > X axis > Title > Average Debt-to-Income Ratio
b. Visual > Y axis > Title > Average Loan Amount
c. General > Title > Text > Interest Clusters
6. Take a screenshot (label it 3-2MA) of your report.
7. When you have finished answering the questions, close Power BI Desktop
and save your workbook as 3-2 Lending Club Clusters.pbix.

Tableau | Desktop

1. Create a new workbook in Tableau Desktop and load your data:


a. Click Connect > To a File > More. . .
b. Navigate to your Lab 3-2 Lending Club Transform.hyper file and click
Open.
2. Create a scatter plot to show the groupings of interest rate by average loan
amount and average debt-to-income ratio:
a. Go to Sheet1 and rename the sheet Interest Clusters.
b. Drag Issue D to the Filters shelf to limit the date range. A Filter Field
window appears.
1. Choose Months and click Next.
2. Check the box next to January and click OK.
c. Drag Dti to Columns shelf, then click the SUM(Dti) pill and choose
Measure > Average.
d. Drag Loan Amnt to the Rows shelf, then click the SUM(Loan Amnt) pill
and choose Measure > Average.
e. Drag Int Rate to the Detail button on the Marks shelf, then click the
SUM(Int Rate) pill and choose Dimension. You should now see your
scatter plot.



3. Finally, add clusters to your model:
a. Click the Analytics tab on the left side of the screen.
b. Drag Cluster onto your scatter plot.
c. Set the number of clusters to 6 and close the pane.
d. Right-click your scatter plot and choose View Data. . . and drag the data
window down so you can see both the scatter plot and the data.
e. Take a screenshot (label it 3-2TA) of your chart and data.
4. Answer the lab questions, and then close Tableau. Save your worksheet as
3-2 Lending Club Clusters.twb.

Lab 3-2 Objective Questions (LO 3-3)


OQ1. Interest rates on loans from this period range from 6 percent to 26 percent.
Hover over a single value in the top-right cluster. What interest rate is assigned
to this value?
OQ2. Would you expect interest rates for loans that appear in the bottom-left cluster
to be high (closer to 26 percent) or low (closer to 6 percent)?
OQ3. How many clusters have five or more different interest rates assigned to them?

Lab 3-2 Analysis Questions (LO 3-3)


AQ1. Would you expect interest rates to be correlated with loan amount and debt-to-
income ratios? Why or why not?
AQ2. What do you notice about the interest rates assigned to outliers (clusters with
only one or two observations)?
AQ3. Compare and contrast: If you created clusters in both Power BI and ­Tableau, were
the cluster assignments identical? Why might they be the same or different?

Lab 3-2 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 3-3 Perform a Linear Regression Analysis—College Scorecard
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Company Summary: The data used are a subset of the College Scorecard dataset that is
provided by the U.S. Department of Education. These data provide federal financial aid and
earnings information, insights into the performance of schools eligible to receive federal
financial aid, and the outcomes of students at those schools. You can learn more about how
the data are used and view the raw data yourself at collegescorecard.ed.gov/data. However,
for this lab, you should use the text file provided to you.
Data: Lab 3-3 College Scorecard Transform.zip - 1.8MB Zip / 1.3MB Excel / 1.4MB
Tableau

Lab 3-3 Example Output
By the end of this lab, you will create a regression to understand relationships with data.
While your results will include different data values, your work should look similar to this:

Microsoft | Excel + Power BI Desktop

Microsoft Power BI Desktop


LAB 3-3M Example of Regression in Microsoft Power BI Desktop

Tableau | Desktop

Tableau Software, Inc. All rights reserved.


LAB 3-3T Example of Regression in Tableau Desktop

Lab 3-3 Prepare Your Data and Create a Linear
Regression
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 3-3 [Your name] [Your email address].docx.
This lab relies upon the steps completed in Lab 2-5 in which the data were prepared. For
a description of the variables, refer to the data dictionary in Appendix K.
We will begin with a simple regression with two variables, SAT average (SAT_AVG) and
completion rate for first-time, full-time students at four-year institutions (C150_4).
A note about regressions: You can perform regressions on data only where a value is
present for both the explanatory and response variable. If there are blank or null values, you
will encounter errors at best and inaccurate models at worst. Make sure you filter your data
before beginning the analysis.

Microsoft | Excel + Power BI Desktop

Note: This lab will show how to calculate a regression using both Excel and Power BI.
Excel provides more detailed numbers, such as the slope and y-intercept so that we can
build a prediction model. However, if we don’t need to build the model but we want
to include a regression model visualization in a report, then Power BI is a good choice.
1. Open a new workbook in Excel and connect to your data:
a. From the Data ribbon, click Get Data > From File > From Workbook.
b. Navigate to your Lab 3-3 College Scorecard Transform.xlsx file and click
Open.
c. Check the Lab_3_3_College_Scorecard_Dataset table and click Transform or Edit.
d. Click the drop-down menu next to SAT_AVG and click Remove Empty
to remove null values.
e. Click the drop-down menu next to C150_4 and click Remove Empty to
remove null values.
f. Click Close & Load.
2. Before you can perform a regression in Excel, you need to first enable the
Data Analysis ToolPak.
a. Go to File > Options > Add-ins.
b. Next to Manage at the bottom of the screen, choose Excel Add-ins and
click Go. . . .
c. Check the box next to Analysis ToolPak and click OK. You will now see
the Data Analysis option in the Data tab on the Excel ribbon.
3. Now you can perform your regression:
a. From the Data tab in the ribbon click the Data Analysis button.
b. Choose Regression from the Analysis Tools list and click OK. A regres-
sion window will pop up for you to input your variables.
c. For the Input Y Range, click the select (up arrow) button and
highlight the data that contains your response variable: C150_4
($AA$1:$AA$1272).


d. For the Input X Range, click the select (up arrow) button and highlight the data that contain your explanatory variable: SAT_AVG ($H$1:$H$1272).
e. If your selections contain column headers, place a check mark in the box
next to Labels.
f. Click OK. This will run the regression test and place the output on a new
spreadsheet in your Excel workbook.
g. Click the R Square value and highlight the cell yellow.
4. Take a screenshot (label it 3-3MA) of your regression output.
5. When you are finished answering the lab questions you may close Excel. Save
your file as Lab 3-3 College Scorecard Regression.xlsx.
Now repeat the model using Power BI:
1. Open a new workbook in Power BI and connect to your data:
a. From the Home tab on the ribbon, click Get Data > Excel.
b. Navigate to your Lab 3-3 College Scorecard Transform.xlsx file and click
Open.
c. Check the Lab_3_3_College_Scorecard_Dataset table and click Load.
Note: We will assume that this Power BI report would include more
data models and visualizations in addition to the regression, so we
will filter the values on the report page instead of in Power Query like
we did with Excel.
d. In the View tab in the ribbon, click the Page view button and change to
Actual Size.
2. Begin by creating a scatter chart on your report:
a. In the Visualizations panel, click the Scatter chart button to add a new
visual to your report.
b. Drag the following attributes to the visualization fields panel:
1. Values: UNITID
2. X Axis (explanatory variable): SAT_AVG
3. Y Axis (response variable): C150_4
c. To remove null values from our analysis, go to the Filters panel and adjust
the following:
1. Drag SAT_AVG from the fields list to Filters on this page.
2. From the Show items when the value: drop-down menu, choose is not
blank and click Apply Filter.
3. Drag C150_4 from the fields list to Filters on this page.
4. From the Show items when the value: drop-down menu, choose is not
blank and click Apply Filter.
d. Now add your regression line:
1. In the Visualizations panel, click the Analytics (magnifying glass) button.
2. Click Trend line > On.
e. Take a screenshot (label it 3-3MB) of your scatter chart.




3. Next, calculate the coefficient of correlation (R) and coefficient of determi-


nation (R2):
a. Click the Data option from the toolbar on the left to view your data
table.
b. In the Table tools tab on the ribbon, click Quick measure. A new win-
dow appears.
c. From the Calculation list, select Correlation coefficient and assign the
following fields and click OK. A new field will appear titled SAT_AVG
and C150_4 correlation for UNITID.
1. Category: UNITID
2. Measure X: SAT_AVG
3. Measure Y: C150_4
d. Rename the variable Coefficient of Correlation in the first line of the
code and click the check mark to the left of the formula bar to apply
changes.
e. To create the coefficient of determination, click New Measure in the
Measure tools tab in the ribbon.
f. Enter Coefficient of Determination = [Coefficient of Correlation]^2
(type over the default text of “Measure =”) and click the check mark to
the left of the formula bar to apply changes.
4. Now add the coefficient values to your report:
a. Click the Report button in the toolbar on the left to return to your
report.
b. In the Visualizations panel, click the Multi-row card button to add a new
textbox to your report.
c. Drag the Coefficient of Correlation and Coefficient of Determination to
the Fields box in the Visualizations panel.
d. To show some summary statistics, drag UNITID, SAT_AVG, and
C150_4 to the Fields box in the Visualizations panel. By default they will
show the count and sum of these values.
e. Click the drop-down menu next to both SAT_AVG and C150_4 and
choose Average to show the average values.
5. Finally, add a little polish to your report.
a. Add a title to the scatter chart.
1. Click the scatter chart card.
2. In the Visualizations panel, click the Format Visual (paintbrush) button.
3. Click General > Title > On and enter “Regression for Average SAT
and Completion Rate” in the Text box.
b. Resize both cards to show all of the titles and information.
c. Take a screenshot (label it 3-3MC) of your scatter chart.
6. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 3-3 College Scorecard Regression.pbix.
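The quick measure and the Coefficient of Determination formula above can also be sanity-checked outside Power BI. A minimal Python sketch (our illustration, assuming the same workbook and column names as the Excel track):

import pandas as pd

df = pd.read_excel("Lab 3-3 College Scorecard Transform.xlsx")
pair = df[["SAT_AVG", "C150_4"]].dropna()

r = pair["SAT_AVG"].corr(pair["C150_4"])  # Pearson coefficient of correlation
print(f"Coefficient of Correlation: {r:.4f}")
print(f"Coefficient of Determination: {r ** 2:.4f}")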



Tableau | Desktop

1. Open a new workbook in Tableau Desktop and connect to your data:


a. Click Connect to Data > To a File > More. . .
b. Navigate to your Lab 3-3 College Scorecard Transform.hyper file and click
Open.
2. Now create your scatter plot:
a. Double-click Sheet 1 and rename it Regression.
b. Drag Sat Avg to the Columns shelf. This is your independent or X mea-
sure.
c. Click the down arrow in the SUM(Sat Avg) pill and choose Dimension
from the list.
d. Drag C150 4 to the Rows shelf. This is your dependent or Y measure.
e. Click the down arrow in the SUM(C150 4) pill and choose Dimension
from the list.
f. Take a screenshot (label it 3-3TA) of your scatter plot.
3. Add a trendline to show the regression:
a. Click the Analytics tab in the pane on the left side of the Tableau win-
dow.
b. Drag Trend Line to your model and choose Linear.
c. Right-click the new trend line on your plot and choose Describe Trend
Model from the menu. Move the new window to the bottom-right corner
of your graph.
4. Take a screenshot (label it 3-3TB) of the plot with your regression/trend
line and trend model details.
5. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 3-3 College Scorecard Regression.twb.

Lab 3-3 Objective Questions (LO 3-4)


OQ1. What is the value of the coefficient of determination (R2)?
OQ2. Does SAT Average appear to be predictive of college completion rate?
OQ3. What is the probability of college completion for a student with an SAT score of
1100?

Lab 3-3 Analysis Questions (LO 3-4)


AQ1. Would you expect SAT average and completion rate to be correlated? If so,
would you expect the correlation to be positive or negative?
AQ2. When determining relationships between variables, one of the criteria for a
potential causal relationship is that the cause must happen before the effect.
Regarding SAT average and completion rate, which would you determine to be
the potential cause? Which would be the effect?

AQ3. Identifying the cause and effect as you did in AQ2 can help you determine the
explanatory and response variables. Which variable, SAT average or completion
rate, is the explanatory variable?

Lab 3-3 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 3-4 Comprehensive Case: Descriptive Analytics: Generate Summary Statistics—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: You are a brand-new analyst and you just got assigned to work on the
Dillard’s account. So far you have analyzed the ER Diagram to gain a bird’s-eye view of all
the different tables and fields in the database, and you have explored the data in each table
to gain a glimpse at sample values from each field and how they are all formatted. You also
gained a little insight into the distribution of sample values across each field, but at this
point you are ready to dig into the data a bit more.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.

Lab 3-4 Example Output


By the end of this lab, you will generate summary statistics that will let you explore the
numerical transaction measures in the Dillard's data. While your results will include differ-
ent data values, your work should look similar to this:

Microsoft | Excel + Power Query

LAB 3-4M Example Summary Statistics in Microsoft Excel

Tableau | Desktop



LAB 3-4T Example Summary Statistics in Tableau Desktop

Lab 3-4 Calculate and Compare Summary Statistics


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 3-4 [Your name] [Your email address].docx.
In the previous comprehensive labs in Chapter 2, you mastered the data by connecting to
full tables in Tableau or Power BI to explore the data. In this lab, instead of connecting to full
tables, we will write a SQL query to pull only a subset of data into Tableau or Excel. This tactic
is more effective when the database is very large and you can derive insights from a sample of
the data. We will analyze 5 days’ worth of transaction data from September 2016. In this lab
we will compare distributions of numerical data measures associated with transactions.
There are three numerical values associated with the transaction amount in the Transact
table: ORIG_PRICE, SALE_PRICE, and TRAN_AMT. While you can also use summary
statistics to help master the data, when we use summary statistics in descriptive analytics, we
are seeking to understand what has happened in the past and to identify items we might want
to dig into further via data reduction or diagnostic analytics.

Microsoft | Excel + Power Query

1. From Microsoft Excel, click the Data tab on the ribbon.


2. Click Get Data > From Database > From SQL Server Database.
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Expand Advanced Options and input the following query:
SELECT TRANSACT.*
FROM TRANSACT
WHERE TRAN_DATE BETWEEN ‘20160901’ AND ‘20160905’

d. Click OK.
e. Click Edit to open Power Query Editor.
3. Take a screenshot (label it 3-4MA).
4. Click the Home tab on the ribbon and then click Close & Load.
5. While you can calculate these statistics individually using formulas like
SUM() and AVERAGE(), you can also have Excel calculate them automati-
cally through the Data Analysis ToolPak. If you haven’t added this compo-
nent into Excel yet, follow this menu path: File > Options > Add-ins. From
this window, select the Go. . . button, and then place a check mark in the
box next to Analysis ToolPak. Once you click OK, you will be able to access
the ToolPak from the Data tab on the Excel ribbon.
6. Click the Data Analysis button from the Data tab on the Excel ribbon and
select Descriptive Statistics and click OK.
a. For the Input Range, select the three columns that we are measuring,
ORIG_PRICE, SALE_PRICE, and TRAN_AMT. Leave the default Grouped By
option set to Columns, and place a check mark in Labels in First Row.
b. Place a check mark next to Summary Statistics, then press OK. This
may take a few moments to run because it is a large dataset.
7. Take a screenshot (label it 3-4MB) of the summary statistic results.
8. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 3-4 Dillard’s Stats.xlsx.
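For reference, the same summary statistics can be produced with a few lines of Python. This is a sketch only: it assumes the Dillard's database is reachable from the University of Arkansas remote desktop and that a SQL Server ODBC driver is installed (the driver name and authentication mode below are assumptions).

import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"  # assumption: installed driver
    "SERVER=essql1.walton.uark.edu;"
    "DATABASE=WCOB_Dillards;"
    "Trusted_Connection=yes;"  # assumption: Windows authentication
)
query = """
SELECT TRANSACT.*
FROM TRANSACT
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905'
"""
df = pd.read_sql(query, conn)

# Count, mean, standard deviation, min/max, and quartiles per field
print(df[["ORIG_PRICE", "SALE_PRICE", "TRAN_AMT"]].describe())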

Tableau | Desktop

1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is, click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT TRANSACT.*
FROM TRANSACT
WHERE TRAN_DATE BETWEEN ‘20160901’ AND ‘20160905’
e. Click OK.
f. Click Update Now to preview your data.
3. Take a screenshot (label it 3-4TA).
4. Click Sheet 1 and rename it Summary Statistics.

5. Drag the following fields to the Columns shelf:
a. ORIG_PRICE
b. SALE_PRICE
c. TRAN_AMT
6. From the Analysis menu, uncheck Aggregate Measures. This will change the
bars to show individual circles for each observation in the dataset. This may
take a few seconds to run.
7. From the Worksheet menu, check Show Summary.
8. Hide the Show Me pane to see your summary statistics.
9. Click the drop-down menu in the Summary card and check Standard
Deviation.
10. Take a screenshot (label it 3-4TB).
11. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 3-4 Dillard’s Stats.twb.

Lab 3-4 Objective Questions (LO 3-2)


OQ1. Which field has the highest average (mean)?
OQ2. What is the maximum for ORIG_PRICE?
OQ3. What is the maximum for SALE_PRICE?

Lab 3-4 Analysis Questions (LO 3-2)


AQ1. Why is the average for TRAN_AMT so much less than the averages for ORIG_
PRICE and SALE_PRICE?

Lab 3-4 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 3-5 Comprehensive Case: Diagnostic Analytics: Compare Distributions—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: You are a brand-new analyst and you just got assigned to work on the
Dillard’s account. While you were mastering and exploring the data, you discovered that
the highest average transaction amount per state is in Arkansas. At first blush, this struck
you as making sense; Dillard's is located in Arkansas, after all. However, after digging into
your data a bit more, you find out that the online distribution center is located in Maumelle,
Arkansas—that means that every Dillard’s online transaction is processed in Arkansas! You
wonder if that might have an impact on the average transaction amount in Arkansas.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.

Lab 3-5 Example Output
By the end of this lab, you will create a chart that will let you compare online and in-person
sales. While your results will include different data values, your work should look similar
to this:

Microsoft | Excel + Power Query

LAB 3-5M Example Comparison of Distributions in Microsoft Excel

Tableau | Desktop



LAB 3-5T Example Comparison of Distributions in Tableau Desktop

Lab 3-5 Part 1 Prepare the Data and Compare the Distributions of Online Sales and In-Person Sales
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 3-5 [Your name] [Your email address].docx.

In this lab, we will separate the online sales and the in-person sales and view how their
distributions differ. After viewing the distributions, we will run a hypothesis t-test to deter-
mine if the average transaction amount for online sales versus in-person sales is significantly
different. We cannot complete hypothesis t-tests in Tableau without combining it with R or
Python, so Part 2 of this lab will only be in the Microsoft Excel path.

Microsoft | Excel + Power Query

1. From Microsoft Excel, click the Data tab on the ribbon.


2. Click Get Data > From Database > From SQL Server Database.
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Expand Advanced Options and input the following query:
SELECT *
FROM TRANSACT
WHERE TRAN_DATE BETWEEN ‘20160901’ AND ‘20160905’ AND
TRAN_AMT > 0
d. Click OK.
e. Click Edit or Transform Data.
3. Now that you have created a data connection, you can create new columns
to isolate the online sales from the in-person sales. The online sales store
ID is 698. This is a two-step process, beginning with adding a conditional
column and then pivoting the new conditional column.
a. From the Power Query ribbon, select Add Column then Conditional
Column.
b. Input the following in the Add Conditional Column window:
1. New column name: Dummy Variable
2. Column Name: Store
3. Operator: Equals
4. Value: 698
5. Output: Online
6. Otherwise: In-Person
c. Click OK.
d. Select your new Dummy Variable column.
e. From the Power Query ribbon, select Transform, then Pivot Column.
f. Select TRAN_AMT from the Pivot Column window and click OK. This
action may take a few moments.
g. From the Power Query ribbon, select the Home tab and Close & Load.
4. Now that you have loaded your data in Excel, you can construct box plots to
compare the distributions of Online and In-Person sales.
a. Select all of the data from the In-Person and Online columns.

b. From the ribbon, select Insert > Statistical Charts > Box and Whisker.
1. Glancing at this chart, it appears that there are some extreme outliers
in the In-Person category, but if you expand the chart you will
see the average and median are higher for online sales. To view the
values for the summary statistics, you can select the column and view
the results in the status bar, or you can calculate them manually with
functions (=AVERAGE or =MEDIAN) or run descriptive statistics from the
Data Analysis ToolPak.
5. Take a screenshot (label it 3-5MA).
6. Answer the lab questions, then continue to Part 2.
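The same dummy-variable and box plot comparison can also be sketched in Python. The connection details follow the same assumptions as the Lab 3-4 sketch; the store ID 698 and the filtered query come straight from the steps above.

import matplotlib.pyplot as plt
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"  # assumption: installed driver
    "SERVER=essql1.walton.uark.edu;DATABASE=WCOB_Dillards;Trusted_Connection=yes;"
)
df = pd.read_sql(
    "SELECT * FROM TRANSACT "
    "WHERE TRAN_DATE BETWEEN '20160901' AND '20160905' AND TRAN_AMT > 0",
    conn,
)

# Flag online sales (store 698) versus in-person sales
df["Channel"] = df["STORE"].eq(698).map({True: "Online", False: "In-Person"})

# Side-by-side box plots; a log scale tames the extreme in-person outliers
df.boxplot(column="TRAN_AMT", by="Channel")
plt.yscale("log")
plt.ylabel("Transaction Amount")
plt.show()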

Tableau | Desktop

1. The data preparation for working with the box plots in Tableau Desktop
is much simpler because we are not also preparing our data for hypothesis
t-test analysis.
2. Open Tableau Desktop.
3. Go to Connect > To a Server > Microsoft SQL Server.
4. Enter the following and click Sign In:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
5. Instead of connecting to a table, you will create a New Custom SQL query.
Double-click New Custom SQL and input the following query:
SELECT *
FROM TRANSACT
WHERE TRAN_DATE BETWEEN ‘20160901’ AND ‘20160905’ AND
TRAN_AMT > 0
6. Click Sheet 1 and rename it Sales Analysis.
7. Double-click TRAN_AMT.
8. From the Analysis menu, uncheck Aggregate Measures. This will change the
bars to show individual circles for each observation in the dataset. This may
take a few seconds to run.
9. To resize the circles, click the Size button on the Marks pane and drag the
slider toward the left.
10. To add a box-and-whiskers plot, click the Analytics tab and drag Box Plot to
the Cell button on your sheet.
11. This box plot shows all of the transaction amounts (both online and in-person).
To view two separate box plots, you need to first create a new variable to
isolate online sales from in-person sales.
a. From the Analysis menu, select Create Calculated Field. . .
b. Replace the default Calculation1 with the name Dummy Variable.

c. Input the following Calculation: IF [STORE] = 698 THEN “Online”
ELSE “In-Person” END.
d. Drag Dummy Variable to the Columns shelf.
e. The default size is thin and a little difficult to read. Changing the axis
to a logarithmic scale will let you see the box plot better. Right-click the
vertical axis (TRAN_AMT) and choose Edit Axis. . .
f. Check the box next to Logarithmic and give your axis a friendly title, like
Transaction Amount. Close the window to return to your chart.
12. You now have two separate box plots to compare. The In-Person box plot
has a much wider range and some quite high outliers, but the interquartile
range for the Online box plot is broader. This suggests that we would want
to explore these data further to understand the very high outliers in the In-
Person transactions and to determine whether the averages are significantly
different between the two sets of transactions.
13. Take a screenshot (label it 3-5TA).
14. Answer the lab questions and close your workbook. Save it as Lab 3-5 Dillard's
Sales Analysis.twb. The Tableau track does not have a Part 2 for this lab.

Lab 3-5 Part 1 Objective Questions (LO 3-2, 3-3)


OQ1. What is the highest outlier from the in-person transactions?
OQ2. What is the highest outlier for online transactions?
OQ3. What is the median transaction for in-person transactions?
OQ4. What is the median transaction for online transactions?

Lab 3-5 Part 1 Analysis Questions (LO 3-3)


AQ1. What insights can you derive from comparing the two box plots?

Lab 3-5 Part 2 Run the Hypothesis Test


Now that you have visualized the different distributions, you can run your t-test using
Excel’s Data Analysis ToolPak to determine if the average for online sales transactions is
significantly greater than the average for in-person sales transactions.

Microsoft | Excel + Power Query

1. From Microsoft Excel, click the Data Analysis button from the Data tab on
the ribbon.
a. If you haven’t added this component into Excel yet, follow this menu
path: File > Options > Add-ins. From this window, select the Go. . .
button, and then place a check mark in the box next to Analysis
ToolPak. Once you click OK, you will be able to access the ToolPak
from the Data tab on the Excel ribbon.

b. Select t-test: Two Sample Assuming Unequal Variances.
1. Variable 1 Range: select all In-Person transactions (Column O).
2. Variable 2 Range: select all Online transactions (Column P).
3. Place a check mark next to Labels.
4. Click OK.
2. Take a screenshot (label it 3-5MB) of the t-test output.
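As noted above, Tableau needs R or Python for this test. For completeness, here is a minimal Python sketch of the same unequal-variances (Welch) t-test, assuming the df DataFrame with the Channel flag from the Part 1 sketch.

from scipy import stats

online = df.loc[df["Channel"] == "Online", "TRAN_AMT"]
in_person = df.loc[df["Channel"] == "In-Person", "TRAN_AMT"]

# Two-sample t-test assuming unequal variances (Welch)
t_stat, p_two_tailed = stats.ttest_ind(online, in_person, equal_var=False)

# One-tailed p-value for H1: online mean > in-person mean
p_one_tailed = p_two_tailed / 2 if t_stat > 0 else 1 - p_two_tailed / 2
print(f"t = {t_stat:.4f}, one-tailed p = {p_one_tailed:.4g}")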

Tableau | Desktop
This portion of the lab cannot be completed using Tableau.

Lab 3-5 Part 2 Objective Questions (LO 3-3)


OQ1. What is the p-value for the one-tailed test?
OQ2. Using an alpha of 0.05 (5 percent), should you reject or fail to reject the null
hypothesis?

Lab 3-5 Part 2 Analysis Questions (LO 3-3)


AQ1. Describe in plain English what the results of this t-test mean.
AQ2. Based on the results of this hypothesis test, what would you recommend to
­Dillard’s regarding online versus in-person sales?
AQ3. Based on the results of this hypothesis test, how would you adjust analyzing
transactions based in Arkansas?

Lab 3-5 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 3-6 Comprehensive Case: Create a Data Abstract and Perform Regression Analysis—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: After running diagnostic analysis on Dillard’s transactions, you have
found that there is a statistically significant difference between the amount of money cus-
tomers spend in online transactions versus in-person transactions. You decide to take this
a step further and design a predictive model to help determine how much a customer will
spend based on the transaction type (online or in-person). You will run a simple regression
to create a predictive model using one explanatory variable in Part 1, and in Part 2 you will
extend your analysis to add in an additional explanatory variable—tender type (the method
of payment the customer chooses).

Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.

Lab 3-6 Example Output


By the end of this lab, you will create a regression analysis. While your results will include
different data values, your work should look similar to this:

Microsoft | Excel + Power Query

LAB 3-6M Example Regression in Microsoft Excel

Tableau | Desktop



LAB 3-6T Example Regression in Tableau Desktop

Lab 3-6 Part 1 Perform an Analysis of the Data
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 3-6 [Your name] [Your email address].docx.
Dillard’s is trying to figure out when its customers spend more on individual transac-
tions. We ask questions regarding how Dillard’s sells its products.

Microsoft | Excel + Power Query

1. From Microsoft Excel, click the Data tab on the ribbon.


A. Note: If you completed Lab 3-5, you can edit the query from your Lab
3-5 spreadsheet instead of starting from scratch. Skip Step 2 below, and
open your Lab 3-5 spreadsheet instead. Click into your data, and a tab
for Query will appear in the ribbon. From the Query tab, select Edit to
open the Power Query editor. In Query Settings (the pane to the right),
delete the Pivot step. Now you can proceed to step 3 below.
2. Click Get Data > From Database > From SQL Server Database.
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Expand Advanced Options and input the following query:
SELECT *
FROM TRANSACT
WHERE TRAN_DATE BETWEEN ‘20160901’ AND ‘20160905’ AND
TRAN_AMT > 0
d. Click OK.
e. Click Edit or Transform Data.
3. Now that you have created a data connection, you can create new columns to
isolate the online sales from the in-person sales. The online sales store ID is
698. Unlike the way you performed this change to run a t-test, this time you
only need to create the Conditional Column (no need to Pivot the columns),
and you will code the values as 1 and 0 instead of Online and In-Person. This
is because regression analysis requires all variables to be numeric.
a. From the Power Query ribbon, select Add Column, then Conditional
Column.
b. Input the following in the Add Conditional Column window:
1. New column name: Online-Dummy
2. Column Name: STORE
3. Operator: equals
4. Value: 698
5. Output: 1
6. Otherwise: 0
c. Click OK.
d. From the Power Query ribbon, select the Home tab and Close & Load.
4. Take a screenshot (label it 3-6MA).

5. Perform a regression analysis by performing the following steps:
a. Click on the Data Analysis button in the Data tab in the ribbon. If you
do not have the Data Analysis ToolPak added in, see Appendix C to
learn how to add it to Excel.
b. Click Regression, and then click OK.
c. Reference the cells that contain the TRAN_AMT in the Input Y Range
and Online-Dummy in the Input X Range and then click OK. Note:
Regression Analysis does not accept null values, so it will not work to
select the entire columns for your Y and X ranges. Be sure to select
only the column label and the values; the values should extend to row
570,062.
6. Take a screenshot (label it 3-6MB) of your results.
7. Save your file as Lab 3-6 Dillard’s Regression.xlsx.
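A minimal Python sketch of the same simple regression with statsmodels follows; it assumes the filtered TRANSACT rows are already in a DataFrame df (as in the earlier sketches), and the 1/0 dummy coding for store 698 comes from the steps above.

import statsmodels.api as sm

df["Online_Dummy"] = (df["STORE"] == 698).astype(int)  # 1 = online store 698

X = sm.add_constant(df[["Online_Dummy"]])  # add the intercept term
model = sm.OLS(df["TRAN_AMT"], X).fit()
print(model.summary())  # coefficients, R-squared, F statistic, p-values

# Predicted spend for an online transaction: intercept + dummy coefficient
print(model.params["const"] + model.params["Online_Dummy"])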

Tableau | Desktop

1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is, click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT *
FROM TRANSACT
WHERE TRAN_DATE BETWEEN ‘20160901’ AND ‘20160905’ AND
TRAN_AMT > 0
e. Click OK.
3. Take a screenshot (label it 3-6TA).
4. Click Sheet 1 and name it Regression.
5. Double-click TRAN_AMT to add it to the Rows shelf.
6. From the Analysis menu, uncheck Aggregate Measures. This will change the
bars to show individual circles for each observation in the dataset. This may
take a few seconds to run.
7. To isolate online sales from in-person sales, you need to create a calculated
field. Unlike the way you performed this change to create box plots in the
Diagnostic Analytics lab, this time you must code the values as 1 and 0
instead of Online and In-Person.
a. From the Analysis menu, select Create Calculated Field. . .
b. Replace the default Calculation1 with the name Online-Dummy.

c. Input the following Calculation: IF [STORE] = 698 THEN 1 ELSE 0
END.
8. Drag Online-Dummy to the Columns shelf.
9. From the Analysis menu, click Trend Lines > Show All Trend Lines.
10. To see the trend line more clearly, right-click the vertical axis (Tran Amt)
and choose Edit Axis. Check the box next to Logarithmic, add a friendly axis
title such as Transaction Amount, and close the window.
11. Use your mouse to hover over the trend line to reveal the regression formula
and p-value.
12. Take a screenshot (label it 3-6TB).
13. Save your file as Lab 3-6 Dillard’s Regression.twb.

Lab 3-6 Part 1 Objective Questions (LO 3-4)


OQ1. Using the regression formula, what is the predicted amount a customer will
spend on an online transaction?
OQ2. What is the R Square value?
OQ3. What is the p-value provided for the explanatory variable?
OQ4. If you completed the Excel track, what is the Significance F value?

Lab 3-6 Part 1 Analysis Questions (LO 3-4)


AQ1. How would you interpret the results of your analysis of OQ1 in plain English?
What does it imply to have both a very low R Square value and a very low
Significance F value or p-value?

Lab 3-6 Part 2 Extend the Predictive Model to Include an Additional Explanatory Variable
In this second part, you will make your predictive model more interesting by adding an addi-
tional explanatory variable. After performing some descriptive analysis during which you com-
pared transaction amounts based on tender_type (cash, credit/debit card, Dillard’s store card,
etc.), you determined that customers tend to spend more on average when they use a credit or
debit card. You can add an additional dummy variable to your dataset to isolate credit/debit
card transactions from all other types of transactions in order to add this explanatory variable
to your regression model. The indicator in the data for credit/debit card transactions is BANK.

Microsoft | Excel + Power Query

1. Return to your Lab 3-6 Dillard's Regression.xlsx file and edit the existing query.
a. The Queries & Connections window should still be available on the
right side of your workbook. If it is, double-click on Query 1 to open the
Power Query editor.
1. If the Queries & Connections window is not showing to the right of your
workbook, click the Data tab in the ribbon > Queries & Connections.

2. Add a new conditional column. This time name it BANK-Dummy.
a. New column name: BANK-Dummy
b. Column Name: TENDER_TYPE
c. Operator: equals
d. Value: BANK
e. Output: 1
f. Otherwise: 0
3. Close and load the data back into Excel.
4. Now it is time to run the regression. You will take similar steps to what you
did in Part 1 of this lab, but this time you will select both dummy variable
columns for your x-variable.
a. Click on Data Analysis button in the Data tab in the ribbon.
b. Click Regression and then click OK.
c. Reference the cells that contain the TRAN_AMT in the Input Y Range
and Online-Dummy and BANK-Dummy in the Input X Range and then
click OK.
5. Take a screenshot (label it 3-6MC) of your results.
6. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 3-6 Dillard’s Regression.xlsx.
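In Python, the extended model is one line longer; a sketch continuing from the Part 1 sketch (df with the Online_Dummy column already added):

import statsmodels.api as sm

# 1 = credit/debit card; the lab states BANK is the TENDER_TYPE indicator
df["BANK_Dummy"] = (df["TENDER_TYPE"] == "BANK").astype(int)

X = sm.add_constant(df[["Online_Dummy", "BANK_Dummy"]])
model = sm.OLS(df["TRAN_AMT"], X).fit()
print(model.summary())

# Expected amount for an online purchase paid by credit/debit card
print(model.params["const"] + model.params["Online_Dummy"] + model.params["BANK_Dummy"])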

Tableau | No Lab
This portion of the lab is not possible in Tableau Prep nor in Tableau Desktop.

Lab 3-6 Part 2 Objective Questions (LO 3-4)


OQ1. What is the coefficient for the BANK-Dummy variable?
OQ2. What is the coefficient for the Online-Dummy variable? (Careful—it is not the
same as it was in the simple regression output!)
OQ3. What is the expected transaction amount for a customer who uses their credit
card for an online purchase?

Lab 3-6 Part 2 Analysis Questions (LO 3-4)


AQ1. How would you interpret the relative impact of the Online-Dummy variable
­versus the BANK-Dummy variable in plain English?
AQ2. Consider the adjusted R Square values for both models you created. How would
you describe the relative impact of the BANK-Dummy variable on the overall
model?

Lab 3-6 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Chapter 4
Communicating Results and
Visualizations

A Look at This Chapter


This chapter wraps up the introduction to the IMPACT model by explaining how to communicate your results
through data visualization and through written reports. Creating a chart takes more skill and practice than simply
adding in a bar chart through the Excel chart wizard, and this chapter will help you identify the purpose for your data
visualization so that you can choose the best chart for your dataset. We will also help you learn how to refine your
chart so that it communicates as efficiently and effectively as possible. The chapter concludes by describing how to
provide a written report tailored to specific audiences who will be interested in the results of your data analysis.

A Look Back
In Chapter 3, we considered various models and techniques used for Data Analytics and discussed when to use them
and how to interpret the results. We also provided specific accounting-related examples of when each of these specific
data approaches and models is appropriate to address our particular question.

A Look Ahead
Most of the focus of Data Analytics in accounting is on auditing, managerial accounting, financial statement
analysis, and tax. This is partly due to the demand for high-quality data and the need for enhancing trust in the assur-
ance process, informing management for decisions, and aiding investors as they select their portfolios. In Chapter 5,
we look at how both auditors and managers are using technology in general to improve the decisions being made. We
also introduce how Data Analytics helps facilitate continuous auditing and reporting.

One of the first uses of a heat map as a form of data visualization is also one of history’s most impactful. In the mid-
1800s, there was a worldwide cholera pandemic. Scientists were desperate to determine the cause to put a stop to
the pandemic, and one of those scientists, John Snow, studied a particular London neighborhood that was suffering
from a large number of cholera cases in 1854. Snow created a map of the outbreak that included small bar charts on
the streets indicating the number of people affected by the disease across different locations in the neighborhood.
He suspected that the outbreak was linked to water, so he also drew small crosses on the map to indicate water
sources. Through this visualization, Snow was able to identify that the people who were dying nearly all had one thing
in common—they were drinking out of the same water source. This led to the discovery of cholera being conveyed
through contaminated water. Exhibit 4-1A shows Snow’s 1854 cholera map.
Software and methods for creating heat maps to visualize epidemics have improved since 1854, but the purpose
still exists. Using a heat map to visualize clusters of people impacted by epidemics helps researchers, health profes-
sionals, and policy makers identify patterns and ultimately inform decisions about how to resolve epidemics. For
example, in Exhibit 4-1B, the map can help readers quickly gain insight into where the overdose epidemic is
most prevalent.
Without Snow’s hypothesis, methods for testing it, and ultimately communicating the results through data
­visualization, the 1854 cholera outbreak would have continued with scientists still being uncertain of the cause of
cholera.

EXHIBIT 4-1A
Source: John Snow. On the Mode of Communication of Cholera. 2nd ed. London: John Churchill, 1855.

EXHIBIT 4-1B Drug Overdose Deaths per 100,000
Source: CDC

OBJECTIVES
After reading this chapter, you should be able to:

LO 4-1 Differentiate between communicating results using statistics and visualizations.
LO 4-2 Determine the purpose of your data visualization.
LO 4-3 Choose the best chart for your dataset.
LO 4-4 Refine your chart to communicate efficiently and effectively.
LO 4-5 Communicate your results in a written report.


Data are important, and Data Analytics is effective, but both are only as valuable as our
ability to communicate the results and make the data understandable. One of the authors
often asks her students what they would do if they were interns and their boss asked them to
supply information regarding in which states all of the customers their organization served
were located. Would they simply point their boss to the Customers table in the sales data-
base? Would they go a step further and isolate the attributes to the Company Name and the
State? Perhaps they could go even further and run a quick query or PivotTable to perform
a count on the number of customers in each different state that the company serves. If they
were to give their boss what they actually wanted, however, they should provide a short writ-
ten summary of the answer to the research question, as well as an organized chart to visual-
ize the results. Data visualization isn’t just for people who are “visual” learners. When the
results of data analysis are visualized appropriately, the results are made easier and quicker
to interpret for everybody. Whether the data you are analyzing are “small” data or “big” data,
they still merit synthesis and visualization to help your stakeholders interpret the results with
ease and efficiency.
Think back to some of the first data visualizations and categorizations you were exposed
to (the food guide pyramid/food plate, the animal kingdom, the periodic table) and, more
recently, how frequently infographics are used to break down complicated
information on social media. These charts and infographics make it easier for people to
understand difficult concepts by breaking them down into categories and visual components.

COMMUNICATING RESULTS

LO 4-1
Differentiate between communicating results using statistics and visualizations.

As part of the IMPACT model, the results of the analysis need to be communicated. As
with selecting and refining your analytical model, communicating results is more art than
science. Once you are familiar with the tools that are available, your goal should always be
to share critical information with stakeholders in a clear, concise manner. This communication
could involve the use of a written report, a chart or graph, a callout box, or a few key
statistics. Depending on the needs of the decision maker, different means of communication
may be considered.

Differentiating between Statistics and Visualizations


In 1973, a statistician named Francis Anscombe illustrated the importance of visualizations
using four datasets that had nearly identical descriptive summary statistics, yet appeared
very different when the distributions were graphed and visualized. It came to be known as
“Anscombe’s Quartet” and emphasized the importance of visualizations used in tandem
with the underlying statistical properties.
To further illustrate, Exhibit 4-2 shows both the detailed observations in the four data-
sets and the summary statistics. You’ll note the nearly identical summary statistics for each
of the four datasets.


EXHIBIT 4-2 Anscombe's Quartet (Data)

It is only when the data points are visualized that you see that the datasets are quite dif-
ferent. Even the regression results are not able to differentiate between the various datasets
(as shown in Exhibit 4-3). While not always the case, Anscombe's Quartet is an example
where visualizations communicate the results of the analysis better than statistics alone.
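You can reproduce this result yourself. A minimal Python sketch (our illustration) using seaborn's load_dataset, which fetches a hosted copy of the quartet:

import seaborn as sns

anscombe = sns.load_dataset("anscombe")  # columns: dataset, x, y
groups = anscombe.groupby("dataset")

# Nearly identical means, standard deviations, and correlations...
print(groups[["x", "y"]].agg(["mean", "std"]))
print(groups.apply(lambda d: d["x"].corr(d["y"])))

# ...yet plotting reveals four very different relationships
sns.lmplot(data=anscombe, x="x", y="y", col="dataset", col_wrap=2, ci=None)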

EXHIBIT 4-3 Figure Plotting the Four Datasets in Anscombe's Quartet

Visualizations Increasingly Preferred over Text


Increasingly, visualizations are preferred to written content to communicate results. For
example, 91 percent of people prefer visual content over written content.1 Why is that? Well,
some argue that the brain processes images 60,000 times faster than text (and 90 percent of
information transmitted to the brain is visual).2 As further evidence, on Facebook, photos


have an interaction rate of 87 percent, compared to 4 percent or less for other types of posts,
such as links or text.3 While some company executives may prefer text to visualizations, recent
experience suggests that visualizations are increasingly an effective way to communicate to
management and for management to communicate to firm stakeholders.

1 Zohar Dayan, "Visual Content: The Future of Storytelling," Forbes, April 2, 2018, https://www.forbes.com/sites/forbestechcouncil/2018/04/02/visual-content-the-future-of-storytelling/?sh=6517bfbe3a46 (accessed January 14, 2021).
2 Harry Eisenberg, "Humans Process Visual Data Better," Thermopylae Sciences + Technology, September 15, 2014, https://www.t-sciences.com/news/humans-process-visual-data-better#:~:text=Visualization%20works%20from%20a%20human,to%20the%20brain%20is%20visual (accessed January 14, 2021).
3 Hannah Whiteoak, "Six Reasons to Embrace Visual Commerce in 2018," Pixlee, https://www.pixlee.com/blog/six-reasons-to-embrace-visual-commerce-in-2018/ (accessed January 14, 2021).

Data Analytics at Work

Data Visualization: Why a Picture Can Be Worth a Thousand Clicks


CFOs have always been encouraged to become better “storytellers,” by
communicating important messages about company performance, strategy,
and prospects in the simplest terms in a way that all company stakeholders
can understand. But the sheer volume of data can be overwhelming, making
it challenging to tell a coherent story.


Enter data visualization, an enabling technology that complements analytics and


related data-crunching tools, allowing finance to produce user-friendly reports
and other presentations that can be tailored to specific audiences. CFOs seem
to already appreciate the potential of data visualization, ranking it highly on their
digital wish list and acknowledging its value as part of a broader effort to lever-
age analytics and enhance its efforts to communicate performance.
Source: Deloitte, “Data Visualization: Why a Picture Can Be Worth a Thousand Clicks,” CFO
Insights, October 2017, https://www2.deloitte.com/us/en/pages/finance/articles/cfo-insights-
data-visualization.html (accessed February 12, 2021).

DETERMINE THE PURPOSE OF YOUR DATA VISUALIZATION

LO 4-2
Determine the purpose of your data visualization.

Visualizations have become very popular over the past three decades. Managers use dash-
boards to quickly evaluate key performance indicators (KPIs) and adjust operational
tasks; analysts use graphs to plot stock price and financial performance over time to select
portfolios that meet expected performance goals.
In any project that will result in a visual representation of data, the first charge is ensur-
ing that the data are reliable and that the content necessitates a visual. In our case, however,
ensuring that the data are reliable and useful has already been done through the first three
steps of the IMPACT model.
At this stage in the IMPACT model, determining the method for communicating your
results requires the answers to two questions:
1. Are you explaining the results of previously done analysis, or are you exploring the data
through the visualization? (Is your purpose declarative or exploratory?)
2. What type of data are being visualized (conceptual [qualitative] or data-driven
[quantitative])?
Scott Berinato, senior editor at Harvard Business Review, summarizes the possible
answers to these questions4 in a chart shown in Exhibit 4-4. The majority of the work that
we will do with the results of data analysis projects will reside in quadrant 2 of Exhibit 4-4,
the declarative, data-driven quadrant. We will also do a bit of work in Exhibit 4-4’s quadrant
4, the data-driven, exploratory quadrant. There isn’t as much qualitative work to be done,
although we will work with categorical qualitative data occasionally. When we do work
with qualitative data, it will most frequently be visualized using the tools in quadrant 1, the
declarative, conceptual quadrant.

4 S. Berinato, Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations (Boston: Harvard Business Review Press, 2016).

EXHIBIT 4-4 The Four Chart Types
Source: S. Berinato, Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations (Boston: Harvard Business Review Press, 2016).
[The quadrant chart plots purpose on the vertical axis (declarative at the top, exploratory at the bottom) and data type on the horizontal axis (conceptual/qualitative on the left, data-driven/quantitative on the right); the quadrants are numbered 1 (declarative, conceptual), 2 (declarative, data-driven), 3 (exploratory, conceptual), and 4 (exploratory, data-driven).]

Once you know the answers to the two key questions and have determined which quad-
rant you’re working in, you can determine the best tool for the job. Is a written report with a
simple chart sufficient? If so, Word or Excel will suffice. Will an interactive dashboard and
repeatable report be required? If so, Tableau may be a better tool. Later in the chapter, we
will discuss these two tools in more depth, along with when each should be used.

Quadrants 1 and 3 versus Quadrants 2 and 4: Qualitative versus Quantitative
Qualitative data are categorical data. All you can do with these data is count them and group
them, and in some cases, you can rank them. Qualitative data can be further defined in two
ways, nominal data and ordinal data. Nominal data are the simplest form of data. Examples
of nominal data are hair color, gender, and ethnic groups. If you have a set of data on peo-
ple with different hair color, you can count the number of individuals who fit into the same
hair color category, but you cannot rank it (brown hair isn’t better than red hair), nor can
you take an average or do any other further calculations beyond counting (you can’t take an
average of “blonde”). Increasing in complexity, but still categorized as qualitative data, are
ordinal data. Ordinal data can also be counted and categorized like nominal data but can
go a step further—the categories can also be ranked. Examples of ordinal data include gold,
silver, and bronze medals, 1–5 rating scales on teacher evaluations, and letter grades. If you
have a set of data of students and the letter grades they have earned in a given course, you
can count the number of instances of A, B, C, and so on, and you can categorize them, just
like with nominal data. You can also sort the data meaningfully—an A is better than a B,
which is better than a C, and so on. But that’s as far as you can take your calculations—as
long as the grades remain as letters (and aren’t transformed into the corresponding numeri-
cal grade for each individual), you cannot calculate an average, standard deviation, or any
other more complex calculation.
Beyond counting and possibly sorting (if you have ordinal data), the primary statistic
used with qualitative data is proportion. The proportion is calculated by counting the num-
ber of items in a particular category, then dividing that number by the total number of
observations. For example, if you had a dataset of 150 people and had each individual’s
corresponding hair color with 25 people in your dataset having red hair, you could calculate
the proportion of red-haired people in your dataset by dividing 25 (the number of people
with red hair) by 150 (the total number of observations in your dataset). The proportion of
red-haired people, then, would be 16.7 percent.
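In code, this proportion calculation is a one-liner. A brief Python sketch using hypothetical hair-color data that mirrors the example in the text:

import pandas as pd

# Hypothetical data: 150 observations, 25 of which are red-haired
hair = pd.Series(["red"] * 25 + ["brown"] * 70 + ["blonde"] * 55)

# Proportion = count in a category / total observations
print(hair.value_counts(normalize=True))  # red = 25/150, about 16.7 percent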
Qualitative data (both nominal and ordinal) can also be referred to as “conceptual” data
because such data are text-driven and represent concepts instead of numbers.
Quantitative data are more complex than qualitative data because not only can they be
counted and grouped just like qualitative data, but the differences between each data point
are meaningful—when you subtract 4 from 5, the difference is a numerical measure that can
be compared to subtracting 3 from 5. Quantitative data are made up of observations that
are numerical and can be counted and ranked, just like ordinal qualitative data, but that can
also be averaged. A standard deviation can be calculated, and datasets can be easily com-
pared when standardized (if applicable).
Similar to qualitative data, quantitative data can be categorized into two different types:
interval and ratio. However, there is some dispute among the analytics community on whether
the difference between the two types is meaningful, and for the sake of the analytics and
calculations you will be performing throughout this textbook, the difference is not pertinent.
The simplest way to express the difference between interval and ratio data is that ratio data
have a meaningful 0 and interval data do not. In other words, for ratio data, when a dataset
approaches 0, 0 means “the absence of.” Consider money as ratio data—we can have 5 dol-
lars, 72 dollars, or 8,967 dollars, but as soon as we reach 0, we have “the absence of” money.
Interval data do not have a meaningful 0; in other words, in interval data, 0 does not mean
“the absence of” but is simply another number. An example of interval data is the Fahren-
heit scale of temperature measurement, where 90 degrees is hotter than 70 degrees, which
is hotter than 0 degrees, but 0 degrees does not represent “the absence of” temperature—­it’s
just another number on the scale.
Due to the “meaningful 0” difference between interval and ratio data, ratio data are
considered the most sophisticated form of data. This is because the meaningful zero allows
us to calculate fractions, proportions, and percentages—ratios reflecting the relationship
between values. However, we can perform all other arithmetic functions on both interval
and ratio data. In Chapter 3, you learned more about statistical tests such as hypothesis


testing, regression, and correlation. We can run all of these tests and calculate the mean,
median, and standard deviation on interval and ratio data.
Quantitative data can be further categorized as either discrete or continuous data.
­Discrete data are data that are represented by whole numbers. An example of discrete data
is points in a basketball game—you can earn 2 points, 3 points, or 157 points, but you can-
not earn 3.5 points. On the other hand, continuous data are data that can take on any value
within a range. An example of continuous data is height: you can be 4.7 feet, 5 feet, or
6.27345 feet. The difference between discrete and continuous data can be blurry sometimes
because you can express a discrete variable as continuous—for example, the number of chil-
dren a person can have is discrete (a woman can’t have 2.7 children, but she could have 2 or
3). However, if you are researching the average number of children that women aged 25–40
have in the United States, the average would be a continuous variable. Whether your data
are discrete or continuous can also help you determine the type of chart you create because
continuous data lend themselves more to a line chart than do discrete data.

A Special Case of Quantitative Data: The Normal Distribution
Chapter 3 mentions the concept of the normal distribution in the context of profiling in
continuous auditing. The normal distribution is a phenomenon that many naturally occur-
ring datasets in our world follow, such as SAT scores and heights and weights of new-
born babies. For a distribution of data to be considered normal, the data should have equal
median, mean, and mode, with half of the observations falling below the mean and the
other half falling above the mean. If you are comparing two datasets that follow the normal
distribution, even if the two datasets have very different means, you can still compare them
by standardizing the distributions with Z-scores. By using a formula, you can transform
every normal distribution into a special case of the normal distribution called the standard
normal distribution, which has 0 for its mean (and thus, for its mode and median, as well)
and 1 for its standard deviation. The benefit of standardizing your data during a comparison
of two datasets is to no longer have to compare wildly different numbers and attempt to eye-
ball how one observation differs from the other—if you standardize both datasets, you can
place both distributions on the same chart and more swiftly generate insights.
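Concretely, a z-score is computed by subtracting the mean from each observation and dividing by the standard deviation. A brief Python sketch with hypothetical data:

import numpy as np

# Hypothetical scores measured on two very different scales
sat = np.array([1200, 1050, 1340, 980, 1110])
act = np.array([27, 22, 31, 20, 24])

def z_scores(x):
    # Standardize: subtract the mean, divide by the standard deviation
    return (x - x.mean()) / x.std()

# Both series now share a mean of 0 and a standard deviation of 1,
# so observations from the two distributions can be compared directly
print(z_scores(sat))
print(z_scores(act))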

Quadrants 1 and 2 versus Quadrants 3 and 4: Declarative versus Exploratory
In the context of the labs and tools we’re providing through this textbook, the majority
of your data visualizations created in step C of the IMPACT model will be created with
a declarative purpose. Declarative visualizations are the product of wanting to “declare”
or present your findings to an audience. The data analysis projects begin with a question,
proceed through analysis, and end with communicating those findings. This means that
while the visualization may prompt conversation and debate, the information provided in
the charts should be solid. Even if your analysis in the previous steps of the IMPACT model
has been exploratory, by the time you have arrived to communicate your results, you are
declaring what you have found.

Lab Connection
Lab 4-1, Lab 4-3, and Lab 4-4 have you create dashboards with visualizations
that present declarative data.


On the other hand, you will sometimes use data visualizations to satisfy an exploratory
visualization purpose. When this is done, the lines between steps “P” (perform test plan),
“A” (address and refine results), and “C” (communicate results) are not as clearly divided.
Exploratory data visualization will align with performing the test plan within visualization
software—for example, Tableau—and gaining insights while you are interacting with the
data. Often the presenting of exploratory data will be done in an interactive setting, and
the answers to the questions from step “I” (identify the questions) won’t have already been
answered before working with the data in the visualization software.

Lab Connection
Lab 4-2 and Lab 4-5 have you create dashboards with visualizations that
­present exploratory data.

Exhibit 4-5 is similar to the first four chart types presented to you in Exhibit 4-4, but Exhibit
4-5 has more detail to help you determine what to do once you’ve answered the first two
questions. Remember that the quadrant represents two main questions:
1. Are you explaining the results of the previously done analysis, or are you exploring the
data through the visualization? (Is your purpose declarative or exploratory?)
2. What type of information is being visualized (conceptual [qualitative] or data-driven
[quantitative])?

EXHIBIT 4-5 The Four Chart Types Quadrant with Detail
Source: S. Berinato, Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations (Boston: Harvard Business Review Press, 2016).

Once you have determined the answers to the first two questions, you are ready to begin deter-
mining which type of visualization will be the most appropriate for your purpose and dataset.
In Chapter 1, you were introduced to the Gartner Magic Quadrant for Business Intelli-
gence and Analytics Platforms, and through the labs in the previous three chapters you have
worked with Microsoft products (Excel, Power Query, Power BI) and Tableau products
(Tableau Prep and Tableau Desktop). In Chapter 4 and the remaining chapters, you will
continue having the two tracks to learn how to use each tool, but when you enter your pro-
fessional career, you may need to make a choice about which tool to use for communicating


your results. While both Microsoft and Tableau provide similar methods for analysis and
visualization, we offer a discussion on when you may prefer each solution.
Microsoft’s tools slightly outperform Tableau in their execution of the entire analytics
process. Tableau is a newer product and has placed the majority of its focus on data visu-
alization, while Microsoft Excel has a more robust platform for data analysis. If your data
analysis project is more declarative than exploratory, it is more likely that you will perform
your data visualization to communicate results in Excel, simply because it is likely that you
performed steps 2 through 4 in Excel, and it is convenient to create your charts in the same
tool in which you performed your analysis.
Tableau Desktop and Microsoft Power BI earn high praise for being intuitive and easy to use,
which makes them ideal for exploratory data analysis. When your question isn’t fully defined
or specific, exploring your dataset in Tableau or Power BI and changing your visualization
type to discover different insights is as much a part of performing data analysis as crafting
your communication. While we recognize that you have already worked with Tableau in
previous labs, now that our focus has turned toward data visualization, we recommend
opening the Superstore Sample Workbook provided within Tableau Desktop to explore dif-
ferent types of visualizations. You will find the Superstore Sample Workbook at the bottom
of the start screen in Tableau Desktop under “Sample workbooks” (Exhibit 4-6).

EXHIBIT 4-6
Tableau Software, Inc. All rights reserved.


Once you open the workbook, you will see a variety of tabs at the bottom of the work-
book that you can page through and see different ways that the same dataset can be ana-
lyzed and visualized. When you perform exploratory analysis in Tableau, or even if you have
already performed your analysis and you have uploaded the dataset into Tableau to com-
municate insights, we recommend trying several different types of charts to see which one
makes your insights stand out the most effectively. In the top-right corner of the Tableau
workbook, you will see the Show Me window, which provides different options for visual-
izing your dataset (Exhibit 4-7).

EXHIBIT 4-7
Tableau Software, Inc. All rights reserved.

In the Show Me tab, only the visualizations that will work for your particular dataset will
appear in full color.
For more information on using Tableau, see Appendix I.


PROGRESS CHECK
1. What are two ways that complicated concepts were explained to you via catego-
rization and data visualization as you were growing up?
2. Using the Internet or other resources (other textbooks, a newspaper, or a magazine),
identify an example of a data visualization for each possible quadrant.
3. Identify which type of data scale the following variables are measured on
(­qualitative nominal, qualitative ordinal, or quantitative):
a. Instructor evaluations in which students select excellent, good, average, or poor.
b. Weekly closing price of gold throughout a year.
c. Names of companies listed on the Dow Jones Industrial Average.
d. Fahrenheit scale for measuring temperature.

CHOOSING THE RIGHT CHART

LO 4-3
Choose the best chart for your dataset.

Once you have determined the type of data you're working with and the purpose of your
data visualization, the next questions have to do with the design of the visualization—color,
font, graphics—and most importantly, type of chart/graph. The visual should speak for itself
as much as necessary, without needing too much explanation for what’s being represented.
Aim for simplicity over bells and whistles that “look cool,” but end up being distracting.

Charts Appropriate for Qualitative Data


Because qualitative and quantitative data have such different levels of complexity and
sophistication, there are some charts that are not appropriate for qualitative data that do
work for quantitative data.
When it comes to visually representing qualitative data, the charts most frequently con-
sidered are:
• Bar (or column) charts.
• Pie charts.
• Stacked bar chart.
The pie chart is probably the most famous (some would say infamous) data visualization
for qualitative data. It shows the parts of the whole; in other words, it represents the propor-
tion of each category as it corresponds to the whole dataset.
Similarly, a bar chart also shows the proportions of each category as compared to each
of the others.
In most cases, a bar chart is more easily interpreted than a pie chart because our eyes are
more skilled at comparing the heights of columns (or the lengths of horizontal bars, depending
on the orientation of your chart) than they are at comparing the sizes of pie slices, especially
if the proportions are relatively similar.
Consider the two different charts from the Sláinte dataset in Exhibit 4-8. Each compares
the proportion of each beer type sold by the brewery.
The magnitude of the difference between the Imperial Stout and the IPA is almost impos-
sible to see in the pie chart. This difference is easier to digest in the bar chart.
Of course, we could improve the pie chart by adding in the percentages associated with
each proportion, but it is much quicker for us to see the difference in proportions by glanc-
ing at the order and length of the bars in a bar chart (Exhibit 4-9).
The same set of data could also be represented in a stacked bar chart or a 100 percent
stacked bar chart (Exhibit 4-10). The first figure in Exhibit 4-10 is a stacked bar chart that
shows the proportion of each type of beer sold expressed in the number of beers sold for
each product, while the second shows the proportion expressed in terms of percentage of the
whole in a 100 percent stacked bar chart.

[Exhibit 4-8: Pie Chart and Bar (or Column) Chart Show Different Ways to Visualize Proportions]

[Exhibit 4-9: Pie Chart Showing Proportion, with percentage labels added to each slice]

[Exhibit 4-10: Example of a Stacked Bar Chart and a 100 Percent Stacked Bar Chart, plotting Number of Records and % of Total Number of Records by Product Description]
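If you would like to experiment with this comparison yourself, here is a minimal Python sketch (using matplotlib, with hypothetical proportions standing in for the Sláinte beer types; neither the numbers nor the library choice comes from the chapter) that draws the same data as both a pie chart and a bar chart:

import matplotlib.pyplot as plt

# Hypothetical proportions standing in for the Slainte beer types
labels = ["Imperial Stout", "IPA", "Stout", "Pale Ale", "Wheat", "Imperial IPA"]
shares = [28, 26, 17, 12, 11, 7]  # percentages of total units sold

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Pie chart: shows parts of a whole, but similar slices are hard to rank by eye
ax_pie.pie(shares, labels=labels)
ax_pie.set_title("Pie chart")

# Bar chart: lengths are easier to compare, especially when sorted descending
ax_bar.bar(labels, shares)
ax_bar.set_ylabel("Percent of units sold")
ax_bar.set_title("Bar chart")
ax_bar.tick_params(axis="x", rotation=45)

plt.tight_layout()
plt.show()

Because the shares are listed from largest to smallest, the bar chart makes the small gap between the top two categories visible at a glance, which is exactly the weakness of the pie version.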
While bar charts and pie charts are among the most common charts used for qualitative
data, there are several other charts that function well for showing proportions:
• Tree maps and heat maps. These are similar types of visualizations, and they both use
size and color to show proportional size of values. While tree maps show proportions
using physical space, heat maps use color to highlight the scale of the values. However,
both are heavily visual, so they are imperfect for situations where precision of the num-
bers or proportions represented is necessary.
• Symbol maps. Symbol maps are geographic maps, so they should be used when express-
ing qualitative data proportions across geographic areas such as states or countries.
• Word clouds. If you are working with text data instead of categorical data, you can repre-
sent them in a word cloud. Word clouds are formed by counting the frequency of each
word mentioned in a dataset; the higher the frequency (proportion) of a given word, the
larger and bolder the font will be for that word in the word cloud. Consider analyzing
the results of an open-ended response question on a survey; a word cloud would be a
great way to quickly spot the most commonly used words to tell if there is a positive or
negative feeling toward what’s being surveyed. There are also settings that you can put
into place when creating the word cloud to leave out the most commonly used English
words—such as the, an, and a—in order to not skew the data. Exhibit 4-11 is an example
of a word cloud for the text of Chapter 2 from this textbook. A short code sketch follows Exhibit 4-11.
[Exhibit 4-11: Word Cloud Example from Chapter 2 Text]
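As promised above, here is a minimal Python sketch of how a word cloud is generated. It uses the third-party wordcloud package and its built-in English stop-word list; the package choice and the sample survey text are our own illustrative assumptions, not part of the chapter:

import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS  # third-party: pip install wordcloud

# Hypothetical open-ended survey responses
text = """The new dashboard is clear and fast. The filters are clear and helpful.
Loading the weekly report is slow on Mondays. Great visual design overall."""

# STOPWORDS drops common English words (the, an, a, ...) so they don't skew the cloud
cloud = WordCloud(width=600, height=400, background_color="white",
                  stopwords=STOPWORDS).generate(text)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()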

Charts Appropriate for Quantitative Data


The chart possibilities for quantitative data not only include those available for qualitative
data (you can still group and count the data), but also extend to far more sophisticated
options. You can use pie charts (with the same varying levels of success) and bar charts
with quantitative data, but you can also use much more.
There are many different methods for visualizing quantitative data. With the exception of
the word cloud, all of the methods mentioned in the previous section for qualitative data can
work for depicting quantitative data, but the following charts can depict more complex data:
• Line charts. Show similar information to what a bar chart shows, but line charts are
good for showing data changes or trend lines over time. Line charts are useful for con-
tinuous data, while bar charts are often used for discrete data. For that reason, line
charts are not recommended for qualitative data, which by nature of being categorical,
can never be continuous.


• Box and whisker plots. Useful for when quartiles, medians, and outliers are required for
analysis and insights.
• Scatter plots. Useful for identifying the correlation between two variables or for identifying
a trend line or line of best fit (a short code sketch follows this list).
• Filled geographic maps. As opposed to symbol maps, a filled geographic map is used
to illustrate data ranges for quantitative data across different geographic areas such as
states or countries.
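As noted in the scatter plot bullet above, here is a minimal Python sketch that plots two hypothetical quantitative variables and overlays a least-squares line of best fit; all numbers are randomly generated for illustration and do not come from the chapter datasets:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)

# Hypothetical quantitative pair: advertising spend vs. units sold
spend = rng.uniform(1, 10, 40)
units = 50 + 12 * spend + rng.normal(0, 10, 40)

# Line of best fit via a degree-1 least-squares polynomial
slope, intercept = np.polyfit(spend, units, 1)

plt.scatter(spend, units, label="Observations")
xs = np.linspace(spend.min(), spend.max(), 100)
plt.plot(xs, slope * xs + intercept, label=f"Best fit: y = {slope:.1f}x + {intercept:.1f}")
plt.xlabel("Advertising spend ($000s)")
plt.ylabel("Units sold")
plt.legend()
plt.show()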
A summary of the chart types just described appears in Exhibit 4-12. Each chart option
works equally well for exploratory and declarative data visualizations. The chart types are
categorized based on when they will be best used (e.g., when comparing qualitative vari-
ables, a bar chart is an optimal choice), but this figure shouldn’t be used to stifle creativity—
bar charts can also be used to show comparisons among quantitative variables, just as many
of the charts in the listed categories can work well with other data types and purposes than
their primary categorization below.

EXHIBIT 4-12
Summary of Chart Types

Conceptual (Qualitative)
• Comparison: bar chart, pie chart, stacked bar chart, tree map, heat map
• Geographic data: symbol map
• Text data: word cloud

Data-Driven (Quantitative)
• Outlier detection: box and whisker plot
• Relationship between two variables: scatter plot
• Trend over time: line chart
• Geographic data: filled map

As with selecting and refining your analytical model, communicating results is more
art than science. Once you are familiar with the tools that are available, your goal should
always be to share critical information with stakeholders in a clear, concise manner. While
visualizations can be incredibly impactful, they can become a distraction if you’re not care-
ful. For example, bar charts can be manipulated to show a bias and, while novel, 3D graphs
are incredibly deceptive because they may distort the scale even if the numbers are fine.

Learning to Create a Good Chart by (Bad) Example


Other than getting practice by looking at good visualizations and modifying the way you
visualize your dataset in Tableau to see how different insights are showcased, one of the best
ways to learn how to create a good visualization is to look at some problematic visualizations.
In the chart depicted in Exhibit 4-13, the Daily Mail, a UK-based newspaper, tried to
emphasize an upgrade in the estimated growth of the British economy. The estimate from the
Office for National Statistics indicated that Q4 growth would be 0.7 percent instead of 0.6
percent (a relative increase of about 17 percent). Yet the visualization makes it appear to be
a 200 percent increase because of the scale the newspaper chose. Another issue is that some
time passed between the estimates, and we don't see that disclosed here.


[Exhibit 4-13: "2016 Q4 Growth Upgraded." Bar chart distorting a data comparison by using an inappropriate scale (first estimate 0.60%, second estimate 0.70%, axis starting at 0.55%). Source: http://www.dailymail.co.uk/news/article-4248690/Economy-grew-0-7-final-three-months-2016.html]

If we reworked the data points to show the correct scale (starting at 0 instead of 0.55) and
the change over time (plotting the data along the horizontal axis), we’d see something like
Exhibit 4-14. If we wanted to emphasize growth, we might choose a chart like Exhibit 4-15.
Notice that both new graphs show an increase that is less dramatic and less likely to confuse.

[Exhibit 4-14: "2016 Q4 Growth." Bar chart using an appropriate zero baseline for a less biased comparison of the first and second estimates]

[Exhibit 4-15: "2016 Q4 Growth." Alternative stacked bar chart showing the growth from the old estimate to the new estimate]
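To reproduce the effect of the truncated axis for yourself, here is a minimal Python sketch (matplotlib assumed; the two growth estimates are the only values taken from the example) that draws the same two bars with and without a zero baseline:

import matplotlib.pyplot as plt

estimates = ["First estimate", "Second estimate"]
growth = [0.60, 0.70]  # Q4 growth, in percent

fig, (ax_trunc, ax_zero) = plt.subplots(1, 2, figsize=(9, 4))

# Truncated axis: the 0.1-point difference looks like a threefold jump
ax_trunc.bar(estimates, growth)
ax_trunc.set_ylim(0.55, 0.75)
ax_trunc.set_title("Misleading: baseline at 0.55%")

# Zero baseline: the same difference shown in honest proportion
ax_zero.bar(estimates, growth)
ax_zero.set_ylim(0, 0.8)
ax_zero.set_title("Fair: baseline at 0%")

for ax in (ax_trunc, ax_zero):
    ax.set_ylabel("Q4 growth (%)")

plt.tight_layout()
plt.show()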


Our next example of a problematic method of data visualization is in Exhibit 4-16. The
data represented come from a study assessing cybersecurity attacks, and this chart in par-
ticular attempted to describe the number of cybersecurity attacks employees fell victim to,
as well as what their role was in their organization.
Assess the chart provided in Exhibit 4-16. Is a pie chart really the best way to present
these data?
There are simply too many slices of pie, and the key referencing the job role of each user
is unclear. There are a few ways we could improve this chart.
If you want to emphasize users, consider a rank-ordered bar chart like Exhibit 4-17. To
emphasize the category, a comparison like that in Exhibit 4-18 may be helpful. Or to show
proportion, maybe use a stacked bar (Exhibit 4-19).

[Exhibit 4-16: Difficult to Interpret Pie Chart, showing the number of attacks per user with a
legend for each user's job role (Assistant, Researcher, Administrator). Source: http://viz.wtf/post/155727224217/the-authors-explain-furthermore-we-present-the]

[Exhibit 4-17: More Clear Rank-Ordered Bar Chart of attacks per user, colored by job role]


[Exhibit 4-18: Bar Chart Emphasizing Attacks by Job Function]

[Exhibit 4-19: Stacked Bar Chart Emphasizing Proportion of Attacks by Job Function]

PROGRESS CHECK
4. The following two charts represent the exact same data—the quantity of beer
sold on each day in the Sláinte Sales Subset dataset. Which chart is more appro-
priate for working with dates, the column chart or the line chart? Which do you
prefer? Why?
a. [Column chart of quantity sold by day: Microsoft Excel 2016]


b. [Line chart of quantity sold by day: Microsoft Excel 2016]

5. The same dataset was consolidated into quarters. This chart was made with the
chart wizard feature in Excel, which made the creation of it easy, but something
went wrong. Can you identify what went wrong with this chart?

[Chart of quantity sold by quarter: Microsoft Excel 2016]

6. The following four charts represent the exact same data: the quantity of each beer
sold. Which do you prefer, the line chart or the column chart? Whichever you
chose, line or column, which of the pair do you think is the easiest to digest?
a. [Chart: Microsoft Excel 2016]


b. [Chart: Microsoft Excel 2016]

c. [Chart: Microsoft Excel 2016]

d. [Chart: Microsoft Excel 2016]

LO 4-4
Refine your chart to communicate efficiently and effectively.

FURTHER REFINING YOUR CHART TO COMMUNICATE BETTER

After identifying the purpose of your visualization and which type of visual will be most
effective in communicating your results, you will need to further refine your chart to pick
the right data scale, color, and format.


Data Scale and Increments


As tools such as Excel and Tableau become more intuitive and more powerful, considering
your data scale and increments is less of a concern because both tools will generally come
up with scales and increments that make sense for your dataset. With that being said, there
are still four main questions to consider when creating your data scale and increments:
1. How much data do you need to share in the visual to avoid being misleading, yet also
avoid being distracting? (For example, do you need to display the past 4 years, or will
the past two quarters suffice?) When you consider leaving out some data, is it to show
only the insights that are meaningful, or is it an attempt to skew the data or to hide
poor performance? Be careful to not hide data that are meaningful just because they
don’t align with your expectations.
2. If your data contain outliers, should they be displayed, or will they distort your scale
to the extent that you can leave them out? If the purpose of your chart is to call atten-
tion to the outliers, then they need to remain (and you need to ensure that they are not
errors, but this should have been done in step 2 of the IMPACT model when you mas-
tered the data). If the purpose of your chart is to display the middle pack of the data,
the outliers may not be relevant to the insights, and they could be left out.
3. Other than determining how much data you need to share, what scale should you place
those data on? Typically, charts should begin with a baseline of 0, but if 0 is meaningless
to your dataset, you can find a different baseline that makes sense. Be careful not to
exaggerate the height or the baseline so that your trendline or bar chart is over- or
underemphasized; as a rule of thumb, your trendline should take up about two-thirds of
the chart. Once you decide on a data scale, the increments for your data scale should be
"natural," such as 1s, 2s, 5s, 100s, and so on (e.g., not 3s or 0.02s); a short code sketch
follows this list.
4. Do you need to provide context or reference points to make the scale meaningful? For
example, if you were provided with a stock price of $100, would you immediately be
able to tell if that is a high number or a low number? Not necessarily; without context
of the company’s stock price over time, the company’s industry and its competitors’
stock prices, or some other piece of context, certain numbers are not altogether useful.
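As a small illustration of point 3 above, the sketch below (Python with matplotlib assumed; the monthly figures are hypothetical) forces a zero baseline and "natural" tick increments of 5:

import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

months = ["Jan", "Feb", "Mar", "Apr"]
units = [12, 18, 9, 23]  # hypothetical monthly unit sales

fig, ax = plt.subplots()
ax.bar(months, units)

# Point 3: start the scale at a meaningful baseline (here, 0)...
ax.set_ylim(bottom=0)

# ...and use "natural" increments (5s rather than, say, 3s or 0.02s)
ax.yaxis.set_major_locator(MultipleLocator(5))

ax.set_ylabel("Units sold")
plt.show()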

Color
Similar to how Excel and Tableau have become stronger tools at picking appropriate data
scales and increments, both Excel and Tableau will have default color themes when you
begin creating your data visualizations. You may choose to customize the theme. However,
if you do, here are a few points to consider:
• When should you use multiple colors? Using multiple colors to differentiate types
of data is effective. Using a different color to highlight a focal point is also effective.
However, don’t use multiple colors to represent the same type of data. Be careful to not
use color to make the chart look pretty—the point of the visualization is to showcase
insights from your data, not to make art.
• We are trained to understand the differences among red, yellow, and green, with red
meaning something negative that we would want to “stop” and green being something
positive that we would want to “continue,” just like with traffic lights. For that reason,
use red and green only for those purposes. Using red to show something positive or
green to show something negative is counterintuitive and will make your chart harder to
understand. You may also want to consider a color-blind audience. If you are concerned
that someone reading your visuals may be color blind, avoid a red/green scale and
consider using orange/blue. Tableau has begun defaulting to orange/blue color scales
instead of red/green for this reason.


• Once your chart has been created, convert it to grayscale to ensure that the contrast still
exists—this is both to ensure your color-blind audience can interpret your visuals and
also to ensure that the contrast, in general, is stark enough with the color palette you
have chosen.
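One way to run this grayscale check outside your charting tool is with Python's Pillow imaging library; the exported file name chart.png here is a hypothetical placeholder for whatever you saved your chart as:

from PIL import Image  # third-party: pip install Pillow

# Convert an exported chart to grayscale ("L" = 8-bit luminance)
chart = Image.open("chart.png")
gray = chart.convert("L")
gray.save("chart_grayscale.png")

# If categories are indistinguishable in chart_grayscale.png,
# the color palette does not have enough contrast.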

PROGRESS CHECK
7. Often, external consultants will use a firm’s color scheme for a data visualization
or will use a firm’s logo for points on a scatter plot. While this might be a great
approach to support a corporate culture, it is often not the most effective way to
create a chart. Why would these methods harm a chart’s effectiveness?

LO 4-5
Communicate your results in a written report.

COMMUNICATION: MORE THAN VISUALS—USING WORDS TO PROVIDE INSIGHTS

As a student, the majority of the writing you do is for your professors. You likely write
emails to your professors, which should carry a respectful tone, or essays for your Comp 1
or literature professors, where you may have been encouraged to use descriptive language
and an elevated tone; you might even have had the opportunity to write a business brief
or report for your business professors. All the while, though, you were still aware that you
were writing for a professor. When you enter the professional world, your writing will need
to take on a different tone. If you are accustomed to writing with an academic tone, transitioning
to writing for your colleagues in a business setting requires some practice. As Justin
Zobel says in Writing for Computer Science, "good style for science is, ultimately, nothing
more than writing that is easy to understand. [It should be] clear, unambiguous, correct,
interesting, and direct."5 As an author team, we have tremendous respect for literature and
the different styles of writing to be found, but for communicating the results of a data
analysis project, you need to write directly to your audience, with only the necessary points
included, and with as little descriptive style as possible. The point is: get to the point.

Content and Organization


Each step of the IMPACT model should be communicated in your write-up, as noted here:
I: Explain what was being researched. Even if your audience is the people who
requested the research, you should still restate the purpose of the project. Include
any relevant history as well. If your project is part of a larger program or if it’s a
continued effort to explain an issue or help a decision come to fruition, then include
the background.
M: Depending on your audience, you may not cover too much of what your process
was in the “master the data” step of the IMPACT model, but an overview of the
data source and which pieces of data are included in the analysis should be present.
If your audience is technical and interested, you may go into detail on your ETL
process, but it is more likely that you will leave out that piece.
P and A: Similar to how you write about mastering the data, you may not need to
include a thorough description of your test plan or your process for refining your re-
sults depending on what your audience is interested in and what they need to know,

5
Justin Zobel, Writing for Computer Science (Singapore: Springer-Verlag, 1997).


but including an overview of the type of analysis performed and any limitations that
you encountered will be important to include.
C: If you are including a data visualization with your write-up, you need to explain
how to use the visual. If there are certain aspects that you expect to stand out from
the analysis and the accompanying visual, you should describe what those compo-
nents are—the visual should speak for itself, but the write-up can provide confirma-
tion that the important pieces are gleaned.
T: Discuss what’s next in your analysis. Will the visual or the report result in a week-
ly or quarterly report? What trends or outliers should be paid attention to over time?

Audience and Tone


Carefully considering your audience is critical to ensuring your communication is effective. If
you have three messages to write—one letting your parents know that you are coming home
this weekend and you’ll need to do laundry, one to your professor letting them know that
you will miss class on Friday, and one to your best friend asking if they want to join you for
Chipotle—efficiency would suggest that you type it all into one email and click send. That would
definitely be the quickest way to get out the message. But is it a good idea? Certainly not. Your
mom does not need to know that you’re not going to class on Friday, and you probably don’t
want your professor to show up at Chipotle to have lunch with you and your friend. Instead of
sending the same message to all three people, you tailor the delivery—that is, you consider the
audience. You include all of the information that they need to know and nothing else.
You should do the same thing when crafting your communication regarding your data
analysis. If you have several different people to communicate results to, you may consider
crafting several different versions: one that contains all of the extraction, transformation,
and loading (ETL) details for the programmers and database administrators, one that is
light on ETL but heavy on interpretation of the visual and results for your managers, and
so on. Consider the knowledge and skill of your audience—don’t talk down to them, but
don’t overwhelm a nontechnical crowd with technical jargon. Explain the basics when you
should, and don’t when you shouldn’t.
An additional piece of communication to consider is the vehicle for communication.
We have a myriad of options available to us for communicating: email, phone calls, Skype,
instant messaging, printed reports, even face-to-face conversations that can be either infor-
mal or formal presentations in meetings. When crafting your communication, consider the
best way to provide the information to your intended audience.
Is the concept difficult to understand? A written report will probably not suffice; plan to
supplement your written material with a sit-down conversation or a phone call to explain
the details and answer questions.
Is the topic an answer to a question and fairly simple to understand? An emailed response
summarizing the visualization and results will likely suffice.
How does the person you are sending the report to communicate? Consider the profes-
sional culture of your organization. It may be commonplace to communicate casually using
abbreviations and jargon in your workplace, but if that’s not the way your workplace oper-
ates or even if it’s not the way that the recipient of your message communicates, take the
time to refine your message and mirror the norms of the organization and the recipients.
Is the report going to be updated and sent out at regular intervals (daily, weekly,
monthly)? If so, keep a consistent template so that it is easy for the recipients to identify the
information they seek on a regular basis.
There are, of course, many more concepts to consider that will be unique to each mes-
sage that you craft. Take the time to always consider your audience, their communication


style, and what they need from the communication—and provide it, via the right message,
the right tone, and the right vehicle.

Revising
Just as you addressed and refined your results in the fourth step of the IMPACT model,
you should refine your writing. Until you get plenty of practice (and even once you con-
sider yourself an expert), you should ask other people to read through your writing to make
sure that you are communicating clearly. Justin Zobel suggests that revising your writing
requires you to “be egoless—ready to dislike anything you have previously written. . . . If
someone dislikes something you have written, remember that it is the readers you need to
please, not yourself.”6 Always placing your audience as the focus of your writing will help
you maintain an appropriate tone, provide the right content, and avoid too much detail.

PROGRESS CHECK
Progress Checks 5 and 6 display different charts depicting the quantity of beer sold on
each day in the Sláinte Sales Subset dataset. If you had created those visuals, starting
with the data request form and the ETL process all the way through data analysis, how
would you tailor the written report for the following two roles?
8. For the CEO of the brewery who is interested in how well the different products
are performing.
9. For the programmers who will be in charge of creating a report that contains the
same information that needs to be sent to the CEO on a monthly basis.

6
Justin Zobel.

Summary
■ This chapter focused on the fifth step of the IMPACT model, or the “C,” on how to com-
municate the results of your data analysis projects. Communication can be done through
a variety of data visualizations and written reports, depending on your audience and the
data you are exhibiting. (LO 4-1)
■ In order to select the right chart, you must first determine the purpose of your data visu-
alization. This can be done by answering two key questions:
◦ Are you explaining the results of a previously done analysis, or are you exploring the
data through the visualization? (Is your purpose declarative or exploratory?)
◦ What type of data are being visualized (conceptual [qualitative] data or data-driven
[quantitative] data)? (LO 4-3)
■ The differences between each type of data (declarative and exploratory, qualitative and
quantitative) are explained, as well as how each data type affects both the tool you’re
likely to use (generally either Excel or Tableau) and the chart you should create. (LO 4-3)
■ After selecting the right chart based on your purpose and data type, your chart will need
to be further refined. Selecting the appropriate data scale, scale increments, and color for
your visualization is explained through the answers to the following questions: (LO 4-4)
◦ How much data do you need to share in the visual to avoid being misleading, yet also
avoid being distracting?

◦ If your data contain outliers, should they be displayed, or will they distort your scale
to the extent that you can leave them out?
◦ Other than how much data you need to share, what scale should you place those data on?
◦ Do you need to provide context or reference points to make the scale meaningful?
◦ When should you use multiple colors?
■ Finally, this chapter discusses how to provide a written report to describe your data analysis
project. Each step of the IMPACT model should be communicated in your write-up, and the
report should be tailored to the specific audience to whom it is being delivered. (LO 4-5)

Key Words
continuous data (188) One way to categorize quantitative data, as opposed to discrete data. Continuous
data can take on any value within a range. An example of continuous data is height.
declarative visualizations (188) Made when the aim of your project is to “declare” or present your
findings to an audience. Charts that are declarative are typically made after the data analysis has been
completed and are meant to exhibit what was found in the analysis steps.
discrete data (188) One way to categorize quantitative data, as opposed to continuous data. Discrete
data are represented by whole numbers. An example of discrete data is points in a basketball game.
exploratory visualizations (189) Made when the lines between steps “P” (perform test plan), “A”
(address and refine results), and “C” (communicate results) are not as clearly divided as they are in a
declarative visualization project. Often when you are exploring the data with visualizations, you are per-
forming the test plan directly in visualization software such as Tableau instead of creating the chart after
the analysis has been done.
interval data (187) The third most sophisticated type of data on the scale of nominal, ordinal, interval,
and ratio; a type of quantitative data. Interval data can be counted and grouped like qualitative data, and
the differences between each data point are meaningful. However, interval data do not have a meaningful
0. In interval data, 0 does not mean “the absence of” but is simply another number. An example of interval
data is the Fahrenheit scale of temperature measurement.
nominal data (187) The least sophisticated type of data on the scale of nominal, ordinal, interval, and
ratio; a type of qualitative data. The only thing you can do with nominal data is count, group, and take a
proportion. Examples of nominal data are hair color, gender, and ethnic groups.
normal distribution (188) A type of distribution in which the median, mean, and mode are all equal,
so half of all the observations fall below the mean and the other half fall above the mean. This phenom-
enon is naturally occurring in many datasets in our world, such as SAT scores and heights and weights of
newborn babies. When datasets follow a normal distribution, they can be standardized and compared for
easier analysis.
ordinal data (187) The second most sophisticated type of data on the scale of nominal, ordinal, interval,
and ratio; a type of qualitative data. Ordinal can be counted and categorized like nominal data and the cat-
egories can also be ranked. Examples of ordinal data include gold, silver, and bronze medals.
proportion (187) The primary statistic used with qualitative data. Proportion is calculated by counting the
number of items in a particular category, then dividing that number by the total number of observations.
qualitative data (186) Categorical data. All you can do with these data is count and group, and in some
cases, you can rank the data. Qualitative data can be further defined in two ways: nominal data and ordinal
data. There are not as many options for charting qualitative data because they are not as sophisticated as
quantitative data.
quantitative data (187) More complex than qualitative data. Quantitative data can be further defined
in two ways: interval and ratio. In all quantitative data, the intervals between data points are meaningful,
allowing the data to be not just counted, grouped, and ranked, but also to have more complex operations
performed on them such as mean, median, and standard deviation.
ratio data (187) The most sophisticated type of data on the scale of nominal, ordinal, interval, and
ratio; a type of quantitative data. They can be counted and grouped just like qualitative data, and the

differences between each data point are meaningful like with interval data. Additionally, ratio data have a
meaningful 0. In other words, once a dataset approaches 0, 0 means "the absence of." An example of ratio
data is currency.
standard normal distribution (188) A special case of the normal distribution used for standardizing
data. The standard normal distribution has 0 for its mean (and thus, for its mode and median, as well),
and 1 for its standard deviation.
standardization (188) The method used for comparing two datasets that follow the normal distribution.
By using a formula, every normal distribution can be transformed into the standard normal distribution. If
you standardize both datasets, you can place both distributions on the same chart and more swiftly come
to your insights.
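For reference, the formula referred to in this entry is the familiar z-score, z = (x − μ) / σ, where μ is the mean and σ is the standard deviation of the original normal distribution; subtracting the mean and dividing by the standard deviation re-expresses every observation as a number of standard deviations above or below the mean.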

ANSWERS TO PROGRESS CHECKS


1. Certainly, answers will vary given our own individual experiences. But we can note that
complex topics can be explained and understood by linking them to categorizations or
pictures, such as the food pyramid.
2. Answers will vary.
3. a. Qualitative ordinal
b. Quantitative (ratio data)
c. Qualitative nominal
d. Quantitative (interval data)
4. While this question does ask for your preference, it is likely that you prefer image b
because time series data are continuous and can be well represented with a line chart
instead of bars.
5. Notice that the quarters are out of order (1, 2, then 4); this looks like quarter 3 has been
skipped, but quarter 4 is actually the last quarter of 2019 instead of the last quarter of
2020, while quarters 1 and 2 are in 2020. Excel defaulted to simply ordering the quarters
numerically instead of recognizing the order of the years in the underlying data. You want
to be careful to avoid this sort of issue by paying careful attention to the charts, ordering,
and scales that are automatically created through Excel (and other tools) wizards.
6. Answers will vary. Possible answers include the following: Quantity of beer sold is a dis-
crete value, so it is likely better modeled with a bar chart than a line chart. Between the
two line charts, the second one is easier to interpret because it is in order of highest sales
to lowest. Between the two bar charts, it depends on what is important to convey to your
audience—are the numbers critical? If so, the second chart is better. Is it most important to
simply show which beers are performing better than others? If so, the first chart is better.
There is no reason to provide more data than necessary because they will just clutter up
the visual.
7. Color in a chart should be used purposefully; it is possible that a firm’s color scheme may
be counterproductive to interpreting the chart. The icons as points in a scatter plot might
be distracting, which could make it take longer for a reader to gain insights from the chart.
8. Answers will vary. Possible answers include the following: Explain to the CEO how to read
the visual, call out the important insights in the chart, tell the range of data that is included
(is it one quarter, one year, all time?).
9. Answers will vary. Possible answers include the following: Explain the ETL process, exactly
what data are extracted to create the visual, which tool the data were loaded into, and
how the data were analyzed. Explain the mechanics of the visual. The particular insights
of this visual are not pertinent to the programmer because the insights will potentially
change over time. The mechanics of creating the report are most important.

Multiple Choice Questions

1. (LO 4-1) Anscombe’s Quartet suggests that:


a. statistics should be used instead of visualizations.
b. visualizations should be used instead of statistics.
c. visualizations should be used in tandem with statistics.
2. (LO 4-2) In the late 1960s, Ed Altman developed a model to predict if a company was
at severe risk of going bankrupt. He called his statistic Altman’s Z-score, now a widely
used score in finance. Based on the name of the statistic, which statistical distribution
would you guess this came from?
a. Normal distribution
b. Poisson distribution
c. Standardized normal distribution
d. Uniform distribution
3. (LO 4-5) Justin Zobel suggests that revising your writing requires you to "be egoless—
ready to dislike anything you have previously written," suggesting that it is __________
you need to please.
a. yourself
b. the reader
c. the customer
d. your boss
4. (LO 4-2) Which of the following is not a typical example of nominal data?
a. Gender
b. SAT scores
c. Hair color
d. Ethnic group
5. (LO 4-2) The Fahrenheit scale of temperature measurement would best be described
as an example of:
a. interval data.
b. discrete data.
c. nominal data.
d. continuous data.
6. (LO 4-2) __________ data would be considered the least sophisticated type of data.
a. Ratio
b. Interval
c. Ordinal
d. Nominal
7. (LO 4-2) __________ data would be considered the most sophisticated type of data.
a. Ratio
b. Interval
c. Ordinal
d. Nominal
8. (LO 4-3) Line charts are not recommended for what type of data?
a. Normalized data
b. Qualitative data
c. Continuous data
d. Trend lines

9. (LO 4-3) Exhibit 4-12 gives chart suggestions for what data you’d like to portray. Those
options include all of the following except:
a. relationship between variables.
b. geographic data.
c. outlier detection.
d. normal distribution curves.
10. (LO 4-3) What is the most appropriate chart when showing a relationship between two
variables (according to Exhibit 4-12)?
a. Scatter chart
b. Bar chart
c. Pie graph
d. Histogram

Discussion and Analysis



1. (LO 4-2) Explain Exhibit 4-4 and why these four dimensions are helpful in describing
information to be communicated. Exhibit 4-4 lists conceptual and data-driven as being
on two ends of the continuum. Does that make sense, or can you think of a better way
to organize and differentiate the different chart types?
2. (LO 4-3) According to Exhibit 4-12, which is the best chart for showing a distribution of
a single variable, like height? How about hair color? Major in college?
3. (LO 4-3) Box and whisker plots (or box plots) are particularly adept at showing extreme
observations and outliers. In what situations would it be important to communicate these
data to a reader? Any particular accounts on the balance sheet or income statement?
4. (LO 4-3) Based on the data from datavizcatalogue.com, a line graph is best at showing
trends, relationships, compositions, or distributions?
5. (LO 4-3) Based on the data from datavizcatalogue.com, what are some major flaws of
using word clouds to communicate the frequency of words in a document?
6. (LO 4-3) Based on the data from datavizcatalogue.com, how does a box and whisker
plot show if the data are symmetrical?
7. (LO 4-3) What would be the best chart to use to illustrate earnings per share for one
company over the past 5 years?
8. (LO 4-3) The text mentions, “If your data analysis project is more declarative than
exploratory, it is more likely that you will perform your data visualization to communi-
cate results in Excel.” In your opinion, why is this true?
9. (LO 4-3) According to the text and your own experience, why is Tableau ideal for explor-
atory data analysis?

Problems

1. (LO 4-3) Match the chart type to whether it is used primarily to communicate qualitative
or quantitative results:
Chart Type Quantitative or Qualitative?
1. Pie chart
2. Box and whisker plot
3. Word cloud
4. Symbol map
5. Scatter plot
6. Line chart

2. (LO 4-3) Match the desired visualization for quantitative data to the following chart
types:
• Line charts
• Bar charts
• Box and whisker plots
• Scatter plots
• Filled geographic maps

Desired Visualization Chart Type

1. Useful for showing quartiles, medians, and outliers

2. Correlation between two variables

3. Distribution of sales across states or countries

4. Visualize the line of best fit

5. Data trends for net income over the past eight quarters

6. Data trends for stock price over the past 5 years

3. (LO 4-2) Match the data examples to one of the following data types:
• Interval data
• Nominal data
• Ordinal data
• Ratio data
• Structured data
• Unstructured data

Data Example Data Type

1. GMAT Score

2. Total Sales

3. Blue Ribbon, Yellow Ribbon, Red Ribbon

4. Company Use of Cash Basis vs. Accrual Basis

5. Depreciation Method (Declining Balance, Straight-Line, etc.)

6. Management Discussion and Analysis

7. Income Statement

8. Inventory Method (FIFO, LIFO, etc.)

9. Blogs

10. Total Liabilities

4. (LO 4-2) Match the definition to one of the following data terms:
• Declarative visualization
• Exploratory visualization
• Interval data
• Nominal data

• Ordinal data
• Ratio data

Data Definition Data Term

1. Method used to communicate the results after the data analysis has been completed

2. Categorical data that cannot be ranked

3. Categorical data with natural, ordered categories

4. Numerical data with an equal and definitive ratio between each data point, where a value of 0 means "the absence of"

5. Visualization used to determine the best method of analysis, usually without predefined statistical models

6. Numerical data measured along a scale

5. (LO 4-2, LO 4-3) Identify the order sequence from least sophisticated (1) to most sophis-
ticated (4) data type.

Data Type Sequence Order (1 to 4)

1. Interval data

2. Ordinal data

3. Nominal data

4. Ratio data

6. (LO 4-1) Analysis: Why was the heat map associated with the opening vignette regard-
ing the 1854 cholera epidemic effective? Now that we have more sophisticated tools
and methods for visualizing data, what else could have been used to communicate this,
and would it have been more or less effective in your opinion?
7. (LO 4-1) Analysis: Evaluate the use of color in the graphic associated with the opening
vignette regarding drug overdose deaths across America. Would you consider its use
effective or ineffective? Why? How is this more or less effective than communicating the
same data in a bar chart?
8. (LO 4-3) According to Exhibit 4-12 and related chapter discussion, which is the best
chart for comparisons of earnings per share over many periods? How about for only a
few periods?
• What is the best chart category? Conceptual or data-driven?
• What is the best chart subcategory?
• What is the best chart type?
9. (LO 4-3) According to Exhibit 4-12 and related chapter discussion, which is the best
chart for static composition of a data item of the Accounts Receivable balance at the
end of the year? Which is best for showing a change in composition of Accounts Receiv-
able over two or more periods?
• What is the best chart category? Conceptual or data-driven?
• What is the best chart subcategory?
• What is the best chart type?

10. (LO 4-3, LO 4-4) The Big Four accounting firms (Deloitte, EY, KPMG, and PwC) dominate
the audit and tax market in the United States. What chart would you use to show which
accounting firm dominates in each state in terms of audit revenues?

Type of Chart Appropriate to Compare Audit Revenues by State?

1. Area chart

2. Line chart

3. Column chart

4. Histogram

5. Bubble chart

6. Stacked column chart

7. Stacked area chart

8. Pie chart

9. Waterfall chart

10. Symbol chart

11. (LO 4-3, LO 4-4) Datavizcatalogue.com lists seven types of maps in its listing of charts.
Which one would you use to assess geographic customer concentration by number?
Analysis: How could you show if some customers buy more than other customers on
such a map? Would you use the same chart or a different one?

Type of Map Appropriate for Geographic Customer Concentration?

1. Tree map

2. Choropleth map

3. Flow map

4. Connection map

5. Bubble map

6. Heat map

7. Dot map

12. (LO 4-4) Analysis: In your opinion, is the primary reason that analysts use inappropri-
ate scales for their charts due to an error related to naiveté (or ineffective training),
or are the inappropriate scales used so the analyst can sway the audience one way or
the other?

LABS

Lab 4-1 Visualize Declarative Data—Sláinte


Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As you are analyzing the Sláinte brewery data, your supervisor has asked
you to compile a series of visualizations that can quickly describe the sales that have taken
place in the first months of the year, including which products are selling best and where
those sales are taking place.
When working with a data analysis project that is declarative in nature, the data can be
summarized in various tables and visualizations as a means to communicate results.
Data: Lab 4-1 Slainte Dataset.zip - 106KB Zip / 114KB Excel

Lab 4-1 Example Output


By the end of this lab, you will create a dashboard summarizing declarative data. While your
results will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop

[Lab 4-1M: Example Declarative Dashboard in Microsoft Power BI Desktop. Source: Microsoft Power BI Desktop]

Tableau | Desktop

[Lab 4-1T: Example Declarative Dashboard in Tableau Desktop. Source: Tableau Software, Inc. All rights reserved.]

Lab 4-1 Part 1 Total Sales Revenue by Product and Location
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 4-1 [Your name] [Your email address].docx.
The first set of analyses will focus on sales dollars in terms of the total sales price
assigned to each sales order total. By tracking the total sales dollars, you will focus on the
best-selling products and locations.

Microsoft | Power BI Desktop

1. Create a new file in Power BI Desktop.


2. Click Home > Get Data > Excel.
3. Locate the Lab 4-1 Slainte Dataset.xlsx file on your computer and click
Open.
4. In the Navigator window, check the following tables and click Load:
a. Customer_Master_Listing, Finished_Goods_Products, Sales_Order,
Sales_Order_Lines


5. Each sales order line item shows the individual quantity ordered and the
product sale price per item. Before you analyze the sales data, you will need
to calculate each line subtotal:
a. Click Data in the toolbar on the left to show your data tables.
b. Click the Sales_Order_Lines table in the list on the right.
c. In the Home tab, click New Column.
d. For the column formula, enter: Line_Subtotal = Sales_Order_Lines[Product_Sale_Price]
* Sales_Order_Lines[Sales_Order_Quantity_Sold].
e. Click Report in the toolbar on the left to return to your report.
6. Create a visualization showing the total sales revenue by product name
(descriptive analytics):
a. In the Visualizations pane, click Stacked Bar Chart.
b. Drag Finished_Goods_Product.Product_Description to the Y-axis box.
c. Drag Sales_Order_Lines.Line_Subtotal to the X-axis box.
d. Resize the right side of the chart so it fills half of the page.
e. Click the Format visual (paintbrush) button, and click General > Title >
On.
f. Name the chart Sales Revenue by Product.
g. Take a screenshot (label it 4-1MA).
7. Create a visualization showing the total sales revenue by state (descriptive
analytics):
a. Click anywhere on the page outside of your first visualization, then in
the Visualization pane, click Filled Map.
b. Drag Customer_Master_Listing.Customer_State to the Location box.
c. Drag Sales_Order_Lines.Line_Subtotal to the Tooltips box.
d. Click the Format visual (paintbrush) button, and click Fill Colors.
e. Below Default Color, click the fx button.
f. Under Based on Field, choose Sales_Order_Lines.Line_Subtotal and
click OK.
g. Click the Format visual (paintbrush) button, and click General > Title > On.
h. Name the chart Sales Revenue by State.
i. Take a screenshot (label it 4-1MB).
8. After you answer the lab questions, save your file as Lab 4-1 Slainte Sales
Dashboard.pbix, and continue to Lab 4-1 Part 2.
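If you would like to sanity-check the Line_Subtotal column from step 5 outside Power BI, the following is a minimal pandas sketch. The file and column names mirror the lab instructions, but the sheet name is an assumption about how the workbook is organized:

import pandas as pd

# Assumes the workbook exposes a sheet named Sales_Order_Lines
lines = pd.read_excel("Lab 4-1 Slainte Dataset.xlsx", sheet_name="Sales_Order_Lines")

# Same arithmetic as the DAX column in step 5d
lines["Line_Subtotal"] = lines["Product_Sale_Price"] * lines["Sales_Order_Quantity_Sold"]

# Quick check against the dashboard's total sales revenue
print("Total sales revenue:", lines["Line_Subtotal"].sum())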

Tableau | Desktop

1. Create a new workbook in Tableau Desktop.


2. Click Connect > To a File > Microsoft Excel.
3. Locate the Lab 4-1 Slainte Dataset.xlsx file on your computer and click
Open.



4. In the Data Source window, double-click the following tables to add them
to your model. If prompted, verify the correct primary key/foreign key pairs
and close the Edit Relationship window:
a. Sales_Order, Customer_Master_Listing, Sales_Order_Lines, Finished_
Goods_Products
5. Each sales order line item shows the individual quantity ordered and the
product sale price per item. Before you analyze the sales data, you will need
to calculate each line subtotal:
a. Go to Sheet 1 and rename it Sales Revenue by Product.
b. Click Analysis > Create Calculated Field. . .
c. For the formula name (top box), enter: Line Subtotal.
d. For the formula (bottom box), enter: [Product_Sale_Price] *
[Sales_Order_Quantity_Sold].
e. Click OK to return to your sheet.
6. Create a visualization showing the total sales revenue by product name:
a. Drag Finished_Goods_Product.Product Description to the Rows shelf.
b. Drag Sales_Order_Lines.Line Subtotal to the Columns shelf.
c. In the toolbar, click the Sort Descending button to show the products by
sales revenue with the largest revenue at the top of the chart.
d. Take a screenshot (label it 4-1TA).
7. Create a visualization showing the total sales revenue by state:
a. Click Worksheet > New Worksheet.
b. Drag Longitude (generated) to the Columns shelf.
c. Drag Latitude (generated) to the Rows shelf. A map will appear.
d. Drag Sales_Order_Lines.Line Subtotal to the Color button in the
Marks pane.
e. Drag Customer_Master_Listing.Customer State to the Detail button in
the Marks pane. The map should now show a gradient of color with the
highest sales revenue states appearing in dark blue.
f. Right-click the Sheet 2 tab and rename it Sales Revenue by State.
g. Take a screenshot (label it 4-1TB).
8. After you answer the lab questions, save your worksheet as Lab 4-1 Slainte
Sales Dashboard.twb, and continue to Lab 4-1 Part 2.

Lab 4-1 Part 1 Objective Questions (LO 4-2)


OQ1. What is the top-selling product by total revenue overall?
OQ2. How much total revenue did the top-selling product generate overall?
OQ3. Which state had the highest total revenue overall?
OQ4. How much total revenue did that state generate overall?

Lab 4-1 Part 1 Analysis Questions (LO 4-2, 4-3)


AQ1. How do size and color make certain data points stand out relative to other
data points?


AQ2. What other visualizations might you use to describe sales revenue?
AQ3. What other ways might you slice sales revenue to help management understand
performance?

Lab 4-1 Part 2 Total Sales Volume by Month and Target


This second analysis will focus on sales volume or number of units sold for each product by
month and illustrate how to compare values to a target or goal.

Microsoft | Power BI Desktop

1. Open your Lab 4-1 Slainte Sales Dashboard.pbix file from Lab 4-1 Part 1 if
you closed it.
2. Create a visualization showing the total sales volume by month in a table:
a. Click anywhere on the page outside of your first visualization; then in
the Visualizations pane, click Matrix and drag the object to the top-right
corner of the page. Resize it by dragging the bottom to match the chart
next to it, if needed.
b. Drag Finished_Goods_Product.Product_Description to the Rows box.
c. Drag Sales_Order.Sales_Order_Date to the Columns box.
1. Click the X next to Quarter and Day to remove them from the date
hierarchy.
d. Drag Sales_Order_Lines.Sales_Order_Quantity_Sold to the Values box.
e. In the toolbar below your matrix, click Expand all down one level in the
hierarchy button (the button looks like an upside-down fork) to reveal
the sales by month.
f. Click the Format Visual (paintbrush) button and click Cell elements.
g. Turn on Data bars. The values in the table will now show proportional
bars representing the data values.
h. In the toolbar below your matrix, click More Options (. . .) and click
Sort by > Sales_Order_Quantity_Sold.
i. Click the Format Visual (paintbrush) button, and click General > Title
> On.
j. Name the chart Sales Volume by Month.
k. Take a screenshot (label it 4-1MC).
3. Create a visualization showing the total sales volume by target:
a. Click anywhere on the page outside of your first visualization; then in
the Visualization pane, click Gauge. Drag it to the bottom-right corner
of your page and resize it to fill the corner.
b. Drag Sales_Order_Lines.Sales_Order_Quantity_Sold to the Value box.
c. To add a target measure, click Home > New Measure.
1. In the formula box, type: Sales_Volume_Target = 10000 and press
Enter. The new Sales_Volume_Target measure will appear in the list
of attributes in the Customer table, though the exact location does
not matter.


d. Drag your new Sales_Volume_Target measure to the Target Value box. A line appears on your gauge.
e. Click the Format visual (paintbrush) button, and click General > Title > On.
f. Name the chart Sales Volume Target.
g. Take a screenshot (label it 4-1MD).
4. Optional: Create a Q&A box to interrogate your data with natural language
questions:
a. In the Visualization pane, click Q&A.
b. Type in a question like: What is the sales order quantity sold by business
name?
c. Explore other questions.
5. After you answer the lab questions, you may close Power BI. Save your work-
book as 4-1 Slainte Sales Dashboard.pbix.

Tableau | Desktop

1. Open your Lab 4-1 Slainte Sales Dashboard.twb file from Lab 4-1 Part 1 if
you closed it.
2. Create a visualization showing the total sales volume by month:
a. Click Worksheet > New Worksheet.
b. Drag Finished_Goods_Product.Product Description to the rows shelf.
c. Drag Sales_Order_Lines.Sales Order Date to the columns shelf. Click
the + next to the YEAR(Sales Order Date) pill to expand the date to
show quarter and month. Then remove the Quarter pill.
d. Drag Sales_Order_Lines.Sales Order Quantity Sold to the Columns
shelf.
e. Drag Sales_Order_Lines.Sales Order Quantity Sold to the Label button
in the Marks pane.
f. To show grand totals, click the Analytics tab in the panel on the left.
g. Drag Totals to your table and choose Column Grand Totals.
h. In the toolbar, click the Sort Descending button to show the products by
sales volume with the largest volume at the top of the chart.
i. Right-click the Sheet 3 tab and rename it “Sales Volume by Month”.
j. Take a screenshot (label it 4-1TC).
3. Create a visualization showing the total sales volume by target:
a. Click Worksheet > New Worksheet.
b. Drag Sales_Order_Lines.Sales Order Quantity Sold to the Columns
shelf.
c. To add a target measure, click the down arrow above the Tables pane on
the left in the Data tab and choose Create Parameter. . .
d. Name the parameter Sales Volume Target, set the Current value to
10000, and click OK.



e. Click the Analytics tab in the pane on the left side of the screen.
f. Drag Reference Line onto your bar and choose Table.
g. In the Edit Reference Line window, set the value to Sales Volume Target
(Parameter) and click OK. The target reference line appears on your bar.
h. Right-click the Sheet 4 tab and rename it Sales Volume by Target.
4. Finally, combine all of your visuals into a single dashboard:
a. Click Dashboard > New Dashboard.
b. In the Size option, change Fixed Size to Automatic.
c. Drag each of your four sheets into each corner of the dashboard.
d. Click the Sales Volume By Month table on your dashboard and from the
mini toolbar that appears on the right, click Use as Filter (funnel).
e. Take a screenshot (label it 4-1TD).
5. After you answer the lab questions, you may close Tableau Desktop. Save
your workbook as 4-1 Slainte Sales Dashboard.twb.

Lab 4-1 Part 2 Objective Questions (LO 4-2)


OQ1. What is the total sales volume for January 2020?
OQ2. By how much did Sláinte miss or exceed its overall sales goal of 10,000 units?
Hint: Hover over Sales Volume By Target to see specific values.
OQ3. Which products sold over 300 units in February 2020?
OQ4. Click the year 2020 in the Sales Volume By Month table. By how much did
Sláinte miss or exceed its sales goal of 10,000 units for the year so far?

Lab 4-1 Part 2 Analysis Questions (LO 4-2, 4-3)


AQ1. What do you notice about sales revenue and sales volume based on your dashboard?
AQ2. What are some additional ways you could slice sales volume that would be
useful to management as they evaluate performance?
AQ3. Which additional visualizations might be useful for showing declarative data in
this instance?

Lab 4-1 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 4-2 Perform Exploratory Analysis and Create Dashboards—Sláinte
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As you are analyzing the Sláinte brewery data, your supervisor has asked
you to compile a series of visualizations that will help them explore the sales that have
taken place in the past several months of the year, to help predict sales and understand
relationships with customers.

When working with a data analysis project that is exploratory in nature, the data can be
set up with simple visuals that allow you to drill down and see more granular data.
Data: Lab 4-2 Slainte Dataset.zip - 106KB Zip / 114KB Excel

Lab 4-2 Example Output


By the end of this lab, you will create an exploratory dashboard. While your results will include
different data values, your work should look similar to this:

Microsoft | Power BI Desktop

[Lab 4-2M: Example Exploratory Dashboard in Microsoft Power BI Desktop. Source: Microsoft Power BI Desktop]

Tableau | Desktop

[Lab 4-2T: Example Exploratory Dashboard in Tableau Desktop. Source: Tableau Software, Inc. All rights reserved.]


Lab 4-2 Total Sales by Customer


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 4-2 [Your name] [Your email address].docx.
The first set of analyses will focus on sales by customer. By tracking total sales
revenue, you can identify the customers that form the most important relationships.
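As a point of reference, the totals these visuals display can also be computed directly in SQL. The following sketch is illustrative only: it assumes the four worksheets have been loaded into a relational database, and it assumes Sales_Order_ID and Customer_ID are the linking keys (verify the actual key names in your copy of the dataset).

-- Line subtotal = price x quantity, rolled up to total revenue per customer.
-- Key field names (Sales_Order_ID, Customer_ID) are assumed; confirm them first.
SELECT c.Business_Name, c.Customer_State,
       SUM(l.Product_Sale_Price * l.Sales_Order_Quantity_Sold) AS Total_Revenue
FROM Sales_Order_Lines AS l
INNER JOIN Sales_Order AS s ON s.Sales_Order_ID = l.Sales_Order_ID
INNER JOIN Customer_Master_Listing AS c ON c.Customer_ID = s.Customer_ID
GROUP BY c.Business_Name, c.Customer_State
ORDER BY Total_Revenue DESC;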

Microsoft | Power BI Desktop

1. Create a new workbook in Power BI Desktop.


2. Click Home > Get Data > Excel.
3. Locate the Lab 4-2 Slainte Dataset.xlsx file on your computer and click
Open.
4. In the Navigator window, check the following tables and click Load:
a. Customer_Master_Listing, Finished_Goods_Products, Sales_Order,
Sales_Order_Lines
5. Each sales order line item shows the individual quantity ordered and the
product sale price per item. Before you analyze the sales data, you will need
to calculate each line subtotal:
a. Click Data in the toolbar on the left to show your data tables.
b. Click the Sales_Order_Lines table in the list on the right.
c. In the Home tab, click New Column.
d. For the column formula, enter: Line_Subtotal = Sales_Order_Lines
[Product_Sale_Price] * Sales_Order_Lines[Sales_Order_­Quantity_Sold].
e. Click Report in the toolbar on the left to return to your report.
6. Create a visualization showing the total sales revenue by customer:
a. In the Visualizations pane, click Treemap.
b. Drag Sales_Order_Lines.Line_Subtotal to the Values box.
c. Drag Customer_Master_Listing.Customer_State to the Category box.
d. Drag Customer_Master_Listing.Business_Name to the Details box.
e. Resize the chart so it fills the left half of the page.
f. Click Format your visual (paintbrush) button and click General > Title.
g. Name the chart Sales Revenue by Customer.
h. Take a screenshot (label it 4-2MA).
7. Create a visualization showing the total sales revenue by month:
a. Click anywhere on the page outside of your first visualization; then in
the Build visual pane, click Line Chart.
b. Drag Sales_Order.Sales_Order_Date to the X-axis box. Click the X next
to Quarter and Day to remove them from the date hierarchy.
c. Drag Sales_Order_Lines.Line_Subtotal to the Y-axis box.
d. In the toolbar on your chart, click Expand all down one level in the
hierarchy button (the button looks like an upside-down fork) to reveal
the sales by month.


e. Click the Analytics (magnifying glass) button, and click Forecast > On.
1. Click Options, set the Forecast length to 3, and click Apply.
f. Click the Format visual (paintbrush) button, and click General > Title.
g. Name the chart Sales Revenue Forecast.
h. Drag Finished_Goods_Products.Product_Description to the Filters on
this page box.
i. Uncheck Select all and check Imperial Stout.
j. Take a screenshot (label it 4-2MB).
8. After you answer the lab questions, you may close Power BI Desktop. Save
your worksheet as Lab 4-2 Slainte Sales Explore.pbix.

Tableau | Desktop

1. Create a new workbook in Tableau Desktop.


2. Click Connect > To a File > Microsoft Excel.
3. Locate the Lab 4-2 Slainte Dataset.xlsx file on your computer and click Open.
4. In the Data Source window, double-click the following tables to add them
to your model. If prompted, verify the correct primary key/foreign key pairs
and close the Edit Relationship window:
a. Sales_Order, Customer_Master_Listing, Sales_Order_Lines, Finished_
Goods_Products
5. Each sales order line item shows the individual quantity ordered and the
product sale price per item. Before you analyze the sales data, you will need
to calculate each line subtotal:
a. Click Analysis > Create Calculated Field. . .
b. For the formula name (top box), enter: Line Subtotal
c. For the formula (bottom box), enter: [Product_Sale_Price] *
[Sales_Order_Quantity_Sold].
d. Click OK to return to your sheet.
6. Create a visualization showing the total sales revenue by customer:
a. Click Sheet 1.
b. Drag Sales_Order_Lines.Line Subtotal to the Size button on the Marks
pane.
c. Drag Customer_Master_Listing.Customer State to the Color button on
the Marks pane.
d. Drag Customer_Master_Listing.Business Name to the Label button on
the Marks pane.
e. Right-click the Sheet 1 tab and rename it Sales Revenue by Customer.
f. Take a screenshot (label it 4-2TA).
7. Create a visualization showing the sales revenue forecast by product:
a. Click Worksheet > New Worksheet.
b. Drag Sales_Order.Sales Order Date to the Columns shelf. Click the
down arrow next to YEAR(Sales Order Date) and choose Month with
month + year.
c. Drag Sales_Order_Lines.Line Subtotal to the Rows shelf.
d. In the fields list, right-click Finished_Goods_Products.Product
Description and choose Show Filter.
e. Click the Analytics tab in the panel on the left and drag Forecast to the
line chart.
1. In the Marks pane, remove the Forecast color.
f. To show the forecast for individual products, drag Finished Goods
Products.Product Description to the Color button in the Marks
pane.
g. In the Product Description filter on the right, uncheck (All) and check
Imperial Stout.
h. Right-click the Sheet 2 tab and rename it Sales Revenue Forecast.
i. Take a screenshot (label it 4-2TB).
8. After you answer the lab questions, you may close Tableau Desktop. Save
your worksheet as Lab 4-2 Slainte Sales Explore.twb.

Lab 4-2 Objective Questions (LO 4-2)


OQ1. Who is the largest customer in Iowa (IA)?
OQ2. How many customers reside in Texas?
OQ3. Use the filters to adjust the sales revenue forecast for each product description.
Which product has the most uncertain forecast, that is, the widest range of
­possible values?
OQ4. What is the trend in the forecast for sales overall relative to March?
OQ5. What type of analytics (descriptive, diagnostic, predictive, prescriptive) is repre-
sented in the Sales Revenue by Customer visualization?
OQ6. What type of analytics (descriptive, diagnostic, predictive, prescriptive) is repre-
sented in the Sales Revenue Forecast visualization?

Lab 4-2 Analysis Questions (LO 4-2, 4-3)


AQ1. If you’re trying to find the top 10 customers, why would dividing the tree map
by state make these harder to find?
AQ2. What other visualizations might you use to explore sales revenue?
AQ3. What other ways might you slice sales revenue to help management explore or
predict performance?

Lab 4-2 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 4-3 Create Dashboards—LendingClub
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: To better understand loans and the attributes of people borrowing money,
your manager has asked you to provide some examples of different characteristics related to
the loan amount. For example, are more loan dollars given to home owners or renters?
Data: Lab 4-3 Lending Club Transform.zip - 29MB Zip / 26MB Excel / 6MB Tableau

Lab 4-3 Example Output


By the end of this lab, you will create a dashboard exploring different loan statistics and
distributions. While your results will include different data values, your work should look
similar to this:

Microsoft | Power BI Desktop



LAB 4-3M Example Loan Stats Dashboard in Microsoft Power BI Desktop

Tableau | Desktop



LAB 4-3T Example Loan Stats Dashboard in Tableau Desktop

Lab 4-3 Part 1 Summarize Loan Data
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 4-3 [Your name] [Your email address].docx.
This dashboard will focus on exploring the various attributes that underlie the borrow-
ers. It will combine a series of facts and summarized statistics with several visualizations
that break down the loan information by different dimensions.
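For reference, the same four summary figures can be computed in one query. The sketch below is illustrative rather than part of the lab steps; it assumes the LoanStats3c sheet has been loaded into SQL Server, whose PERCENTILE_CONT function computes medians.

-- Total and median statistics for the loan population (T-SQL sketch).
-- PERCENTILE_CONT is a window function, so DISTINCT collapses the
-- repeated per-row values into a single summary row.
SELECT DISTINCT
    SUM(loan_amnt) OVER () AS total_loan_amount,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY loan_amnt) OVER () AS median_loan_amount,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY int_rate) OVER () AS median_interest_rate,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY dti) OVER () AS median_dti
FROM LoanStats3c;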

Microsoft | Power BI Desktop

1. Create a new workbook in Power BI Desktop and load your data:


a. Click Home > Connect to Data > Excel.
b. Locate the Lab 4-3 Lending Club Transform.xlsx file on your computer
and click Open.
2. In the Navigator window, check the following table and click Load:
a. LoanStats3c.
3. Create a summary card showing the total loan amount for the dataset:
a. In the Visualizations pane, click Card.
b. Drag loan_amnt to the Fields box.
4. Create a summary card showing the median loan amount for the dataset:
a. Click anywhere on the page outside of your current visualization; then in
the Visualization pane, click Card.
b. Drag loan_amnt to the Fields box.
c. Click the drop-down arrow next to loan_amnt in the field box and choose
Median from the list.
d. Drag the median loan card to the right side of the total loan amount card.
5. Create a summary card showing the median interest rate for the dataset:
a. Click anywhere on the page outside of your current visualization; then in
the Visualization pane, click Card.
b. Drag int_rate to the Fields box.
c. Click the drop-down arrow next to int_rate in the field box and choose
Median from the list.
d. Drag the median interest rate card to the right side of the median loan
amount card.
6. Create a summary card showing the median debt-to-income level for the dataset:
a. Click anywhere on the page outside of your current visualization; then in
the Visualization pane, click Card.
b. Drag dti to the Fields box.
c. Click the drop-down arrow next to dti in the field box and choose
Median from the list.
d. Drag the median DTI card to the right side of the median interest
rate card.

e. Take a screenshot (label it 4-3MA) of the four cards along the top of
the page.
7. After you answer the lab questions, save your worksheet as Lab 4-3 Lending
Club Dashboard.pbix, and continue to Lab 4-3 Part 2.

Tableau | Desktop

1. Create a new workbook in Tableau Desktop and load your data:


a. Click Connect > To a File > More. . . .
b. Locate the Lab 4-3 Lending Club Transform.hyper file on your computer
and click Open.
2. Create a data sheet showing the total loan amount for the dataset:
a. Click Sheet 1 and rename it Total Loan Amount.
b. Drag Loan Amnt to the Text button on the Marks pane.
3. Create a data sheet showing the median loan amount:
a. Click Worksheet > New Worksheet.
b. Right-click the Sheet 2 tab and rename it Median Loan Amount.
c. Drag Loan Amnt to the Text button on the Marks pane.
d. Click the SUM(Loan Amnt) pill in the Marks pane and choose Measure >
Median.
4. Create a data sheet showing the median interest rate:
a. Click Worksheet > New Worksheet.
b. Right-click the Sheet 3 tab and rename it Median Interest Rate.
c. Drag Int Rate to the Text button on the Marks pane.
d. Click the SUM(Int Rate) pill in the Marks pane and choose Measure >
Median.
5. Create a data sheet showing the median debt-to-income ratio:
a. Click Worksheet > New Worksheet.
b. Right-click the Sheet 4 tab and rename it Median DTI.
c. Drag Dti to the Text button on the Marks pane.
d. Click the SUM(Dti) pill in the Marks pane and choose Measure >
Median.
6. Create a dashboard containing your data sheets:
a. Click Dashboard > New Dashboard.
b. Right-click the Dashboard 1 tab and rename it Loan Dashboard.
c. In the Size section in the pane on the left, click Desktop Browser and
change Fixed Size to Automatic.
d. Drag a Horizontal object from the Objects pane on the left to your
dashboard.

e. Drag each of the four sheets to the dashboard so they appear in a line
along the top from left to right. Hint: Position them each on the far right
side of the previous one:
1. Total Loan Amount, Median Loan Amount, Median Interest Rate,
Median DTI
2. Resize the sheets so they fill the top row evenly. Hint: Click each
sheet and change the Standard drop-down in the toolbar to Fit
Width. You can also click the drop-down arrow on the dark toolbar
that appears on the edge of each sheet to do the same thing.
f. Take a screenshot (label it 4-3TA) of the four cards along the top of
the page.
7. After you answer the lab questions, save your worksheet as Lab 4-3 Lending
Club Dashboard.twb, and continue to Lab 4-3 Part 2.

Lab 4-3 Part 1 Objective Questions (LO 4-2)


OQ1. What is the total loan amount?
OQ2. What is the median loan amount?
OQ3. What is the median interest rate?
OQ4. What is the median debt-to-income ratio?
OQ5. What type of analytics (descriptive, diagnostic, predictive, or prescriptive) do
these data cards represent?

Lab 4-3 Part 1 Analysis Questions (LO 4-2, 4-3)


AQ1. Why do we show the median values in this dashboard instead of average values?
AQ2. What other summary values would you find useful in summarizing this loan
data on a dashboard?

Lab 4-3 Part 2 Dig Deeper into Loan Data


This second analysis will focus on the breakdown of data by different dimensions and will
be added to your dashboard created in Part 1.
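Each visual in this part is a grouped aggregation over a single dimension. As a reference point, the 100% stacked bar by term corresponds to a query like this sketch (illustrative only, assuming LoanStats3c is available as a database table):

-- Share of total loan dollars by term; the window SUM computes the
-- grand total so each term's percentage can be derived in one pass.
SELECT term,
       SUM(loan_amnt) AS loan_amount,
       100.0 * SUM(loan_amnt) / SUM(SUM(loan_amnt)) OVER () AS pct_of_total
FROM LoanStats3c
GROUP BY term;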

Microsoft | Power BI Desktop

1. Open your Lab 4-3 Lending Club Dashboard.pbix file from Lab 4-3 Part 1 if
you closed it.
2. Create a visualization showing the loan amount by term:
a. Click anywhere on the page outside of your current visualization; then in
the Visualization pane, click 100% Stacked Bar Chart. Drag the object to
the bottom-right corner of the page. Resize it by dragging the bottom to
match the chart next to it, if needed.
b. Drag loan_amnt to the X-axis box.
c. Drag term to the Legend box.
d. Click the Format visual (paintbrush) button, and click General >
Title > Text.
e. Name the chart Loan Amount by Term.
3. Create a visualization showing the loan amount by ownership:
a. Click anywhere on the page outside of your current visualization; then in
the Visualization pane, click Clustered Bar Chart. Drag the object to the
right of the loan amount by term. Resize it by dragging the bottom to
match the chart next to it, if needed.
b. Drag loan_amnt to the X-axis box.
c. Drag home_ownership to the Y-axis box.
d. Click the Format visual (paintbrush) button, and click General >
Title > Text.
e. Name the chart Loan Amount by Ownership.
4. Create a visualization showing the loan amount by month:
a. Click anywhere on the page outside of your current visualization; then in
the Visualization pane, click Stacked Area Chart. Drag the object to the
right of the loan amount by ownership. Resize it by dragging the bottom
to match the chart next to it, if needed.
b. Drag loan_amnt to the Y-axis box.
c. Drag issue_d to the X-axis box.
d. On the visualization, click Expand all down one level in the hierarchy (forked
arrow) icon in the toolbar twice until you see the monthly timeline.
e. Click the Format visual (paintbrush) button, and click General > Title >
Text.
f. Name the chart Loan Amount by Month.
5. Create a visualization showing the loan amount by debt-to-income ratio:
a. Click anywhere on the page outside of your current visualization; then in
the Visualization pane, click Scatter chart. Drag the object to the right
of the loan amount by ownership. Resize it by dragging the bottom to
match the chart next to it, if needed.
b. Drag loan_amnt to the Y-axis box.
c. Drag dti to the X-axis box. Then click the down arrow in the dti element
and choose Don’t summarize.
d. Click the Format visual (paintbrush) button, and click General > Title >
Text.
e. Name the chart Loan Amount by DTI.
f. Take a screenshot (label it 4-3MB).
6. After you answer the lab questions, you may close Power BI. Save your work-
book as 4-3 Lending Club Dashboard.pbix.

Tableau | Desktop
1. Open your Lab 4-3 Lending Club Dashboard.twb file from Lab 4-3 Part 1 if
you closed it.
2. Create a visualization showing the loan amount by term:
a. Click Worksheet > New Worksheet.
b. Drag Loan Amnt to the Columns shelf.
c. Click the down arrow next to the SUM(Loan Amnt) pill in the Columns
shelf and choose Quick Table Calculation > Percent of Total.
d. Drag Term to the Color button in the Marks pane.
e. Click the down arrow next to the SUM(Term) pill in the Marks pane
and choose Dimension from the list.
f. Right-click the Sheet 5 tab and rename it Loan Amount by Term.
3. Create a visualization showing the loan amount by ownership:
a. Click Worksheet > New Worksheet.
b. Drag Loan Amnt to the Columns shelf.
c. Drag Home Ownership to the Rows shelf.
d. Click the down arrow next to the Home Ownership pill in the Rows shelf
and click Sort > Descending.
e. Drag Term to the Color button in the Marks pane.
f. Click the down arrow next to the SUM(Term) pill in the Marks pane
and choose Dimension from the list.
g. Right-click the Sheet 6 tab and rename it Loan Amount by Ownership.
4. Create a visualization showing the loan amount by month:
a. Click Worksheet > New Worksheet.
b. Drag Loan Amnt to the Rows shelf.
c. Drag Issue D to the Columns shelf.
d. Click down arrow next to the YEAR(Issue D) pill in the Columns shelf
and choose Month (May 2015).
e. In the Marks pane, change Automatic to Area.
f. Right-click the Sheet 7 tab and rename it Loan Amount by Month.
5. Create a visualization showing the loan amount by debt-to-income ratio:
a. Click Worksheet > New Worksheet.
b. Drag Loan Amnt to the Rows shelf.
c. Drag Dti to the Columns shelf.
d. Click the down arrow next to the SUM(Dti) pill in the Columns shelf
and choose Dimension.
e. In the Marks pane, change Automatic to Circle.
f. Right-click the Sheet 8 tab and rename it Loan Amount by DTI.
6. Finally, add these visuals to your Loan Dashboard:
a. Click Loan Dashboard.
b. Drag a Horizontal object from the Objects pane to the bottom of your
dashboard.

c. Drag Loan Amount by Term to the bottom of the page and change it
from Standard to Fit Width in the toolbar.
d. Drag Loan Amount by Ownership to the right of the Loan Amount by
Term pane.
e. Drag Loan Amount by Month to the right of the Loan Amount by
­Ownership pane.
f. Drag Loan Amount by DTI to the right of the Loan Amount by Month pane.
g. Adjust your dashboard to fit everything.
h. Take a screenshot (label it 4-3TB).
7. After you answer the lab questions, you may close Tableau Desktop. Save
your workbook as 4-3 Lending Club Dashboard.twb.

Lab 4-3 Part 2 Objective Questions (LO 4-2)


OQ1. What percentage of loans have a 36-month term?
OQ2. Do most of the borrowers own homes, have a mortgage, or rent?
OQ3. Which period has the least volatile volume of loans, January to June or July
to December?
OQ4. Do borrowers with a high debt-to-income ratio tend to have larger loans or
smaller loans?

Lab 4-3 Part 2 Analysis Questions (LO 4-2, 4-3)


AQ1. What might explain the pattern of loans in the second half of the year?
AQ2. What are some additional ways you could slice loans that would be useful to
lenders as they evaluate collectability of loans?
AQ3. What other patterns or values stand out in this dashboard?

Lab 4-3 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 4-4 Comprehensive Case: Visualize Declarative Data—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: With a firm understanding of the different data models available to you,
now is your chance to show Dillard’s management how you can share the story with a visual
dashboard. You have been tasked with creating a dashboard to show declarative data so
management can see how different stores are currently performing.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.


Lab 4-4 Example Output


By the end of this lab, you will create a dashboard showing sales revenue. While your results
will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop



LAB 4-4M Example Declarative Data Dashboard in Microsoft Power BI Desktop

Tableau | Desktop



LAB 4-4T Example Declarative Data Dashboard in Tableau Desktop



Lab 4-4 Part 1 Compare In-Person Transactions across
States (Sum and Average)
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 4-4 [Your name] [Your email address].docx.
In Chapter 3 you identified that there is a significant difference between the average in-
store transaction amount and the average online transaction amount at Dillard’s. You will
take that analysis a step further in this lab to answer the following questions:
• How does the sum of transaction amounts compare across states and between In-Person
and Online sales?
• How does the average of transaction amounts compare across states and between
In-Person and Online sales?
You will visualize these comparisons using bar charts and maps.
The state attribute for In-Person and Online sales will come from different tables. For
In-Person sales, we will focus on the state that the store is located in (regardless of where the
customer is from). For our Online sales, we will focus on the state that the customer is order-
ing from—all online sales are processed in one location, Store 698 in Maumelle, Arkansas.
Recall that the Dillard’s database is very large, so we will work with a subset of the data
from just 5 days in September 2016. This will give you insight into how In-Person and Online
sales differ across states without slowing down the process considerably by attempting to
import all 107,572,906 records.
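The extraction query in the steps below pulls raw transaction rows and lets the tool do the grouping. Conceptually, the first pair of visuals reduces to an aggregation like this sketch (shown for reference only):

-- Sum and average transaction amount by store state, in-person only
-- (store 698 processes all online orders, so it is excluded).
SELECT STORE.STATE AS STORE_STATE,
       SUM(TRANSACT.TRAN_AMT) AS TOTAL_REVENUE,
       AVG(TRANSACT.TRAN_AMT) AS AVG_TRANSACTION
FROM TRANSACT INNER JOIN STORE ON TRANSACT.STORE = STORE.STORE
WHERE TRANSACT.TRAN_DATE BETWEEN '20160901' AND '20160905'
  AND TRANSACT.STORE <> 698
GROUP BY STORE.STATE
ORDER BY TOTAL_REVENUE DESC;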

Microsoft | Power BI Desktop

1. Create a new project in Power BI Desktop.


2. In the Home ribbon, click Get Data > SQL Server database.
3. Enter the following and click OK (keep in mind that SQL Server is not just
one database but a collection of databases, so it is critical to indicate the
server path and the specific database):
a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
c. Data Connectivity mode: Import
d. Expand Advanced Options and input the following query, then click OK:
SELECT TRANSACT.TRAN_AMT, STORE.STATE AS STORE_STATE,
TRANSACT.STORE, CUSTOMER.STATE AS CUSTOMER_STATE
FROM (TRANSACT INNER JOIN STORE ON TRANSACT.STORE = STORE.STORE)
INNER JOIN CUSTOMER ON CUSTOMER.CUST_ID = TRANSACT.CUST_ID
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905'
Note: This query will pull in four attributes from three different tables in
the Dillard’s database for each transaction: the Transaction Amount, the
Store Number, the State the Store is located in, and the State the cus-
tomer is from. Keep in mind that “State” for customers may be outside
of the contiguous United States!
e. If prompted to enter credentials, you can keep the default to “Use my
current credentials” and click Connect.


4. If prompted with an Encryption Support warning, click OK to move past it.


5. Click Load.
6. Create a bar chart showing total sales revenue by state for in-person
transactions:
a. In the Visualizations pane, click Stacked Bar Chart.
b. Drag STORE_STATE to the Y-axis box.
c. Drag TRAN_AMT to the X-axis box.
d. In the Visualizations pane, click the Format visual (paintbrush) button
and set the following:
1. Visual > Y-axis > Title > Store State
2. Visual > X-axis > Title > Sales Revenue
3. General > Title > Text > Total Revenue by State
7. Create a Filled Map showing total sales revenue by state for in-person
transactions:
a. Copy the bar chart. (Select Copy and then Paste on the ribbon). Drag
the copy to the right of the bar chart.
b. Edit your new visual by clicking the Filled Map in the Visualization pane
to change the second bar chart into a symbol map.
c. Click the Format visual (paintbrush) button, and click Visual > Fill
Colors > Colors.
d. Click the fx Conditional Formatting.
e. Under Based on Field, choose TRAN_AMT and click OK.
f. In the Visualizations pane, click the Format visual (paintbrush) button
and set the following:
1. General > Title > Text > Total Revenue by State Map
8. So far, these visuals show the sum of transaction amounts across each state
for every type of transaction—online and in-person. To make these visuals
more meaningful, you will create a filter to show only the in-person transac-
tions by excluding the store location that processes online orders.
a. Right-click the STORE attribute and select Add to filters > Page-level
filters.
b. From the Filters pane:
1. Show items when the value: is not
2. 698
c. Click Apply Filter. Note: The only value that should be impacted is the
sum of transactions in Arkansas.
9. Adjust the view by selecting View > Page View > Actual Size.
10. Take a screenshot (label it 4-4MA).
11. To create similar visualizations that show the Average Transaction amounts
across states instead of the Sum, duplicate the two existing visuals by copy-
ing and pasting them.
a. Drag the duplicate visuals beneath the existing visuals.
b. Click each new visualization individually and select the drop-down arrow
on the TRAN_AMT item in the Value box, then select Average.


c. For the new map visual, adjust the Fill Colors to be based on an average
of TRAN_AMT instead of a Sum:
1. Click the map visual and click the Format visual (paintbrush) button.
2. Go to Visual > Fill Colors > Color.
3. Click the Fx button.
4. Under Summarization, select Average and click OK.
5. General > Title > Text > Average Revenue by State
12. Take a screenshot (label it 4-4MB).
13. After you answer the lab questions, save your work as Lab 4-4 Dillard’s Sales
.pbix and continue to Lab 4-4 Part 2.

Tableau | Desktop

1. Open Tableau Desktop.


2. Go to Connect > To a Server > Microsoft SQL Server.
3. Enter the following and click Sign In:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
4. Instead of adding full tables, click New Custom SQL and input the following
query, then click OK.
SELECT
TRANSACT.TRAN_AMT, STORE.STATE AS [Store State], TRANSACT.STORE,
CUSTOMER.STATE AS [Customer State]
FROM (TRANSACT
INNER JOIN STORE
ON TRANSACT.STORE = STORE.STORE)
INNER JOIN CUSTOMER
ON CUSTOMER.CUST_ID = TRANSACT.CUST_ID
WHERE TRAN_DATE BETWEEN ‘20160901’ AND ‘20160905’
Note: This query will pull in four attributes from three different tables in
the Dillard’s database for each transaction: the Transaction Amount, the
Store Number, the State the Store is located in, and the State the customer
is from. Keep in mind that “State” for customers may be outside of the
contiguous United States!
5. Before we create visualizations, we need to master the data by changing the
default data types. Select the # or Abc icon above the attributes to make the
following changes:
a. Store State: Geographic Role > State/Province
b. STORE: String
c. Customer State: Geographic Role > State/Province
6. Click Sheet 1.
7. Even though you edited the data types, Tableau is still interpreting STORE as a
measure. Drag the STORE (Count) measure to the Dimensions area above it.



8. Create a bar chart showing how the sum of transactions differs across states.
a. Columns: TRAN_AMT
b. Rows: Store State
c. In the toolbar, click the Sort Descending button.
d. Right-click the Sheet 1 tab and rename it Total Revenue by State.
9. Currently this shows data for all transactions including the Online transac-
tions processed through store 698. Your next step is to filter out that location.
a. Right-click STORE and select Show filter.
b. Scroll down in the new filter (you may need to hide the Show Me menu
to see the full filter) and remove the check mark next to 698.
10. Create a filled map in a new sheet showing how the sum of transactions dif-
fers across states.
a. Right-click the Total Revenue by State tab and select Duplicate.
b. Expand the Show Me menu and select Filled Map.
1. Right-click the Total Revenue by State (2) tab and rename it Total
Revenue by State Map.
11. Duplicate each sheet so that you can create similar charts using Average as
the aggregate instead of Sum.
a. Duplicate the Total Revenue by State sheet and rename the new sheet
Bar Chart: Average of In-Person Sales.
b. Right-click the SUM(TRAN_AMT) pill on the Columns shelf and change
the Measure to Average.
c. Duplicate the Total Revenue by State Map and rename the new sheet
Filled Map: Average of In-Person Sales.
d. Right-click the pill SUM(TRAN_AMT) from the Marks shelf and change
the Measure to Average.
12. It can be easier to view each visual that you created at once if you add them
to a dashboard.
a. Click Dashboard > New Dashboard.
b. In the Size option, change Fixed Size to Automatic.
c. Drag each of your four sheets into each corner of the dashboard.
13. Take a screenshot (label it 4-4TA).
14. After you answer the lab questions, save your work as Lab 4-4 Dillard’s Sales.twb
and continue to Lab 4-4 Part 2.

Lab 4-4 Part 1 Objective Questions (LO 4-2)


OQ1. Which state has the lowest sum of transaction amount?
OQ2. Which state has the highest sum of transaction amount?
OQ3. What is the average transaction amount for Montana (MT)?
OQ4. What is the average transaction amount for in-person transactions in
Arkansas (AR)?
OQ5. What type of analytics (descriptive, diagnostic, predictive, prescriptive) is
represented in these visualizations?


Lab 4-4 Part 1 Analysis Questions (LO 4-2, 4-3, 4-5)


AQ1. When working with geographic data, it is possible to view the output by a map
or by a traditional visualization, such as the bar charts that you created in this
lab. What type of insights can you draw more quickly with the map visualization
than the bar chart? What type of insights can you draw more quickly from the
bar chart?
AQ2. Texas has the highest sum of transaction amounts, but only the fourth-highest
average. What insight can you draw from that difference?

Lab 4-4 Part 2 Compare Online Transactions across States (Sum and Average)
In Part 2 of this lab you will create four similar visualizations to what you created in Part 1,
but with two key differences:
1. You will reverse the filter to show only transactions processed in store 698. This will
allow you to see only Online transactions.
2. Instead of slicing the data by STORE.STATE, you will use CUSTOMER.STATE. This
is because all online sales are processed in Arkansas, so the differences would not be
meaningful for online sales. Additionally, viewing where the majority of online sales
originate can be helpful for decision making.
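In query terms, the only changes from Part 1 are the reversed store filter and the grouping column. For reference, a sketch of the equivalent aggregation:

-- Online sales only (store 698), grouped by the customer's home state.
SELECT CUSTOMER.STATE AS CUSTOMER_STATE,
       SUM(TRANSACT.TRAN_AMT) AS TOTAL_REVENUE,
       AVG(TRANSACT.TRAN_AMT) AS AVG_TRANSACTION
FROM TRANSACT INNER JOIN CUSTOMER ON CUSTOMER.CUST_ID = TRANSACT.CUST_ID
WHERE TRANSACT.TRAN_DATE BETWEEN '20160901' AND '20160905'
  AND TRANSACT.STORE = 698
GROUP BY CUSTOMER.STATE;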

Microsoft | Power BI Desktop

1. Open your Lab 4-4 Dillard's Sales.pbix file from Lab 4-4 Part 1 if you
closed it.
2. Create a second page to place your Customer State visuals on.
3. Create the following four visuals (review the detailed steps in Part 1 if
necessary):
a. Bar Chart: Sum of Online Transactions
1. Y-axis: Customer State
2. X-axis: TRAN_AMT
3. Filter: STORE is 698
b. Filled Map: Sum of Online Transactions
c. Bar Chart: Average of Online Transactions
d. Filled Map: Average of Online Transactions
4. Take a screenshot (label it 4-4MC).
5. Put it all together! To answer the questions and derive insights, you may
want to create a new dashboard with all of the visualizations on the same
page.
6. After you answer the lab questions, you may close Power BI. Save your work-
book as Lab 4-4 Dillard’s Sales.pbix.



Tableau | Desktop

1. Open your Lab 4-4 Dillard’s Sales.twb file from Lab 4-4 Part 1 if you closed it.
2. Create the following four visuals, including the proper naming of each sheet
(review the detailed steps in Part 1 if necessary):
a. Bar Chart: Sum of Online Transactions
1. Rows: TRAN_AMT
2. Columns: Customer State
3. Filter: De-select (All) and select only 698
b. Filled Map: Sum of Online Transactions
c. Bar Chart: Average of Online Transactions
d. Filled Map: Average of Online Transactions
3. Add each visualization to a dashboard.
4. Take a screenshot of your dashboard (label it 4-4TB).
5. After you answer the lab questions, you may close Tableau. Save your work-
book as 4-4 Dillard’s Sales.twb.

Lab 4-4 Part 2 Objective Questions (LO 4-2)


OQ1. What is the average online transaction amount for customers from Maine?
OQ2. What is the sum of online transactions in Alaska (AK)?
OQ3. Is the average online transaction amount for customers from Texas greater than
or less than the average transaction amount for stores located in Texas?

Lab 4-4 Part 2 Analysis Questions (LO 4-4, 4-5)


AQ1. What insights should Dillard’s draw from the online transactions that occur in
states that already have Dillard’s stores in them? What additional exploratory
analysis would you perform on these transactions?
AQ2. You would like to compare the average online transaction amount for customers
who reside in states that do not have Dillard’s stores with the online transaction
amount for customers who reside in states that do have Dillard’s stores. How
would you prepare your data to make that comparison?

Lab 4-4 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 4-5 Comprehensive Case: Visualize Exploratory Data—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.

Case Summary: In Lab 4-4 you discovered that Texas has the highest sum of transactions
for both in-person and online sales in a subset of the data, and you would like to explore
these sales more to determine if the performance is the same across all of the stores in
Texas and across the different departments.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.

Lab 4-5 Example Output


By the end of this lab, you will create a dashboard to visualize exploratory data. While your
results will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop



LAB 4-5M Example Exploratory Dashboard in Microsoft Power BI Desktop

Tableau | Desktop



LAB 4-5T Example Exploratory Dashboard in Tableau Desktop

Lab 4-5 Part 1 Identify the Texas Store with the Highest
Revenue
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 4-5 [Your name] [Your email address].docx.


In this first part of the lab, we will look at revenue across Texas cities and then drill down
to the specific store location that has the highest revenue (sum of transaction amount).
While in the previous lab we needed to limit our sample to just 5 days because of how large
the TRANSACT table is, in this lab we will analyze all of the transactions that took place
in stores in Texas.
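For reference, the drill-down you will build corresponds to the following SQL sketch, which totals every Texas transaction by city and store (note that, unlike the previous lab, there is no date filter):

-- Total revenue for each Texas store, highest first.
SELECT STORE.CITY, STORE.STORE,
       SUM(TRANSACT.TRAN_AMT) AS TOTAL_REVENUE
FROM TRANSACT INNER JOIN STORE ON TRANSACT.STORE = STORE.STORE
WHERE STORE.STATE = 'TX'
GROUP BY STORE.CITY, STORE.STORE
ORDER BY TOTAL_REVENUE DESC;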

Microsoft | Power BI Desktop

1. Create a new workbook in Power BI Desktop and connect to your data.


a. In the Home ribbon, click Get Data > SQL Server database.
b. Enter the following and click OK (keep in mind that SQL Server is not
just one database but a collection of databases, so it is critical to indicate
the server path and the specific database):
1. Server: essql1.walton.uark.edu
2. Database: WCOB_DILLARDS
3. Data Connectivity: DirectQuery
c. If prompted to enter credentials, you can keep the default to “Use my
current credentials” and click Connect. If prompted with an Encryption
Support warning, click OK to move past it.
d. In the Navigator window, check the following tables and click Load:
1. DEPARTMENT, SKU, STORE, TRANSACT
e. Drag STORE.STATE to the Filters on all pages box and place a check
mark next to TX.
2. Create a Stacked Bar Chart showing total sales revenue by store in Texas:
a. In order to drill down by store location, we need to create a hierarchy of
State, City, and Store:
1. In the Fields pane, right-click the STORE.STATE field or click More
Options and choose New Hierarchy.
2. Right-click the new STATE Hierarchy or click More Options and
choose Rename. Give your hierarchy the name LOCATION.
3. In the Fields pane, drag STORE.CITY to your new ­LOCATION
­hierarchy, then drag STORE.STORE there, too. Your
­LOCATION hierarchy should now have three fields: STATE,
CITY, and STORE.
b. In the Visualizations pane, click Stacked bar chart.
c. Drag STORE.LOCATION to the Y-axis box.
d. Drag TRANSACT.TRAN_AMT to the X-axis box. You will now see the
total sales amount for Texas.
e. In the toolbar in the top-right or bottom-right of your chart, click
Expand all down one level in the hierarchy button (the button looks like
an upside-down fork) to view the totals by city.
f. Click Expand all down one level in the hierarchy button one more time to
see the total sales by store.




g. Click the Format button in the Visualizations pane to add titles:


1. Visual > Y-axis > Title > Store
2. Visual > X-axis > Title > Sales Revenue
3. General > Title > Text > Sales Revenue by Store
h. Take a screenshot (label it 4-5MA).
3. After you answer the lab questions, you may close Power BI. Save your work
as Lab 4-5 Dillard’s Exploratory Analysis and continue to Lab 4-5 Part 2.

Tableau | Desktop

1. Open Tableau Desktop and connect to your data:


a. Go to Connect > To a Server > Microsoft SQL Server.
b. Enter the following and click Sign In:
1. Server: essql1.walton.uark.edu
2. Database: WCOB_DILLARDS
c. Add the TRANSACT, SKU,STORE, and DEPARTMENT tables to the
Data Source page. Ensure that the tables join correctly (you can check
the appropriate relationships in Appendix J).
2. Click Sheet 1.
3. Drag STORE.State to the Filters shelf and select TX. Click OK.
4. Right-click the new State filter and choose Show Filter.
5. Right-click the new State filter and choose Apply to Worksheets > All Using
This Data Source.
6. Create a bar chart showing how the sum of transactions differs across cities.
a. Drag TRANSACT.Tran_Amt to the Columns shelf.
b. Drag STORE.City to the Rows shelf.
c. In the toolbar, click the Sort Descending button.
d. This shows us that Houston has the highest revenue across all of the cit-
ies that have Dillard’s stores, but we need to drill down further to explore
if there are multiple Dillard’s store locations in Houston.
e. Drag STORE.STORE to the Rows shelf after City and click Sort Descending.
f. Right-click the Sheet 1 tab and rename it Total Sales by Store.
7. Take a screenshot (label it 4-5TA).
8. After you answer the lab questions, you may close Tableau. Save your work as
Lab 4-5 Dillard’s Exploratory Analysis and continue to Lab 4-5 Part 2.

Lab 4-5 Part 1 Objective Questions (LO 4-2, 4-3)


OQ1. In which Texas city is the store with the highest sum of transactions located?
OQ2. Which Texas city has the most stores?
OQ3. What type of analysis did you perform in Part 1 of this lab (descriptive, diagnos-
tic, predictive, or prescriptive)?




Lab 4-5 Part 2 Explore the Department Hierarchy and Compare Revenue across Departments
Now that you have identified the Texas store that earns the highest revenue, you can explore
which types of items are being sold the most. You will do so by filtering to include only the
store with the highest revenue, then creating a Department hierarchy, and finally drilling
down through the different departments with the highest revenue.
Dillard’s stores product detail in a hierarchy of department decade > department ­century >
Department > SKU. We will not add SKU to the hierarchy, but you can extend your explor-
atory analysis by digging into SKU details on your own after you complete the lab.
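For reference, the drill-down corresponds to a grouped aggregation like the sketch below. It assumes TRANSACT joins SKU on the SKU column and SKU joins DEPARTMENT on the DEPT column; confirm the actual join keys in Appendix J before relying on this:

-- Revenue for the Dallas 716 store by century > decade > department.
-- Join columns (TRANSACT.SKU = SKU.SKU, SKU.DEPT = DEPARTMENT.DEPT)
-- are assumed; verify them against the data dictionary in Appendix J.
SELECT DEPARTMENT.DEPTCENT_DESC, DEPARTMENT.DEPTDEC_DESC,
       DEPARTMENT.DEPT_DESC,
       SUM(TRANSACT.TRAN_AMT) AS TOTAL_REVENUE
FROM (TRANSACT INNER JOIN SKU ON TRANSACT.SKU = SKU.SKU)
INNER JOIN DEPARTMENT ON SKU.DEPT = DEPARTMENT.DEPT
WHERE TRANSACT.STORE = 716
GROUP BY DEPARTMENT.DEPTCENT_DESC, DEPARTMENT.DEPTDEC_DESC,
         DEPARTMENT.DEPT_DESC
ORDER BY TOTAL_REVENUE DESC;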

Microsoft | Power BI Desktop

1. Open your Lab 4-5 Dillard’s Exploratory Analysis file from Part 1 if you
closed it.
2. Copy your Bar Chart and Paste it to the right of your first visualization.
3. Right-click the bar associated with Dallas 716 and select Include to filter the
data to only that store.
4. The filter for the Dallas 716 store is active now, so you can remove City and
Store from the Axis—this will clear up space for you to be able to view the
department details.
5. In the fields list, drag DEPARTMENT.DEPTDEC_DESC on top of
DEPARTMENT.DEPTCENT_DESC to build a hierarchy.
6. Rename the hierarchy DEPT_HIERARCHY, then add a third-level field by
dragging and dropping the DEPARTMENT.DEPT_DESC field on top of your
new hierarchy.
7. Explore the Department Hierarchy:
a. Drag DEPT_HIERARCHY to the Axis box and remove the LOCATION
fields. This provides a summary of how each Department Century has
performed.
b. If you expand the entire hierarchy (upside-down fork button), you will
see a sorted list of all departments, but they will not be grouped by cen-
tury. To see the department revenue by century, you can drill down from
each century directly:
1. In the visualization toolbar, click the Drill Up arrow to return to the
top level.
2. Right-click the bar for Ready-to-Wear and select Drill Down to see the
decades within the Ready-to-Wear century.
3. Right-click the bar for Career and select Drill Down to see the depart-
ments within the career decade.
c. Click the Format visual (paintbrush) button in the Visualizations pane to
add titles:
1. Visual > Y-axis > Title > Department
2. Visual > X-axis > Title > Sales Revenue
3. General > Title > Text > Sales Revenue by Department
8. Take a screenshot (label it 4-5MB).



9. Continue exploring the data by drilling up and down the hierarchy. When
you want to return back to the overview of department or century, click the
Drill Up button (up arrow). You can continue to drill down into specific
centuries and decades to learn more about the data.
10. After you answer the lab questions, you may close Power BI. Save your work
as Lab 4-5 Dillard’s Exploratory Analysis.

Tableau | Desktop

1. Open your Lab 4-5 Dillard’s Exploratory Analysis file from Part 1 if you
closed it.
2. Right-click your Total Sales by Store tab and click Duplicate.
3. Rename the new sheet Total Sales by Department.
4. Right-click the bar associated with Dallas 716 and select Keep Only to filter
the data to only that store.
5. The filter for the Dallas 716 store is active now, so you can remove City and
Store from the Rows—this will clear up space for you to be able to view the
department details.
6. In the fields list, drag DEPARTMENT.Deptdec Desc on top of
DEPARTMENT.Deptcent Desc to build a hierarchy. Rename the hierarchy to Department
Hierarchy. Add the third level of the hierarchy, Dept Desc, by dragging the
DEPARTMENT.Dept Desc field beneath Deptdec Desc in the Department
Hierarchy so that it shows Deptcent Desc, Deptdec Desc, and Dept Desc in
that order.
7. Explore the Department Hierarchy:
a. Drag Department Hierarchy to the Rows shelf. This provides a summary
of how each Department Century has performed.
b. Click the plus sign on Deptcent Desc to drill down and show Deptdec Desc,
and sort in descending order.
8. Take a screenshot (label it 4-5TB).
9. Continue exploring the data by drilling up and down the hierarchy. When you
want to collapse your hierarchy back to Decade or Century, click the minus
sign on the pills, then click the plus sign to expand back down into the details.
10. Create a new dashboard and drag the Total Sales by Store and Total Sales by
Department to your dashboard.
11. After you answer the lab questions, you may close Tableau. Save your work as
Lab 4-5 Dillard’s Exploratory Analysis.

Lab 4-5 Part 2 Objective Questions (LO 4-2)


OQ1. Which Department Decade is the best selling in the Shoes Century?
OQ2. Which Department Century is the worst performing?
OQ3. Which Department is the best selling in the Big Ticket (Century) > Case Goods
(Decade)?

Lab 4-5 Part 2 Analysis Questions (LO 4-3, 4-4)
AQ1. This lab is a starting point for exploring revenue across stores and departments
at Dillard’s. What next steps would you take to further explore these data?
AQ2. Some of the department centuries and decades are not easy to understand if you
are not a Dillard's employee. Which attributes would you like to learn more
about?
AQ3. In this lab we used relatively simple bar charts to perform the analysis. What
other visualizations would be interesting to use to explore these data?

Lab 4-5 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Chapter 5
The Modern Accounting
Environment

A Look at This Chapter


Most of the focus of Data Analytics in accounting is on auditing, managerial accounting, financial statement
analysis, and tax. This is partly due to the demand for high-quality data and the need for enhancing trust in the assur-
ance process, informing management for decisions, and aiding investors as they select their portfolios. In this chapter,
we look at how both auditors and managers are using technology in general to improve the decisions being made.
We also introduce how Data Analytics helps facilitate continuous auditing and reporting.

A Look Back
Chapter 4 completed our discussion of the IMPACT model by explaining how to communicate your results through
data visualization and through written reports. We discussed how to choose the best chart for your dataset and your
purpose. We also helped you learn how to refine your chart so that it communicates as efficiently and effectively as
possible. The chapter wrapped up by describing how to provide a written report tailored to specific audiences who
will be interested in the results of your data analysis project.

A Look Ahead
In Chapter 6, you will learn how to apply Data Analytics to the audit function and how to perform substantive audit
tests, including when and how to select samples and how to confirm account balances. Specifically, we discuss the
use of different types of descriptive, diagnostic, predictive, and prescriptive analytics as they are used to generate
computer-assisted auditing techniques.

The large public accounting firms offer a variety of analytical
tools to their customers. Take PwC’s Halo, for example. This tool
allows auditors to interrogate a client’s data and identify patterns
and relationships within the data in a user-friendly dashboard. By
mapping the data, auditors and managers can identify inefficien-
cies in business processes, discover areas of risk exposure, and
correct data quality issues by drilling down into the individual
users, dates and times, and amounts of the entries. Tools like
Halo allow auditors to develop their audit plan by narrowing their
focus and audit scope to unusual and infrequent issues that rep-
resent high audit risk.
Source: http://halo.pwc.com

OBJECTIVES
After reading this chapter, you should be able to:

LO 5-1 Understand how automation has created a data-rich environment where
technology helps accountants, auditors, and managers improve the
decisions being made.
LO 5-2 Understand different approaches to organizing enterprise data and
common data models.
LO 5-3 Describe the appropriate tasks and approaches to automating
procedures.
LO 5-4 Evaluate continuous monitoring techniques and alarms.
LO 5-5 Understand cloud-based collaboration platforms.


THE MODERN DATA ENVIRONMENT

LO 5-1
Understand how automation has created a data-rich environment where technology helps
accountants, auditors, and managers improve the decisions being made.

As businesses have embraced automation over the past several decades, more information
about financial transactions is captured in large databases. In addition to details of
transaction data, these sensors capture metadata (e.g., time stamps, user details, and
contents of unstructured data) that provide insight into the workings of the company.
Sensors provide details on movement through a building to identify optimal location of
resources, track the health of employees to help control health insurance costs, and allow
detailed analysis of the volume of everyday events to help managers manage robotic scripts
and tasks. Even traditional data-entry tasks are now performed by specialized software that
recognizes text from documents and maps it into database fields. Once you have an
understanding of the IMPACT cycle (as shown in Exhibit 5-1), understanding the needs and
structure of the business will help you perform meaningful analyses.
from documents and maps it into database fields. Once you have an understanding of the
IMPACT cycle (as shown in Exhibit 5-1, understanding the needs and structure of the busi-
ness will help you perform meaningful analyses.

EXHIBIT 5-1 The IMPACT Cycle
Source: Isson, J. P., and J. S. Harriott. Win with Advanced Business Analytics: Creating Business Value from Your Data. Hoboken, NJ: Wiley, 2013.

Automation can include routine tasks, such as combining data from different sources for
analysis, and more complex actions, such as responding to natural language queries. In the
past, analytics and automation were performed by hobbyists and consultants within a firm.
In a modern environment, companies form centers of expertise where they concentrate
specialists in a single geographic location and use information and communication tech-
nologies to work in remote teams. Because the data are network-accessible, multiple users
interact with the data and complete workflows of tasks with the assistance of remote team
members and bots, or automated robotic scripts commonly called robotic process automa-
tion. The specialists manage the bots like they would normal employees, continuously evalu-
ating their performance and contribution to the company.
You’ll recall from your auditing course that assurance services are crucial to building
and maintaining trust within the capital markets. In response to increasing regulation in the
United States, the European Union, and other jurisdictions, both internal and external audi-
tors have been tasked with providing enhanced assurance while also attempting to reduce
(or at least maintain) the audit fees. This has spurred demand for more audit automation
along with an increased reliance on auditors to use their judgment and decision-making
skills to effectively interpret and support their audit findings with managers, shareholders,
and other stakeholders.
Both external and internal auditors have been applying simple Data Analytics for
decades in evaluating risk within companies. Think about how an evaluation of inventory
turnover can spur a discussion on inventory obsolescence or how working capital ratios
are used to identify significant issues with a firm’s liquidity. From an internal audit (and/
or management accounting) perspective, evaluating cost variances can help identify opera-
tional inefficiencies or unfavorable contracts with suppliers.
The audit concepts of professional skepticism and reasonable assurance are as much a
part of the modern audit as in the past. There has been a shift, however, from simply pro-
viding reasonable assurance on the processes to the additional assurance of the robots
that are performing a lot of the menial audit work. Where, before, an auditor may have
looked at samples and gathered evidence to make inferences to the population, now that
same auditor must understand the controls and parameters that have been programmed
into the robot to analyze the full population. In other words, as these automated bots do
more of the routine analytics, auditors will be free to exercise more judgment to interpret
the alarms and data while refocusing their effort on testing the parameters used by the
robots.
Auditors use Data Analytics to improve audit quality by more accurately assessing risk
and selecting better substantive procedures and tests of controls. While the exercises the
auditors conduct are fairly routine, the models can be complex and require auditor judg-
ment and interpretation. For example, if an auditor receives 1,000 notifications of a control
violation during the day, does that mean there is a control weakness or that the settings on
the automated control are too precise? Are all those notifications actual control violations
that require immediate attention, or are most of them false positives—transactions that are
flagged as exceptions but are normal and acceptable?
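To make this trade-off concrete, consider the hypothetical monitoring rule sketched below. The table and field names are invented for illustration; the point is that the parameter choices (the threshold and the time window) determine how many alarms fire.

-- Hypothetical continuous-monitoring rule (illustrative names only):
-- flag large journal entries posted outside normal business hours.
DECLARE @Threshold MONEY = 10000;  -- raising this reduces alarm volume
SELECT JE_ID, Posted_By, Posted_At, Amount
FROM Journal_Entry
WHERE Amount >= @Threshold
  AND (DATEPART(HOUR, Posted_At) < 6 OR DATEPART(HOUR, Posted_At) >= 20);
-- Loosening the parameters cuts false positives but can hide true
-- exceptions; auditors must test and justify these settings.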
The auditors’ role is to make sure that the appropriate analytics are used and that the
output of those analytics—whether a dashboard, notifications of exceptions, or accuracy of
predictive models—correspond to management’s expectations and assertions.

The Increasing Importance of the Internal Audit


If you look at the assurance market, there are many trends that are affecting the profession.
First, the major applications of Data Analytics in auditing are not solely focused on the
financial statements as evaluated by public accounting firms. Rather, these tend to focus
on data quality, internal controls, and the complex information systems that support the
business process—areas typically reserved for the internal audit department at a firm. Sec-
ond, the risk and advisory practices of the public accounting firms are experiencing greater
growth, in large part due to firms’ outsourcing or co-sourcing of the internal audit function.
Third, external auditors are permitted to rely on the work of internal auditors to provide
support for their opinion of financial statements.
For these reasons, most of the innovations in Data Analytics have originated in inter-
nal audit departments, where there is constant pressure to enhance business value while
minimizing costs. Many companies’ experience with Data Analytics in the internal audit
department has come from internal auditors who have investigated Data Analytics on
their own. These individuals then find a champion with management and are encouraged
to continue their work. Under the guidance of the chief audit executive (CAE) or another
manager, these individuals build teams to develop and implement analytical techniques to
aid the following:
1. Process efficiency and effectiveness.
2. Governance, risk, and compliance, including internal controls effectiveness.
3. Information technology and information systems audits.
4. Forensic audits in the case of fraud.
5. Support for the financial statement audit.


Internal auditors are also more likely to have working knowledge of the different types of
systems implemented at their companies. They are familiar with how the general journals
from a product like Oracle’s JD Edwards actually reconcile to the general ledger in SAP
to generate financial reports and drill down into the data. Because the systems themselves
and the implementation of those systems vary across organizations (and even within orga-
nizations), internal auditors recognize that analytics are not simply a one-size-fits-all type
of strategy.

Lab Connection
Lab 5-5 has you filter data to limit the scope of data analysis to meet internal
audit objectives.

PROGRESS CHECK
1. What types of sensors do businesses use to track activity?
2. Make the case for why an internal audit is increasingly important in the mod-
ern audit. Why is it also important for external auditors and the scope of their
work?

ENTERPRISE DATA

LO 5-2
Understand different approaches to organizing enterprise data and common data models.

While organizations have become more data-centric as they have adopted enterprise systems
(ES) over the past few decades, these systems can vary greatly among organizations.
Some companies will take a homogeneous systems approach for their data structure by
ensuring that all of their divisions and subsidiaries use a uniform installation of a common
ES, such as SAP. Homogeneous systems enable management to consolidate the information
from various locations and roll them up in management reports, audit support, and
financial statements with minimal additional effort. Other companies that grow through
acquisition take a heterogeneous systems approach, where they attempt to integrate the exist-
ing systems (such as SAP, Oracle, JD Edwards, and others) of the companies they acquire
and use a series of translators to convert the output of those systems into usable financial
information. Systems translator software attempts to map the various tables and fields from
these varied enterprise systems into a data warehouse, where all of the data can be analyzed
centrally, as shown in Exhibit 5-2. The data warehouse is updated periodically, typically on
a daily basis, to reflect recent firm activity.
One of the primary obstacles that managers and auditors face is access to appropriate
data. As noted in Chapter 2, managers and auditors may request flat files or extracts from
an IT manager. Frequently, these files may be incomplete, unrelated, limited in scope, or
delayed when they are not considered a priority by IT managers. Increasingly, managers
and auditors request read-only access to the data warehouse so they can evaluate transac-
tion data, such as purchases and sales, and the related master data, such as employees and
vendors, in a timely manner. By avoiding a data broker, they get more relevant data for their
analyses and analyze multiple relationships and explore other patterns in a more meaning-
ful way. In either case, the managers and auditors work with duplicated data, rather than
querying the production or live systems directly.


EXHIBIT 5-2 Homogeneous Systems, Heterogeneous Systems, and Software Translators
[Figure: In a homogeneous enterprise system, uniform SAP installations feed a data warehouse directly; in a heterogeneous enterprise system, SAP, Oracle, and JDE installations pass through a systems translator before loading the data warehouse. In both cases, the data warehouse supports continuous monitoring, a management dashboard, and the audit program.]

Common Data Models


As automation of Data Analytics procedures becomes more common, working within dif-
ferent data environments can present challenges, especially for auditors. To minimize some
of the effort required to interact with data, analysts adopt a common data model, which is a
tool used to map existing database tables and fields from various systems to a standardized
set of tables and fields for use with analytics. When the underlying systems change, the
model is updated to pull data from the new tables and fields. Similar to the translation soft-
ware mentioned previously, a common data model makes it easier to perform and automate
routine analytical procedures even when the underlying systems change.
The AICPA’s audit data standards (ADS) provide one example of a common data model with
tables and fields that are needed by auditors to perform routine audit tasks. The AICPA recom-
mends that ES vendors standardize the output of data that auditors are likely to use. The goal of
the standards is to reduce the data loading and transformation effort required by the auditor, so
they can focus on the analytics more quickly as well as define real-time or continuous analytics
via the data warehouse. These standards are voluntary, and actual implementation is currently
limited, but they provide a good basis for data needed to audit specific company functions.
The current set of audit data standards defines the following standards:1
• Base: defines the formats for files and fields as well as master data requirements for
users, business units, segments, and tax tables.
• General Ledger: defines the chart of accounts, source listings, trial balance, and general
ledger or journal entry detail.
• Order to Cash Subledger: defines sales orders and line items, shipments and line items,
invoices and line items, open accounts receivable and adjustments, cash receipts, and
customer master data, shown in Exhibit 5-3.
1 https://www.aicpa.org/InterestAreas/FRC/AssuranceAdvisoryServices/DownloadableDocuments/AuditDataStandards/AuditDataStandards.O2C.July2015.pdf.


• Procure to Pay Subledger: defines purchase orders and line items, goods received and
line items, invoices received and line items, open accounts payable and adjustments,
payments, and supplier master data.
• Inventory Subledger: defines inventory location master data, product master data, inven-
tory on hand data, and inventory movement transactions as well as physical inventory
and material cost.
• Fixed Asset Subledger: defines fixed asset master data, additions, removal, and deprecia-
tion calculations.

EXHIBIT 5-3 Audit Data Standards
The audit data standards define common elements needed to audit the order-to-cash or sales process.
[Figure: Diagram of the order-to-cash tables defined by the standard and the keys that join them.]
Source: https://www.aicpa.org/InterestAreas/FRC/AssuranceAdvisoryServices/DownloadableDocuments/AuditDataStandards/AuditDataStandards.O2C.July2015.pdf

*If receivable balances are tracked by customer only (not by invoice), then Customer_Account_ID is used as a key to join tables to the Open_Accounts_Receivable table instead of both Customer_Account_ID and Invoice_ID.
**The User_Listing table can be joined to three fields, all of which contain a user ID—Entered_By, Approved_By, Last_Modified_By.

With standard data elements in place, not only will internal auditors streamline their
access to data, but they also will be able to build analytical tools that they can share with
others within their company or professional organizations. This can foster greater collabo-
ration among auditors and increased use of Data Analytics across organizations. These data
elements will be useful when performing substantive testing in Chapter 6.
Even if the standard is never adopted by data suppliers, auditors can still take advantage
of the audit data standards as a common data model. For example, Exhibit 5-4 shows the
mapping of a set of Purchase Card data to the Procure to Pay Subledger Standard. Once
the mapping algorithm has been generated using SQL or another tool, any new data can be
analyzed quickly and easily.
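To make this concrete, here is a minimal sketch of such a mapping in Python with the pandas library (one of the scripting tools mentioned later in this chapter; the chapter's own example uses SQL). The source column names follow the purchase card file used in Lab 5-1, and the constants are assumptions for that particular extract, not part of the AICPA standard itself.

import pandas as pd

# Map source purchase card columns to Procure to Pay Subledger (ADS) names.
ADS_RENAMES = {
    "ROWID": "Purchase_Order_ID",
    "TRANSACTION_DATE": "Purchase_Order_Date",
    "AGENCYNBR": "Business_Unit_Code",
    "AGENCYNAME": "Business_Unit_Description",
    "MERCHANT": "Supplier_Account_Name",
    "MCC_DESCRIPTION": "Supplier_Group",
    "POST_DATE": "Entered_Date",
    "AMOUNT": "Purchase_Order_Amount_Local",
    "ITEM_DESCR": "Purchase_Order_Line_Product_Description",
}

def to_ads(pcard: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the purchase card extract in ADS-friendly form."""
    ads = pcard.rename(columns=ADS_RENAMES)
    # Derive fields that have no direct source column.
    ads["Entered_By"] = pcard["FIRST_INITIAL"] + " " + pcard["LAST_NAME"]
    ads["Purchase_Order_Fiscal_Year"] = "2020"    # constant for this extract
    ads["Purchase_Order_Local_Currency"] = "USD"  # source file omits currency
    # Drop source-only columns that the standard does not define.
    return ads.drop(columns=["FIRST_INITIAL", "LAST_NAME",
                             "CALENDAR_YEAR", "CALENDAR_MONTH"], errors="ignore")

ads_data = to_ads(pd.read_csv("Lab 5-1 OK PCard FY2020.csv"))

Once an extract is in this shape, the same downstream analytics and dashboards can run unchanged regardless of which source system produced the file.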

Lab Connection
Lab 5-1 has you transform transaction data to conform with the AICPA’s audit
data standard. Lab 5-2 has you create a dashboard with visualizations to
explore the transformed common data.


EXHIBIT 5-4 Mapping Purchase Card Data to the Procure to Pay Subledger Audit Data Standard
[Figure: Source purchase card fields (Source Year/Month, Source Type, Cardholder Last Name, Cardholder First Name, Item Description, Amount, Business Unit, Merchant Name, Transaction Date, Posted Date, MCC Description) are mapped with SQL to standard fields such as Purchase_Order_ID, Purchase_Order_Date, Purchase_Order_Fiscal_Year, Business_Unit_Code, Supplier_Account_ID, Entered_By, Entered_Date, Purchase_Order_Amount_Local, and Purchase_Order_Local_Currency; the constants "2020" and "USD" fill the fiscal year and currency fields.]

PROGRESS CHECK
3. What are the advantages of the use of homogeneous systems? Would a merger
target be more attractive if it used a similar financial reporting system as the
potential parent company?
4. How does the use of audit data standards facilitate data transfer between audi-
tors and companies? How does it save time for both parties?

AUTOMATING DATA ANALYTICS LO 5-3

Describe the appropriate tasks and approaches to automating procedures.

Most of the effort in Data Analytics is preparing the analysis for the first time. This involves identifying the data, mapping the tables and fields through ETL, and developing the visualization if needed. Once those tasks are completed, automation of the procedure involves identifying the timing or schedule of how often the procedure should run, any parameters that might change, and what should happen if a new observation appears as an outlier.
The steps you follow to perform the analysis are part of the algorithm, and they can be
recorded using a scripting language, such as Python or R, or using off-the-shelf monitoring
software. That process is outside the scope of this textbook, but there are many resources
online to help you with this next step.
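As a rough sketch of what such a recorded procedure might look like in Python, the function below separates the pieces just described: parameters that might change between runs and a rule for flagging outlying observations. The schedule itself would live in whatever job scheduler runs the script, and the column name, threshold, and sample amounts are all hypothetical.

import pandas as pd

# Parameters management or the auditor may change between runs.
PARAMS = {
    "amount_column": "Purchase_Order_Amount_Local",
    "z_threshold": 3.0,  # hypothetical cutoff for flagging outliers
}

def run_outlier_check(transactions: pd.DataFrame, params: dict) -> pd.DataFrame:
    """Flag transactions whose amount is unusually far from the mean."""
    amounts = transactions[params["amount_column"]]
    z_scores = (amounts - amounts.mean()) / amounts.std(ddof=0)
    # Rows past the threshold go to an exception report or alert queue.
    return transactions[z_scores.abs() > params["z_threshold"]]

demo = pd.DataFrame({"Purchase_Order_Amount_Local":
                     [50, 52, 54, 53, 55, 51, 49, 56, 57, 52, 54, 50, 53, 55, 9000]})
print(run_outlier_check(demo, PARAMS))  # flags only the 9,000 transaction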
The main impact of automation and Data Analytics on the accounting profession comes
through optimization of the management dashboard and audit plan. When beginning an
engagement—whether to audit the financial statements, certify the enterprise system, or make
a recommendation to improve a business process—auditors generally follow a standardized
audit plan. The benefit of a standardized audit plan is that newer members of the audit team
can jump into an audit and contribute. Audit plans also identify the priorities of the audit.
An audit plan consists of one or more of the following elements:
• A methodology that directs that audit work.
• The scope of the audit, defining the time period, level of materiality, accounts and
subsidiaries being audited, and expected completion time for the audit.
• Potential risk within the area being audited.


• Procedures and specific tasks that the audit team will execute to collect and analyze
evidence. These typically include tests of controls and substantive tests of transaction
details.
• Formal evaluation by the auditor and supervisors.
Because audit plans are formalized and standardized, they lend themselves to the use of
Data Analytics and, consequently, automation. For example:
• The methodology may be framed by specific standards, such as the Public Company Accounting Oversight Board's (PCAOB) auditing standards, the Committee of Sponsoring Organizations' (COSO) Enterprise Risk Management framework, the Institute of Internal Auditors' (IIA) International Standards for the Professional Practice of Internal Auditing (Standards), or the Information Systems Audit and Control Association's (ISACA) Control Objectives for Information and Related Technologies (COBIT) framework. Data Analytics may be used to analyze the standards and determine which requirements apply to the organization being audited.
• The scope of the audit defines parameters that will be used to filter the records or
transactions being evaluated.
• Simple-to-complex Data Analytics can be applied to a company’s data (for an internal
audit) or a client’s data (for an external audit) during the planning stage of the audit to
identify which areas the auditor should focus on. This may include outlier detection or
other substantive tests of suspicious or risky transactions.
• Audit procedures themselves typically identify data, locations, and attributes that the
auditors will evaluate. These are the variables that will provide the input for many of the
substantive analytical procedures discussed in Chapter 6.
• The evaluation of audit data may be distilled into a risk score. This may be a function
of the volume of exceptional records or level of exposure for the functional area. If
the judgment and decision making can be easily defined, a rule-based analytic could
automatically assign a score for the auditor to review. For more complex judgments,
the increasing prevalence of artificial intelligence and machine learning discussed in
Chapter 3 may be of assistance. Historical observations of the scores auditors assign to
specific cases and outcomes may assist in the creation of an automated scoring model, as sketched below.
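The rule-based case from the last bullet can be illustrated with a small scorer. The inputs, rules, and weights below are invented for illustration; a real audit team would calibrate them to its own risk framework.

def risk_score(area: dict) -> int:
    """Assign a simple rule-based risk score to one functional area."""
    score = 0
    if area["exception_count"] > 25:  # high volume of exceptional records
        score += 2
    if area["exposure"] > 1_000_000:  # high level of financial exposure
        score += 2
    if area["prior_findings"] > 0:    # history of audit findings
        score += 1
    return score  # e.g., 0-1 low, 2-3 medium, 4-5 high

# A hypothetical functional area with many exceptions and large exposure.
print(risk_score({"exception_count": 42,
                  "exposure": 1_250_000,
                  "prior_findings": 2}))  # prints 5 (high risk)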
Typically, internal audit organizations that have adopted Data Analytics to enhance their audit have done so after an individual on the team began tinkering with Data Analytics. That individual convinces managers that there is value in using the data to direct the audit, and a manager may become a champion of the process. Once the value proposition of Data Analytics is demonstrated, the team is given more resources to build the program and adapt the existing audit program to include more data-centric evaluation where appropriate.
Because of the potential disruption to the organization, it is more likely that an internal
auditor will adapt an existing audit plan than develop a new system from scratch. Automat-
ing the audit plan and incorporating Data Analytics involve the following steps, which are
similar to the IMPACT model:
1. Identify the questions or requirements in the existing audit plan.
2. Master the data by identifying attributes and elements that are automatable.
3. Perform the test plan, in this case by developing analytics (in the form of rules or mod-
els) for those attributes identified in step 2.
4. Address and refine results. List expected exceptions to these analytics and expected
remedial action by the auditor, if any.
5. Communicate insight by testing the rules and comparing the output of the analytics to
manual audit procedures.
6. Track outcomes by following up on alarms and refining the models as needed.


Let’s assume that an internal auditor has been tasked with implementing Data Analytics
to automate the evaluation of a segregation of duties control within SAP. The auditor evalu-
ates the audit plan and identifies a procedure for testing this control. The audit plan identifies
which tables and fields contain relevant data, such as an authorization matrix, and the specific
roles or permissions that would be incompatible. The auditor would use that information to
build a model that would search for users with incompatible roles and notify the auditors.
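A minimal sketch of that model follows, again in Python. The role names and incompatible pairs are hypothetical stand-ins for what the authorization matrix tables in SAP would actually supply.

# Hypothetical incompatible role pairs drawn from an authorization matrix.
INCOMPATIBLE = {
    ("CREATE_VENDOR", "APPROVE_PAYMENT"),
    ("CREATE_PURCHASE_ORDER", "RECEIVE_GOODS"),
}

def sod_violations(user_roles: dict) -> list:
    """Return (user, role_a, role_b) for every user holding incompatible roles."""
    violations = []
    for user, roles in user_roles.items():
        for role_a, role_b in INCOMPATIBLE:
            if role_a in roles and role_b in roles:
                violations.append((user, role_a, role_b))
    return violations

users = {
    "jdoe": {"CREATE_VENDOR", "APPROVE_PAYMENT"},  # incompatible combination
    "asmith": {"CREATE_PURCHASE_ORDER"},           # no conflict
}
for violation in sod_violations(users):
    print("Notify auditor:", violation)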

Lab Connection
Lab 5-4 has you review an audit plan and identify potential data tables and
attributes that would enable audit automation.

CONTINUOUS MONITORING TECHNIQUES LO 5-4

Evaluate continuous monitoring techniques and alarms.

Data Analytics and automation allow management and internal auditors to continuously monitor and audit the systems and processes within their companies. Whereas a traditional audit may have the internal auditors perform a routine audit plan once every 12 to 36 months or so, the continuous audit evaluates data in a form that matches the pulse of the business. For example, purchase orders can be monitored for unauthorized activity in real time, while month-end
adjusting entries would be evaluated once a month. When exceptions occur—for example, a
purchase order is created with a customer whose address matches an employee’s—the auditors
are alerted immediately and given the option to respond right away to resolve the issue.
Continuous auditing is a process used by internal auditors that provides real-time assurance over business processes and systems. It involves the application of rules or analytics that perform a continuous monitoring function, which constantly evaluates internal controls and transactions and is the chief responsibility of management. It also generates continuous
reporting on the status of the system so that an internal auditor can know at any given time
whether the system is operating within the parameters set by management or not. In the
future, continuous reporting may also enable firms to publish real-time financial accounting
data for public analysis, but in practical use this may cause more problems for firms if they
are unable to validate or provide assurance on the data being reported.
Implementing continuous auditing procedures is similar to automating an audit plan
with the additional step of scheduling the automated procedures to match the timing and
frequency of the data being evaluated and notifying the auditor when exceptions occur.
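A small sketch of matching analytics to the pulse of the data follows. The purchase order rule mirrors the employee-address example above; the function names, addresses, and data are hypothetical, and the scheduler that invokes the monthly routine is assumed to live outside the script.

# Hypothetical employee addresses pulled from the HR master file.
EMPLOYEE_ADDRESSES = {"12 Elm St"}

def check_purchase_order(po: dict) -> None:
    """Real-time rule: alert when a PO ships to an employee's address."""
    if po["ship_to"] in EMPLOYEE_ADDRESSES:
        print("ALERT: purchase order", po["id"], "ships to an employee address")

def on_new_purchase_order(po: dict) -> None:
    check_purchase_order(po)  # evaluated the moment the PO is created

def month_end_close() -> None:
    print("Reviewing adjusting entries...")  # scheduler runs this once a month

on_new_purchase_order({"id": 1001, "ship_to": "12 Elm St"})  # fires an alert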

Data Analytics at Work

Citi’s $900 Million Internal Control Mistake–Would Continuous Monitoring Help?
In summer 2020, Citigroup made a huge mistake, when instead of transfer-
ring a $7.8 million interest payment from Revlon to its creditors, it accidentally
transferred $894 million of the bank’s own funds—the full amount of princi-
pal on the loan that wasn’t due for another few years. While there are inter-
esting legal arguments as to whether the funds should be returned, a court
decision in February 2021 said that the creditors should not be required to
return the funds. And does that mean that Revlon has fulfilled its debt in full?



What a herculean mistake! Although three humans were involved in the approval chain as part of an internal control, no one caught the huge error.
Would continuous monitoring have caught the error? What internal control
would you devise that wouldn’t allow this to happen? Could humans and
­diagnostic/predictive analytics have prevented the error?
Source: Sorkin, Karaian, de la Merced, Hirsch, and Livni, “Ouch, That Hurts,” New York Times,
February 17, 2021.

Alarms and Exceptions


Whenever an automated or continuous auditing rule is violated, an exception occurs. The
record is flagged and systems generate an exception report that typically identifies the record
and the date of the exception.
Alarms are essentially a classification problem. A data value is sent through a simple
decision tree based on a series of rules and classified as a positive event (alarm) or a nega-
tive event (no alarm). Remember we talked about accuracy of models in Chapter 3: These
alarms will not always be correct.
Once the notification of the alarm or exception arrives, auditors follow a set of proce-
dures to resolve the issue. First, they must determine whether the alarm represents a true
positive, a transaction that is problematic, such as an error or fraud, or a false positive, where
a normal transaction is classified as problematic. Similarly, a true negative is an outcome
where the model correctly predicts no alarm, and a false negative is an outcome where the model incorrectly predicts no alarm for an abnormal event.
When too many alarms are false positives, auditors face information overload: the flood of incorrect alarms distracts them from adequately evaluating the system. Because auditors are mostly concerned with true positives, they should train or refine the models to minimize false positives. This is summarized in Exhibit 5-5.


EXHIBIT 5-5 Four Types of Alarms That an Auditor Must Evaluate

               Normal Event       Abnormal Event
Alarm          False positive     True positive
No Alarm       True negative      False negative
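To see how these four outcomes are tallied in practice, here is a small sketch that compares each alarm decision with the event's true nature; the sample log is invented for illustration.

from collections import Counter

# Each record: (alarm_raised, event_was_abnormal) for one monitored transaction.
ALARM_LOG = [
    (True, True),    # true positive: alarm on an abnormal event
    (True, False),   # false positive: alarm on a normal event
    (False, False),  # true negative: silence on a normal event
    (False, True),   # false negative: missed abnormal event
    (True, False),   # another false positive
]

def classify(alarm: bool, abnormal: bool) -> str:
    if alarm:
        return "true positive" if abnormal else "false positive"
    return "false negative" if abnormal else "true negative"

counts = Counter(classify(alarm, abnormal) for alarm, abnormal in ALARM_LOG)
print(counts)
# Counter({'false positive': 2, 'true positive': 1, 'true negative': 1,
#          'false negative': 1})

A rising share of false positives in such a tally is the signal to retune the rules before the flood of alarms overwhelms the auditors.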

WORKING PAPERS AND AUDIT WORKFLOW LO 5-5

Understand cloud-based collaboration platforms.

As audit procedures become increasingly technical, documentation continues to be an essential way for internal and external auditors to increase their reliance on automated controls and procedures. The idea of a black-box audit is no longer sufficient; rather, auditors must have a better understanding of the tools they use and the output of those tools. This is
where working papers come into play.
Working papers are essential to audit planning, performance, and evaluation. They
provide the documentation for the procedures the auditors follow, evidence they collect,
and communication with the audit client. As they relate to Data Analytics, working papers
should contain the following items:
• Work programs used to document the audit procedures to collect, manipulate, model,
and evaluate data.
• IT-related documentation, including flowcharts and process maps that provide system
understanding.
• Database maps (such as UML diagrams) and data dictionaries that define the location
and types of data auditors will analyze.
• Documentation about existing automated controls, including parameters and variables
used for analysis.
• Evidence, including data extracts, transformed data, and model output, that provides
support for the functioning controls and management assertions.
Policies and procedures that promote consistent, high-quality work are essential to maintaining a complete audit. The audit firm or chief audit executive is respon-
sible for providing guidance and standardization so that different auditors and audit teams
produce clear results. These standardizations include consistent use of symbols or tick
marks and a uniform mechanism for cross-referencing output to source documents or data.

Electronic Working Papers and Remote Audit Work


As workplaces embrace a variety of information and communication technologies to enable collaboration from different locations, audit firms have done so as well. Increasingly, inter-
nal and external audit teams consist of more specialized onsite auditors who interact with
a team of experts and data scientists remotely at locations around the world. Many of the
routine tasks are offloaded to the remote or seasonal workers, freeing up onsite auditors to
use more professional judgment and expertise during the engagement. This results in cost savings through increased efficiency at the firm level. The glue that holds the
audit team together is the electronic workpaper platform as well as other collaboration tools,
such as Microsoft Teams or Slack. The electronic workpaper platforms, such as TeamMate
or Xero, automate the workflow of evidence collection, evaluation, and opinion generation
on the part of the audit teams. The large accounting firms have proprietary systems that
accomplish a similar purpose. For example, PwC uses three systems to automate its audit
process. Aura is used to direct the audit by identifying which evidence to collect and analyze,
Halo performs Data Analytics on the collected evidence, and Connect provides the work-
flow process that allows managers and partners to review and sign off on the work. Most of


these platforms are hosted in the cloud, so members of the audit team can participate in the
various functions from any location. Smaller audit shops can build ad hoc workpaper reposi-
tories using OneDrive with Office 365, though there are fewer controls over the documents.

Lab Connection
Lab 5-3 has you upload audit data to the cloud and review changes.

PROGRESS CHECK
5. Continuous audit uses alarms to identify exceptions that might indicate an audit
issue and require additional investigation. If there are too many alarms and
exceptions based on the parameters of the continuous audit system, will continu-
ous auditing actually help or hurt the overall audit effectiveness?
6. PwC uses three systems to automate its audit process. Aura is used to direct the
audit by identifying which evidence to collect and analyze, Halo performs Data
Analytics on the collected evidence, and Connect provides the workflow pro-
cess that allows managers and partners to review and sign off on the work. How
does that line up with the steps of the IMPACT model we’ve discussed through-
out the text?

Summary
■ As accounting has evolved over the past few decades, automation has driven many of
the changes, in turn enabling additional Data Analytics. Data Analytics has improved
management’s and auditors’ ability to understand their business, assess risk, inform their
opinions, and improve assurance over the processes and controls in their organizations.
(LO 5-1)
■ Enterprise data appear in many forms, and the adoption of a common data model makes it easier to analyze data from a variety of systems. (LO 5-2)
■ The main impact of automation and Data Analytics on the accounting profession comes
through optimization of the management dashboard and audit plan. (LO 5-3)
■ Data Analytics and automation allow management and internal auditors to continuously
monitor and audit the systems and processes within their companies. (LO 5-4)
■ As audit procedures become increasingly technical, documentation continues to be an
essential way for internal and external auditors to increase their reliance on automated
controls and procedures. (LO 5-5)

Key Words
audit data standards (ADS) (249) The audit data standards define common tables and fields that are
needed by auditors to perform common audit tasks. The AICPA developed these standards.

common data model (249) A tool used to map existing database tables and fields from various systems
to a standardized set of tables and fields for use with analytics.
continuous auditing (253) A process that provides real-time assurance over business processes and
systems.
continuous monitoring (253) A process that constantly evaluates internal controls and transactions and
is the chief responsibility of management.
continuous reporting (253) A process that provides real-time access to the system status and account-
ing information.
data warehouse (248) A data warehouse is a repository of data accumulated from internal and external
data sources, including financial data, to help management decision making.
flat file (248) A means of storing data in one place, such as in an Excel spreadsheet, as opposed to stor-
ing the data in multiple tables, such as in a relational database.
heterogeneous systems approach (248) Heterogeneous systems represent multiple installations or
instances of a system. It would be considered the opposite of a homogeneous system.
homogeneous systems approach (248) Homogeneous systems represent one single installation or
instance of a system. It would be considered the opposite of a heterogeneous system.
production or live systems (248) Production (or live) systems are those active systems that collect and
report and are directly affected by current transactions.
systems translator software (248) Systems translator software maps the various tables and fields from
varied ERP systems into a consistent format.

ANSWERS TO PROGRESS CHECKS


1. Sensors can include door sensors to track movement in a building, health sensors to track
employee health, and metadata, such as time stamps and user details, to track transaction
activity, to name a few.
2. There are many reasons for this trend, with perhaps the most important being that exter-
nal auditors are permitted to rely on the work of internal auditors to provide support for
their opinion of financial statements.
3. A homogeneous system allows effortless transmission of accounting and auditing data
across company units and international borders. It also allows company executives
(including the chief executive officer, chief financial officer, and chief information offi-
cer), accounting staff, and the internal audit team to intimately know the system. In the
case of a merger, integration of the two systems will require less effort than if they were
heterogeneous.
4. The use of audit data standards allows an efficient data transfer of data in a standardized
format that auditors can use in their audit testing programs. It can also save the company
time and effort in providing its transaction data in a usable fashion to auditors.
5. If there are too many alarms and exceptions, particularly false positives, continuous auditing becomes more of a burden than a blessing. The models must be refined to produce a higher proportion of true positives and true negatives for the system to be valuable to the auditor.
6. PwC’s Aura system would help identify the questions and master the data, the first two
steps of the IMPACT model. PwC’s Halo system would help perform the test plan and
address and refine results, the middle two steps of the IMPACT model. Finally, PwC’s Con-
nect system would help communicate insights and track outcomes, the final two steps of
the IMPACT model.

Multiple Choice Questions

1. (LO 5-1) Under the guidance of the chief audit executive (CAE) or another manager,
internal auditors build teams to develop and implement analytical techniques to aid all
of the following audits except:
a. process efficiency and effectiveness.
b. governance, risk, and compliance, including internal controls effectiveness.
c. tax compliance.
d. support for the financial statement audit.
2. (LO 5-2) Which audit data standards ledger defines product master data, location data,
inventory on hand data, and inventory movement?
a. Order to Cash Subledger
b. Procure to Pay Subledger
c. Inventory Subledger
d. Base Subledger
3. (LO 5-2) Which audit data standards ledger identifies data needed for purchase orders,
goods received, invoices, payments, and adjustments to accounts?
a. Order to Cash Subledger
b. Procure to Pay Subledger
c. Inventory Subledger
d. Base Subledger
4. (LO 5-2) A company has two divisions, one in the United States and the other in China.
One uses Oracle and the other uses SAP for its basic accounting system. What would
we call this?
a. Homogeneous systems
b. Heterogeneous systems
c. Dual data warehouse systems
d. Dual lingo accounting systems
5. (LO 5-3) Which of the following defines the time period, the level of materiality, and the
expected time for an audit?
a. Audit scope
b. Potential risk
c. Methodology
d. Procedures and specific tasks
6. (LO 5-3) All of the following may serve as standards for the audit methodology except:
a. PCAOB’s auditing standards.
b. COSO’s ERM framework.
c. ISACA’s COBIT framework.
d. FASB’s accounting standards.
7. (LO 5-4) When there is an alarm in a continuous audit, but it is associated with a normal
event, we would call that a:
a. false negative.
b. true negative.
c. true positive.
d. false positive.

8. (LO 5-4) When there is no alarm in a continuous audit, but there is an abnormal event,
we would call that a:
a. false negative.
b. true negative.
c. true positive.
d. false positive.
9. (LO 5-4) If purchase orders are monitored for unauthorized activity in real time while
month-end adjusting entries are evaluated once a month, those transactions monitored
in real time would be an example of a:
a. traditional audit.
b. periodic test of internal controls.
c. continuous audit.
d. continuous monitoring.
10. (LO 5-2) Who is most likely to have a working knowledge of the various enterprise
­systems that are in use in the company?
a. Chief executive officer
b. External auditor
c. Internal auditor
d. IT staff

Discussion and Analysis



1. (LO 5-1) Why has most innovation in Data Analytics originated more in an internal audit
than an external audit? Or if not, why not?
2. (LO 5-2) Is it possible for a firm to have general journals from a product like JD Edwards
actually reconcile to the general ledger in SAP to generate financial reports or drill
down to see underlying transactions? Why or why not?
3. (LO 5-2) Is it possible for multinational firms to have many different financial reporting
systems and enterprise systems packages all in use at the same time?
4. (LO 5-2) How does the systems translator software work? How does it store the merged
data in a data warehouse?
5. (LO 5-2) Why is it better to extract data from a data warehouse than a production or live
system directly?
6. (LO 5-2) Would an auditor view heterogeneous systems as an audit risk? Why or
why not?
7. (LO 5-5) Why would audit firms prefer to use proprietary workpapers rather than just
storing working papers on the cloud?

Problems

1. (LO 5-2) Match the description of the data standard to each of the current audit data
standards:
• Base
• General Ledger
• Order-to-Cash Subledger
• Procure-to-Pay Subledger

Data Standard Description Current Audit Data Standards

1. Goods received and invoices received

2. Accounts receivable and cash receipts

3. Supplier master data

4. Chart of accounts

5. File formats

6. Shipments to customers

2. (LO 5-3) Accounting has a great number of standards-setting bodies, who generally are
referred to using an acronym. Match the relevant standards or the responsibility of each
to the acronym of these standards-setting bodies.
• AICPA
• PCAOB
• FASB
• COBIT
• SEC
• COSO
• IIA

Standard or Responsibility Acronym of Standards-Setting Body

1. Standard-setting body for audit data standards

2. Standard-setting body for external auditing standards

3. Requires submission of 10-K and 10-Q’s by publicly held companies

4. Articulates key concepts to enhance internal controls and deter fraud

5. Provides guidance for companies that use information technology

6. Sets standards for financial reporting

7. Sets standards for internal auditing

3. (LO 5-4) In each of the following situations, identify which situation exists with regard to
normal and abnormal events, and alarms or lack of alarms from a continuous monitoring
system with these indicators:
• True negative
• False negative
• False positive
• True positive

No Alarm Alarm

1. Normal event

2. Abnormal event

4. (LO 5-1, 5-2, 5-4, 5-5) Match the definitions to these real-time terms:
• Continuous reporting
• Continuous auditing
• Continuous monitoring

Real-Time Examples Real-Time Terms

1. Total sales recognized by the company today disclosed to shareholders

2. Real-time system status

3. Real-time evaluation of internal controls

4. Real-time assurance over revenue recognition

5. Real-time evaluation of the size and nature of each payroll transaction recorded

6. Real-time assurance over the recording of expenses

5. (LO 5-2) Companies have primarily homogeneous systems or heterogeneous systems. Identify the system that relates to each feature listed below. (Select “Both” if a feature applies to more than one system.)

Feature                                              System: Homogeneous? Heterogeneous? Or Both?

1. Requires system translator software

2. Allows auditors to review data in a data warehouse

3. Integrates multiple ERP systems in multiple locations

4. Has the same ERP systems in multiple locations

5. Is usually the result of many acquisitions

6. Simplifies the reporting and auditing process

6. (LO 5-2) Analysis: What are the advantages of the use of homogeneous systems?
Would a merger target be more attractive if it used a similar financial reporting system
as the potential parent company?
7. (LO 5-2) Multiple Choice: Consider Exhibit 5-3. Looking at the audit data standards
order-to-cash process, which of the following describes the purpose of the AR Adjust-
ments table?
a. Shows the balance in accounts receivable
b. Shows manual changes to accounts receivables, such as credit and debit memos
c. Shows the sales transactions and cash collections that affect the accounts receiv-
able balance
d. Shows the customer who owes the company money
8. (LO 5-2) Analysis: Who developed the audit data standards? In your opinion, why is it
the right group to develop and maintain them rather than, say, the Big Four firms or a
small practitioner? What is the purpose of the data standards, and to whom are the stan-
dards applicable?

9. (LO 5-1) Multiple Choice: Auditors can apply simple to complex Data Analytics to a
client’s data. At which stage would DA be applied to identify which areas the auditor
should focus on?
a. Continuous
b. Planning
c. Remote
d. Reporting
e. Performance
10. (LO 5-4) Multiple Choice: What actions should be taken by either internal auditors or
management if a company’s continuous audit system has too many alarms that are false
positive?
a. Abandon the system.
b. Change the parameters to focus on lower-risk items.
c. Change the parameters to focus on higher-risk items.
d. Ignore the alarms.
11. (LO 5-4) Multiple Choice: What actions should be taken by either internal auditors or
management if a company’s continuous audit system has too many missed abnormal
events (such as false negatives)?
a. Ignore the alarms.
b. Abandon the system.
c. Change the parameters to focus on lower-risk items.
d. Change the parameters to focus on higher-risk items.
12. (LO 5-3) Analysis: Implementing continuous auditing procedures is similar to automat-
ing an audit plan with the additional step of scheduling the automated procedures to
match the timing and frequency of the data being evaluated and the notification to the
auditor when exceptions occur. In your opinion, will the traditional audit be replaced by
continuous auditing? Support your answer.

LABS

Lab 5-1 Create a Common Data Model—Oklahoma


Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: The State of Oklahoma captures purchase card transaction information
for each of the state agencies to determine where resources are used. The comptroller has
asked you to prepare the purchase card transactions using a common data model based on
the audit data standards so they can be analyzed using a set of standard evaluation models
and visualizations. The Fiscal Year runs from July 1 to June 30.
Data: Lab 5-1 OK PCard FY2020.zip - 12MB Zip / 74MB CSV

Lab 5-1 Example Output


By the end of this lab, you will transform data to fit a common data model. While your
results will include different data values, your work should look similar to this:

Microsoft | Excel + Power Query

LAB 5-1M Example of Transforming Data in Microsoft Power Query

Tableau | Prep



LAB 5-1T Example of Transforming data in Tableau Prep

Lab 5-1 Transform Data to Match the Audit Data Standard


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 5-1 [Your name] [Your email address].docx.
For this lab, you will work to transform purchase card transactions to match the format
of the AICPA’s audit data standard for the Procure to Pay Subledger. In most cases, this
involves renaming existing attributes. In some cases, you will use formulas to combine
values or fill in columns with missing text data—for example, adding “USD” to a column
including the currency type that is missing from the source file. Once these data are trans-
formed, you can pull them into existing models without the need to recreate the models
from scratch.

Microsoft | Excel + Power Query

1. Open a new blank workbook in Excel and import your data:


a. In the Data ribbon, click Get Data > From File > From Text/CSV.
b. Locate the Lab 5-1 OK PCard FY2020.csv file on your computer and click
Import.
c. Click Transform Data or Edit.
2. Now rename existing fields to match the audit data standard:
a. Rename the following fields by double-clicking the field name or right-clicking
and choosing Rename:
AGENCYNBR -> Business_Unit_Code
AGENCYNAME -> Business_Unit_Description
ITEM_DESCR -> Purchase_Order_Line_Product_Description

AMOUNT -> Purchase_Order_Amount_Local
MERCHANT -> Supplier_Account_Name
TRANSACTION_DATE -> Purchase_Order_Date
POST_DATE -> Entered_Date
MCC_DESCRIPTION -> Supplier_Group
ROWID -> Purchase_Order_ID
b. Take a screenshot (label it 5-1MA) of your data with the renamed Field
Names.
3. Now create new columns to transform the current data to match the audit data
standard:
a. Create custom columns to rename existing columns:
1. From the ribbon, click Add Column, then click Custom Column.
2. Refer to the table below to enter the New Column Name (e.g., Purchase_
Order_Fiscal_Year) and Custom Column Formula (e.g., = “2020”). If an
existing attribute is given, you can double-click the value from the Avail-
able Columns list to add it to the formula. If the value is given in quotes,
include the quotes to fill the column with that value.
3. Click OK to add the new custom column.
4. Repeat steps 1–3 for the remaining columns.

New Column Name (ADS Destination) Custom Column Formula (From PCard Source) Type

Purchase_Order_Fiscal_Year “2020” Text

Entered_By [FIRST_INITIAL] &” “&[LAST_NAME] Text

Purchase_Order_Local_Currency “USD” Text

4. Finally, remove the original columns so that only your new ADS Columns remain:
a. Right-click the following fields, and choose Remove:
1. CALENDAR_YEAR
2. CALENDAR_MONTH
3. LAST_NAME
4. FIRST_NAME
b. Take a screenshot (label it 5-1MB) of your new columns formatted to
match the ADS.
5. From the ribbon, click Home, then click Close & Load.
6. When you are finished answering the lab questions, you may close Excel. Save your file as Lab 5-1 OK PCard ADS.xlsx.

Tableau | Prep
Lab Note: Tableau Prep takes extra time to process large datasets.
1. Open Tableau Prep Builder and connect to your data:
a. Click Connect to Data > To a File > Text File.
b. Locate the Lab 5-1 OK PCard FY2020.csv file on your computer and click
Open.

2. Now remove, rename, and create new columns to transform the current data to
match the audit data standard:
a. Uncheck the following unneeded fields:
1. CALENDAR_YEAR
2. CALENDAR_MONTH
b. Rename the following fields by double-clicking the Field Name:
AGENCYNBR -> Business_Unit_Code
AGENCYNAME -> Business_Unit_Description
ITEM_DESCR -> Purchase_Order_Line_Product_Description
AMOUNT -> Purchase_Order_Amount_Local
MERCHANT -> Supplier_Account_Name
TRANSACTION_DATE -> Purchase_Order_Date
POST_DATE -> Entered_Date
MCC_DESCRIPTION -> Supplier_Group
ROWID -> Purchase_Order_ID
c. Take a screenshot (label it 5-1TA) of your flow with the renamed list of
Field Names.
3. Click the + next to Lab 5-1 OK PCa. . . in the flow and choose Add Clean Step.
a. From the toolbar, click Create Calculated Field. . . .
b. Refer to the table below to enter the Field Name (e.g., Purchase_Order_
Fiscal_Year) and Formula (e.g., “2020”). If the value is given in quotes,
include the quotes to fill the column with that value.
c. Click Apply to add the new custom column.
d. Repeat steps a–c for the remaining columns.

Field Name (ADS Destination) Formula (From PCard Source)


Purchase_Order_Fiscal_Year “2020”
Entered_By [FIRST_INITIAL]+” “+[LAST_NAME]
Purchase_Order_Local_Currency “USD”

4. Remove the LAST_NAME and FIRST_INITIAL fields. Right-click each and


choose Remove.
5. Verify that your cleaned dataset now has a field for each of the following:

ADS Field Name Type


Purchase_Order_ID Text
Purchase_Order_Date Date
Purchase_Order_Fiscal_Year Text
Business_Unit_Code Text
Business_Unit_Description Text
Supplier_Account_Name Text
Supplier_Group Text
Entered_By Text
Entered_Date Date
Purchase_Order_Amount_Local Number (Decimal)
Purchase_Order_Local_Currency Text
Purchase_Order_Line_Product_Description Text

6. In the flow pane, right-click Clean 1 and choose Rename and name the step
Add Columns.
7. Take a screenshot (label it 5-1TB) of your cleaned data file, showing the
columns.
8. Click the + next to your Add Columns task and choose Output.
9. In the Output pane, click Browse:
a. Navigate to your preferred location to save the file.
b. Name your output file Lab 5-1 OK PCard ADS.hyper.
c. Click Accept.
10. In the Output box next to Add Columns, click the arrow to Run Flow. When it
is finished processing, click Done. It will show you the total number or records.
11. When you are finished answering the lab questions, you may close Tableau
Prep. Save your flow process file as Lab 5-1 OK PCard ADS.tfl.

Lab 5-1 Objective Questions (LO 5-2, 5-3)


OQ1. How many records or rows appear in your cleaned dataset, excluding the
header?
OQ2. How many attributes or columns appear in your cleaned dataset?

Lab 5-1 Analysis Questions (LO 5-2, 5-3)


AQ1. Why do you think it is important to combine cardholder names in your data
before you conduct your analysis?
AQ2. How does transforming the data to a common data model help make analysis
easier in the future?
AQ3. Look at the fields that you removed in the data transformation. What types of
analysis can you no longer perform with those missing data?

Lab 5-1 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 5-2 Create a Dashboard Based on a Common Data Model—Oklahoma
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: The State of Oklahoma captures purchase card transaction information
for each of the state agencies to determine where resources are used. The comptroller has
asked you to prepare a dashboard summarizing the purchase card transactions using the
data you transformed using the common data model. The Fiscal Year runs from July 1 to
June 30.
Data: Lab 5-2 OK PCard ADS.zip - 37MB Zip / 30MB Excel / 17MB Tableau

Lab 5-2 Example Output
By the end of this lab, you will create a dashboard based on a common data model. While your results will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop



LAB 5-2M Example of Common Data Dashboard in Microsoft Power BI Desktop

Tableau | Desktop



LAB 5-2T Example of Common Data Dashboard in Tableau Desktop


Lab 5-2 Create a Dashboard


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 5-2 [Your name] [Your email address].docx.

Microsoft | Power BI Desktop

1. Create a new workbook in Power BI Desktop and load your data:


a. Click Home > Get Data > Excel.
b. Locate the Lab 5-2 OK PCard ADS.xlsx file on your computer and click
Open.
c. Choose OK PCard FY2020 and click Transform.
2. Important! Adjust your data types so they will be correctly interpreted by
Power BI. Click the #, Calendar, or Abc icon above each field and choose
the following if they are not already set. Note: If prompted, choose Replace
Current.
a. Date: Purchase_Order_Date, Entered_Date
b. Fixed Decimal Number: Purchase_Order_Amount_Local
c. Text: everything else.
3. Take a screenshot (label it 5-2MA).
4. Click Close & Apply in the Home ribbon.
5. Starting on Page 1, create the following:
a. Add Purchase_Order_Date > Date Hierarchy > Month to the Filters on
this page pane and check July.
b. Add a stacked bar chart of Total Purchases by Cardholder sorted in
descending order by purchase amount and resize it to fit the top-left
corner of the page:
1. X-axis: Purchase_Order_Amount_Local
2. Y-axis: Entered By
3. Format:
a. Visual > Y-axis > Title > Cardholder
b. Visual > X-axis > Title > Total Purchases
c. General > Title > Text > Total Purchases by Cardholder
c. Add a stacked bar chart of Total Purchases by Category showing the
purchase category as a color, sorted in descending order by purchase
amount and resize it to fit the top-right corner of the page:
1. X-axis: Purchase_Order_Amount_Local
2. Y-axis: Supplier Group
3. Format:
a. Visual > Y-axis > Title > Category
b. Visual > X-axis > Title > Total Purchases
c. General > Title > Text > Total Purchases by Category




d. Add a stacked column chart with the Total Purchases by Day and resize
it to fit the bottom-left corner of the page:
1. Y-axis: Purchase_Order_Amount_Local
2. X-axis: Purchase_Order_Date > Date Hierarchy > Day
3. Legend: Supplier_Group
4. Format:
a. Visual > Y-axis > Title > Total Purchases
b. Visual > X-axis > Title > Day
c. General > Title > Text > Total Purchases by Day
e. Add a tree map of Total Purchases by Business Unit and resize it to fit
the bottom-right corner of the page:
1. Values: Purchase_Order_Amount_Local
2. Category: Business_Unit_Description
3. Format:
a. General > Title > Text > Total Purchases by Business Unit
6. Take a screenshot (label it 5-2MB).
7. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 5-2 OK PCard Analysis.pbix.

Tableau | Desktop

1. Open Tableau Desktop and load your data:


a. Click Connect to a File and choose More. . . .
b. Navigate to your Lab 5-2 OK PCard ADS.hyper and click Open.
2. On the Data Source tab, click Update Now to preview your data and verify
that they loaded correctly.
3. Important! Adjust your data types so they will be correctly interpreted by Tableau.
Click the #, calendar, or Abc icon above each field and choose the following:
a. Dates: Purchase Order Date, Entered Date
b. Number (decimal): Purchase Order Amount Local
c. String: everything else
4. Take a screenshot (label it 5-2TA).
5. Starting on Sheet1, create the following visualizations (each on a separate
sheet):
a. Show a distribution of Total Purchases by Cardholder showing the busi-
ness unit as a color, sorted in descending order by purchase amount:
1. Columns: SUM(Purchase Order Amount Local)
2. Rows: Entered By (Note: If you get a file limit warning, choose Add all members.)



3. Filter: Purchase Order Date > Months > July
a. Right-click and choose Show Filter.
b. Right-click and choose Apply to Worksheets > All Using This Data
Source.
b. Show a distribution of Total Purchases by Category sorted in descending
order by purchase amount:
1. Columns: SUM(Purchase Order Amount Local)
2. Rows: Supplier Group
c. Show a bar chart with the Total Purchases by Day:
1. Columns: DAY(Purchase Order Date)
2. Rows: SUM(Purchase Order Amount Local)
3. Marks > Color: Supplier Group (Note: If you get a file limit warning, choose Add all members.)
d. Show a tree map of Total Purchases by Business Unit:
1. Marks > Size: SUM(Purchase Order Amount Local)
2. Marks > Color: Business Unit Description
3. Marks > Label: Business Unit Description
6. In your Tableau workbook, create a Dashboard tab and drag each of the four
visualizations into it from the pane on the left.
7. Change the size from Fixed Size to Automatic.
8. Take a screenshot (label it 5-2TB) of your dashboard.
9. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 5-2 OK PCard Analysis.twb.

Lab 5-2 Objective Questions (LO 5-2, 5-3)


OQ1. Which three categories received the most purchases?
OQ2. Which date of this dataset had the highest total purchase amount?
OQ3. How much was spent on the highest spending day?
OQ4. Which business unit spent the most?
OQ5. How much was spent by the biggest business unit?
OQ6. Which user, represented by an actual name, spent the most?

Lab 5-2 Analysis Questions (LO 5-2, 5-3)


AQ1. What do you think the ePay Cardholder account is used for?
AQ2. What would happen if you change the data source to another file that is format-
ted following the audit data standard for purchase orders?
AQ3. How else would you analyze purchase transactions?

Lab 5-2 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 5-3 Set Up a Cloud Folder and Review Changes—Sláinte
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: You have rotated into the internal audit department at Sláinte. Your
team is still using company email to send evidence back and forth, usually in the form of
documents and spreadsheets. There is a lot of duplication of these files, and no one is quite
sure which version is the latest. You see an opportunity to streamline this process using
OneDrive.
Data: Lab 5-3 Slainte Audit Files.zip - 294KB Zip / 360KB Files

Lab 5-3 Example Output


By the end of this lab, you will set up a shared cloud folder and review changes to working papers. While your results will include different data values, your work should look similar to this:

Microsoft | OneDrive

LAB 5-3M Example of Working Papers on Microsoft OneDrive

Lab 5-3 Part 1 Set Up a Cloud Folder


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 5-3 [Your name] [Your email address].docx.
Note: These instructions are specific to the free consumer version of Microsoft One-
Drive. The approach is similar for competing products, such as Box, Dropbox, Google
Drive, or other commercial products.
Auditors collect evidence in electronic workpapers that include a permanent file with
information about policies and procedures and a temporary file with evidence related to
the current audit. These files could be stored locally on a laptop, but the increased use of
remote communication makes collaboration through the cloud more necessary. There are a
number of commercial workpaper applications, but we can simulate some of those features
with consumer cloud platforms, like Microsoft OneDrive.

Microsoft | OneDrive

1. Go to OneDrive.com and create a new folder:


a. Click Sign in in the top-right corner.
b. Sign in with your Microsoft account. (If your organization subscribes to
Office 365, use your school or work account here.)
c. On the main OneDrive screen, click New > Folder.
d. Name your folder Slainte Working Papers and click Create.
2. Share your folder with others:
a. Open your new folder and click Share from the bar at the top of the
screen.
b. Choose Anyone with a link can edit from the sharing options, then click
Copy Link. Paste the link in your lab doc.
c. Take a screenshot (label it 5-3MA) of your empty folder.
3. Upload some documents that will be useful for labs in this chapter and the
next.
a. Unzip the Lab 5-3 Slainte Audit Files folder on your computer. You
should see two folders: Master File and Current File.
4. Return to your OneDrive Slainte Working Papers folder, and upload the two
folders:
a. Click Upload > Folder in OneDrive and navigate to the folder where you
unzipped the lab files.
b. Or drag and drop the two folders from your desktop to the OneDrive
window in your browser.
c. You should see two new folders in your OneDrive. Because you added
them to a shared folder, the people you shared the folder with can now
see these as well.
d. Take a screenshot (label it 5-3MB).
5. Answer the lab questions, then continue to the next part.

Lab 5-3 Part 1 Objective Questions (LO 5-5)


OQ1. How many files are in the Master File?
OQ2. How many files are in the Current File?
OQ3. Who is the most likely user of these shared working papers?

Lab 5-3 Part 1 Analysis Questions (LO 5-5)


AQ1. What is the link to your shared folder?
AQ2. What advantage is there to sharing files in one location rather than emailing
copies back and forth?
AQ3. Explore the two folders you just uploaded. What kinds of documents and files
do you see?
AQ4. How do you think these files can be used for data analysis?

Lab 5-3 Part 2 Review Changes to Working Papers
The goal of a shared folder is that other members of the audit team can contribute and edit
the documents. Commercial software provides, for example, an approval workflow and additional internal controls over the documents to reduce manipulation of audit evidence.
For consumer cloud platforms, one control appears in the versioning of documents. As
revisions are made, old versions of the documents are kept so that they can be reverted to,
if needed.

Microsoft | OneDrive

1. Let’s start by making changes to files in your Slainte Working Papers.


a. Unzip the Lab 5-3 Slainte Files Revised folder on your computer. You
should see two files: Audit Plan and Employee_Listing.
b. Return to your Slainte Working Papers folder on OneDrive.
c. Upload the Audit Plan into your Master File and the Employee_Listing
into your Current File. You will be prompted to Replace or Keep Both
files. Click Replace for each. If you encounter an error, try reloading
OneDrive.
d. Take a screenshot (label it 5-3MC) of your Master File folder.
2. Now review the history of your documents:
a. Right-click on one of the newly uploaded files, and choose Version
history from the menu that appears. The document will open with a version pane appearing on the left.
b. Click the older version of the file from the Versions pane. Newer versions
are at the top.
c. Take a screenshot (label it 5-3MD) of your version history.
d. Move between the old version of the file and the current version by
clicking the time stamp in the panel on the left.
3. When you are finished answering the lab questions, you may close your
OneDrive window. We will refer to these files again in Lab 5-4.

Lab 5-3 Part 2 Objective Questions (LO 5-5)


OQ1. What has changed between the two versions of the Audit Plan file?
OQ2. How many new employees were added to the Employee_Listing file?

Lab 5-3 Part 2 Analysis Questions (LO 5-5)


AQ1. How does version history act as a control for data files?
AQ2. Looking at the files in your shared folder, what other files would you expect to
be updated frequently?

Lab 5-3 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 5-4 Identify Audit Data Requirements—Sláinte
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As the new member of the internal audit team at Sláinte, you have been
given access to the cloud folder for your team. The chief audit executive is interested in
using Data Analytics and automation to make the audit more efficient. Your internal audit
manager agrees and has tasked you with reviewing the audit plan. She has provided three
“audit action sheets” with procedures that they have been using for the past 3 years to evalu-
ate the procure-to-pay (purchasing) process and is interested in your thoughts for modern-
izing them.
Data: Refer to your shared OneDrive folder from Lab 5-3 or the Lab 5-4 Slainte Audit Files.zip file.

Lab 5-4 Example Output


By the end of this lab, you will summarize audit data requirements. One possible solution
below has been blurred intentionally, but your sheet layout should look similar to this:

Microsoft | Excel

LAB 5-4M Example of Summarized Audit Automation Data Requirements in Microsoft Excel

Lab 5-4 Part 1 Look for Audit Procedures That Evaluate Data
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 5-4 [Your name] [Your email address].docx.
You will begin your evaluation by reviewing the audit action sheets and highlighting key
data points and questions.

Microsoft | Excel

1. Open your Slainte Working Papers folder on OneDrive or the Lab 5-4 Slainte
Files Revised.zip file.

2. Evaluate your audit action sheets:
a. Look inside the Master File for the document titled Audit Action Sheets
and open it to edit it.
b. Use the Yellow highlighter to identify any master or transaction tables,
such as “Vendors” or “Purchase Orders.”
c. Use the Green highlighter to identify any fields or attributes, such as
“Name” or “Date.”
d. Use the Blue highlighter to identify any specific values or rules, such as
“TRUE,” “January 1st,” “Greater than . . .”
3. Summarize your highlighted data:
a. Create a new Excel workbook in your Master File to summarize your
highlighted data elements from the three audit action sheets. Use the
following headers:

AAS# Table Attributes Values/Rules Step(s) Notes

b. Take a screenshot (label it 5-4MA) of your completed spreadsheet.


4. Now that you have analyzed the action sheets, look through the systems
documentation to see where those elements exist:
a. In the Master File, open the UML System Diagram and Data Dictionary
files.
b. Using the data elements you identified in your Audit Automation Summary
file, locate the actual names of tables and attributes and acceptable data
values. Add them in three new columns in your summary:

Database Table Database Attribute Acceptable Values

5. Take a screenshot (label it 5-4MB) of your completed spreadsheet.


6. After you answer the lab questions, save your file as Audit Automation
Summary.xlsx and continue to the next part.

Lab 5-4 Part 1 Objective Questions (LO 5-4, 5-5)


OQ1. How many attributes/fields are available in the dataset?
OQ2. What is the most common attribute that you identified to evaluate?
OQ3. Where are most of these attributes located?

Lab 5-4 Part 1 Analysis Questions (LO 5-4, 5-5)


AQ1. Read the first audit action sheet. What other data elements that are not listed in
the procedures do you think would be useful in analyzing this account?
AQ2. Which attributes were difficult to locate or in unexpected places in the dataset?
AQ3. What analyses could you do to evaluate these rules?

Lab 5-4 Part 2 Set the Frequency of Your Audit Procedures


With the data elements identified, you can formalize your internal audit plan. In the past, your
internal audit department performed each of the three action sheets once every 24 months.

You have shared how increasing the frequency of some of the tests would provide a better
control for the process and allow the auditor to respond quickly to the exceptions. Your inter-
nal audit manager has asked you to propose a new schedule for the three audit action sheets.

Microsoft | Excel

1. Open the Audit Automation Summary.xlsx you created in Part 1.


2. Evaluate the potential to automate the specific rules:
a. Add two new columns:

Auto/Manual Frequency

b. For each element and rule you identified in Part 1, determine whether it
requires manual review or can be performed automatically and alert audi-
tors when exceptions occur. Add either Auto or Manual to that column.
c. Finally, determine how frequently the data should be evaluated. Indicate
Daily, Weekly, Monthly, Annually, or During Audit. Think about when
the data are being generated. For example, transactions occur every day,
but new employees are added every few months.
d. Take a screenshot (label it 5-4MC) of the All tables sheet.
3. When you are finished answering the lab questions, you may save and close
your file.

Lab 5-4 Part 2 Analysis Questions (LO 5-4, 5-5)


AQ1. Do you find that most of your identified procedures are manual or automatic?
AQ2. What is the most common frequency identified for the audit procedures?
AQ3. How does this evaluation help you identify opportunities for automation in the
analytics process?

Lab 5-4 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 5-5 Comprehensive Case: Setting Scope—Dillard’s


Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As you evaluate Dillard’s data you may have noticed that the scale and
size of the data requires significant computing power and time to evaluate, especially as
you calculate more complex models. Management, auditors, tax preparers, and other stake-
holders will typically evaluate only a subset of the data. In this lab you will focus on setting
scope, or identifying the data that are relevant to specific decision-making processes.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.

Lab 5-5 Example Output
By the end of this lab, you will create a dashboard to visualize exploratory data. While your
results will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop



LAB 5-5M Example Filtering in Microsoft Power BI Desktop

Tableau | Desktop



LAB 5-5T Example Filtering in Tableau Desktop

Lab 5-5 Part 1 Explore the Data


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 5-5 [Your name] [Your email address].docx.
To help you narrow the request for specific store data, focus on the following
scenarios:

Auditor:
As an auditor, you have been tasked with evaluating discounted transactions that occurred in
the first quarter of 2016 to see if they match the stated sales revenue on the financial statement.
Manager:
As a manager you would like to evaluate store performance in North Carolina during the
holiday shopping season of November and December 2015.
Financial Accountant:
As you are preparing your income statement, you need to calculate the total net sales for all
stores in 2015.
Tax Preparer:
Working in conjunction with state tax authorities, you need to calculate the sales tax liabil-
ity for your monthly tax remittance for March 2016.

Lab 5-5 Part 1 Objective Questions (LO 5-1, 5-2)


OQ1. As an auditor, which attribute would you most likely filter?
OQ2. As a manager, which attribute would you most likely filter?
OQ3. As a financial accountant, which attribute would you most likely filter?
OQ4. As a tax preparer, which attribute would you most likely filter?

Lab 5-5 Part 1 Analysis Questions (LO 5-1, 5-2)


AQ1. As an auditor, why might you be interested in expanding the date range when
evaluating sales data?
AQ2. As a manager, which measures would you most likely be interested in when
evaluating store performance?
AQ3. As a financial accountant, what data quality issues might you consider when
calculating net sales?
AQ4. As a tax preparer, what additional data would you need to calculate your tax
liability?

Lab 5-5 Part 2 Filter Data


Data reduction helps limit the scope of your analysis and decreases the time for your visual-
izations to be performed. Filter some of the Dillard’s data to narrow the scope of your data
collection.

Microsoft | Power BI Desktop

1. Create a new workbook in Power BI Desktop and connect to your data:


a. In the Home ribbon, click Get Data > SQL Server database.
b. Enter the following and click OK:
1. Server: essql1.walton.uark.edu
2. Database: WCOB_DILLARDS
3. Data Connectivity: DirectQuery

c. If prompted to enter credentials, you can keep the default to “Use my
­current credentials” and click Connect.
d. If prompted with an Encryption Support warning, click OK to move past it.
e. In the Navigator window, check the following tables and click Transform
Data:
1. CUSTOMER, DEPARTMENT, SKU, SKU_STORE, STORE,
­TRANSACT
2. In Power Query, filter your data:
a. Click the STORE query and filter the STATE to show only NC.
b. Click the TRANSACT query and filter the TRAN_DATE to show only a
range of dates from November 1, 2015, to December 31, 2015. Note: Power
Query shows only a sample of the filtered data.
3. Take a screenshot (label it 5-5MA) of the TRANSACT query view.
4. Click Close & Apply.
5. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 5-5 Dillard’s Scope.pbix.

Tableau | Desktop

1. Open Tableau Desktop and connect to your data:


a. Go to Connect > To a Server > Microsoft SQL Server.
b. Enter the following and click Sign In:
1. Server: essql1.walton.uark.edu
2. Database: WCOB_DILLARDS
c. Add the TRANSACT, SKU, STORE, and DEPARTMENT tables to the
Data Source page. Ensure that the tables join correctly (you can check
the appropriate relationships in Appendix J).
2. Go to Sheet 1 and set some filters:
a. Drag Store.State to the Filters pane and set the state to show only North
Carolina. Right-click the filter and choose Show Filter.
b. Drag Transact.Tran Date to the Filters pane and set the range of dates
from November 1, 2015, to December 31, 2015. Right-click the filter and
choose Show Filter.
3. Take a screenshot (label it 5-5TA) of the blank sheet with the filters vis-
ible. Note: You may need to close the Show Me panel to see the filters.
4. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 5-5 Dillard’s Scope.twb.

Lab 5-5 Part 2 Analysis Questions (LO 5-1, 5-2)
AQ1. Why would you want to set filters before you begin creating your analyses?
AQ2. How does limiting the range of values improve the performance of your model?
AQ3. What other filters might you include to satisfy the requirements of the auditor,
financial accountant, or tax preparer from Part 1?

Lab 5-5 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Chapter 6
Audit Data Analytics

A Look at This Chapter


In this chapter, we focus on substantive testing within the audit setting. We identify when to use descriptive analytics
to understand the business environment, diagnostic analytics to compare expectation with reality, including Benford’s
analysis, and how predictive and prescriptive analytics are used to address future audit concerns.

A Look Back
In Chapter 5, we introduced Data Analytics in auditing by considering how both internal and external auditors are
using technology in general, and audit analytics specifically, to evaluate firm data and generate support for manage-
ment assertions. We emphasized audit planning, audit data standards, continuous auditing, and audit working papers.

A Look Ahead
Chapter 7 explains how to apply Data Analytics to measure performance for management accountants. By measuring
past performance and comparing it to targeted goals, we are able to assess how well a company is working toward a
goal and recommend actions to correct unexpected patterns.

Internal auditors at Hewlett-Packard Co. (HP) understand how
Data Analytics can improve processes and controls. Manage-
ment identified abnormal behavior with manual journal entries,
and the internal audit department responded by working with
various governance and compliance teams to develop dash-
boards that would allow them to monitor accounting activity. The
dashboard made it easier for management and the auditors to
follow trends, identify spikes in activity, and drill down to iden-
tify the individuals posting entries. Leveraging accounting data
allows the internal audit function to focus on the risks facing HP
and act on data in real time by implementing better controls.
Audit data analytics provides an enhanced level of control that is
missing from a traditional periodic audit.

OBJECTIVES
After reading this chapter, you should be able to:

LO 6-1 Understand different types of analysis for auditing and when to use
them.
LO 6-2 Explain basic descriptive analytic techniques used in auditing.
LO 6-3 Define and describe diagnostic analytics that are used in auditing.
LO 6-4 Characterize the predictive and prescriptive analytics used in auditing.


LO 6-1 WHEN TO USE AUDIT DATA ANALYTICS

Understand different types of analysis for auditing and when to use them.

As discussed in Chapter 5, Data Analytics can be applied to the auditing function to
increase coverage of the audit, while reducing the time the auditor dedicates to the audit
tasks. Think about the nature, extent, and timing of audit procedures.

• Nature represents why auditors perform audit procedures. In other words, nature helps
determine the objectives of the audit and the outputs generated by the business processes.
• Extent indicates how much auditors can test. The prevalence of data has expanded the
extent of audit testing.
• Timing tells us how often the procedure should be run. Automation allows auditors to
run analytics on a schedule and receive real-time alerts when exceptions occur.
All three of these elements help auditors identify when to apply Data Analytics to the
audit process. Auditors, whether internal or external to the company, should evaluate cur-
rent capabilities within their team and identify the goal of Data Analytics. Does it add
value? Does it enhance the process? Does it help the internal or external auditor be more
efficient and effective?
As an example of audit data analytics adding value, a recent research study finds that
when companies adopt audit data analytics in their internal audit function, subsequent
external audit fees and audit delays are lower. The researchers further find that audit data
analytics technology focused on diagnostic analytics is specifically associated with
subsequent decreases in audit fees and audit lag, with complementary evidence of fewer
material weaknesses, fewer subsequent restatements, and lower discretionary accruals following
audit data analytics adoption by the internal audit function.1
In reality, it is easy to overpromise on the expected benefits of Data Analytics and under-
deliver with the results. Without clear objectives and expected outcomes, audit departments
will fail with their use of Data Analytics. Here we refer once again to the IMPACT model
as applied to audit data analytics.

Identify the Questions


What is the audit department trying to achieve using Data Analytics? Do auditors need to
analyze the segregation of duties to test whether internal controls are operating effectively?
Are auditors looking for operational inefficiencies, such as duplicate payments of invoices?
Are auditors trying to identify phantom employees or vendors? Are auditors trying to col-
lect evidence that the company is complying with specific regulations? Are auditors trying
to test account balances to tie them to the financial statements?
These activities support the functional areas of compliance, fraud detection and inves-
tigation, operational performance, and internal controls for internal audit departments as
well as the financial reporting and risk assessment functions of external audit.

Master the Data


In theory, auditors should have read-only access to enterprise data through a nonproduction
data warehouse. In practice, they make multiple requests for flat files or data extractions
from the IT manager that they then analyze with a software tool, such as Excel or Tableau.
Most audit data are provided in structured or tabular form, such as a spreadsheet file.
Regardless of the source or type, the audit data standards provide a general overview
of the basic data that auditors will evaluate. For example, consider the Sales_Orders table

1. J.-H. Lim, J. Park, G. F. Peters, and V. J. Richardson, “Examining the Potential Benefits of Audit Data Analytics,” University of Arkansas working paper, 2021.


from the standards shown in Exhibit 6-1. An auditor interested in user activity would want
to focus on the Sales_Order_ID, Sales_Order_Date, Entered_By, Entered_Date, Entered_
Time, Approved_By, Approved_Date, Approved_Time, and Sales_Order_Amount_Local
attributes. These may give insight into transactions on unusual dates, such as weekends, or
unusually high volume by specific users.

EXHIBIT 6-1 Elements in the Sales_Order Table from the Audit Data Standards

Field Name: Description
Sales_Order_ID: Unique identifier for each sales order. This ID may need to be created by concatenating fields (e.g., document number, document type, and year) to uniquely identify each sales order.
Sales_Order_Document_ID: Identification number or code on the sales order.
Sales_Order_Date: The date of the sales order, regardless of the date the order is entered.
Sales_Order_Fiscal_Year: Fiscal year in which the Sales_Order_Date occurs: YYYY for delimited, CCYYMMDD fiscal year-end (ISO 8601) for XBRL-GL.
Sales_Order_Period: Fiscal period in which the Sales_Order_Date occurs. Examples include W1–W53 for weekly periods, M1–M12 for monthly periods, and Q1–Q4 for quarterly periods.
Business_Unit_Code: Used to identify the business unit, region, branch, and so on at the level that financial statements are being audited. Must match a Business_Unit_Code in the Business_Unit_Listing file.
Customer_Account_ID: Identifier of the customer from whom payment is expected or to whom unused credits have been applied. Must match a Customer_Account_ID in the Customer_Master_Listing_YYYYMMDD file.
Entered_By: User_ID (from User_Listing file) for person who created the record.
Entered_Date: Date the order was entered into the system. This is sometimes referred to as the creation date. This should be a system-generated date (rather than user-entered date), when possible. This date does not necessarily correspond with the date of the transaction itself.
Entered_Time: The time this transaction was entered into the system. ISO 8601 representing time in 24-hour time (hhmm) (e.g., 1:00 p.m. is 1300).
Approved_By: User ID (from User_Listing file) for person who approved customer master additions or changes.
Approved_Date: Date the entry was approved.
Approved_Time: The time the entry was approved. ISO 8601 representing time in 24-hour time (hhmm) (e.g., 1:00 p.m. is 1300).
Last_Modified_By: User_ID (from User_Listing file) for the last person modifying this entry.
Last_Modified_Date: The date the entry was last modified.
Last_Modified_Time: The time the entry was last modified. ISO 8601 representing time in 24-hour time (hhmm) (e.g., 1:00 p.m. is 1300).
Sales_Order_Amount_Local: Sales monetary amount recorded in the local currency.
Sales_Order_Local_Currency: The currency for local reporting requirements. See ISO 4217 coding.
Segment01: Reserved segment field that can be used for profit center, division, fund, program, branch, project, and so on.
Segment02: See Segment01.
Segment03: See Segment01.
Segment04: See Segment01.
Segment05: See Segment01.

Source: Adapted from https://www.aicpa.org/content/dam/aicpa/interestareas/frc/assuranceadvisoryservices/downloadabledocuments/auditdatastandards/auditdatastandards.o2c.july2015.pdf (accessed January 1, 2018).


There are also many pieces of data that have traditionally evaded scrutiny, including
handwritten logs, manuals, handbooks, and other paper or text-heavy documentation.
Essentially, manual tasks including observation and inspection are generally areas where
Data Analytics may not apply. While there have been significant advancements in artificial
intelligence, there is still a need for auditors to exercise their judgment, and data cannot
always supersede the auditor’s reading of human behavior or a sense that something may
not be quite right even when the data say it is. At least not yet.
Data may also be found in unlikely places. An auditor may be tasked with determining
whether the steps of a process are being followed. Traditional evaluation would involve the
auditor observing or interviewing the employee performing the work. Now that most pro-
cesses are handled through online systems, an auditor can perform Data Analytics on the
time stamps of the tasks and determine the sequence of approvals in a workflow along with
the amount of time spent on each task. This form of process mining enables insight into pro-
cesses used to diagnose problems and suggest improvements where greater efficiency may
be applied. Likewise, data stored in paper documents, such as invoices received from ven-
dors, can be scanned and converted to tabular data using specialized software. These new
pieces of data can be joined to other transactional data to enable new, thoughtful analytics.
There is an increasing promise of working with unstructured Big Data to provide addi-
tional insight into the economic events being evaluated by the auditors, such as surveillance
video or text from email, but those are still outside the scope of current Data Analytics that
an auditor would develop.

Perform Test Plan


While there are many different tests or models that auditors can incorporate into their
audit procedures, Data Analytics procedures in auditing traditionally are found in
computer-assisted audit techniques (CAATs). CAATs are automated scripts that can be
used to validate data, test controls, and enable substantive testing of transaction details or
account balances and generate supporting evidence for the audit. They are especially useful
for re-performing calculations, identifying high-risk samples, and performing other analyti-
cal reviews to identify unusual patterns of behavior or unusual items.
Most CAATs are designed to summarize and describe the data being evaluated based on
a predetermined expected outcome. For example, an auditor evaluating an incentive plan
that gives employees bonuses for opening new accounts would evaluate the number of new
accounts by employee and the amount of bonus paid to see if they were aligned. The audi-
tor could look for a count of new accounts by account type, count the number of custom-
ers, evaluate the opening date, and sort the data by employee to show the top-performing
employees. These descriptive analytics summarize activity or master data elements based on
certain attributes. The auditor may select a sample of the accounts to verify that they were
opened and the documentation exists.
Once an auditor has a basic understanding of the data, they then perform diagnostic
analytics, which look for correlations or patterns of interest in the data. For example, the
auditor may look for commonalities between the customers’ demographic data and the
employees’ data to see if employees are creating new accounts for fake customers to inflate
their performance numbers. They may also focus on customers who have common attri-
butes like location or account age. Outliers may warrant further investigation by the auditor
as they represent increased risk and/or exposure.
An auditor then performs predictive analytics, where they attempt to find hidden pat-
terns or variables that are linked to abnormal behavior. The auditor uses the variables to
build models that can be used to predict a likely value or classification. In our example, the
predictive model might flag an employee or customer with similar characteristics to other
high-risk employees or customers whenever a new account is opened.


Finally, the auditor may generate prescriptive analytics that identify a course of action to
take based on the actions taken in similar situations in the past. These analytics can assist
future auditors who encounter similar behavior. Using artificial intelligence and machine
learning, these analytics become decision support tools for auditors who may lack experience
to find potential audit issues. For example, when a new sales order is created for a customer
who has been inactive for more than 12 months, a prescriptive analytic would allow an auditor
to ask questions about the transaction to learn whether this new account is potentially fake,
whether the employee is likely to create other fake accounts, and whether the account and/
or employee should be suspended or not. The auditor would take the output, apply judgment,
and proceed with what they felt was the appropriate action.
Most auditors will perform descriptive and diagnostic analytics as part of their audit
plan. On rare occasions, they may experiment with predictive and prescriptive analytics
directly. More likely, they may identify opportunities for the latter analytics and work with
data scientists to build those for future use.
Some examples of CAATs and audit procedures related to the descriptive, diagnostic,
predictive, and prescriptive analytics can be found in Exhibit 6-2.

EXHIBIT 6-2 Examples of Audit Data Analytics

Descriptive—summarize activity or master data based on certain attributes
Example CAATs: Age analysis—groups balances by date; Sorting—identifies largest or smallest values and helps identify patterns; Summary statistics—mean, median, min, max, count, sum; Sampling—random and monetary unit.
Example audit procedures: Analysis of new accounts opened and employee bonuses by employee and location. Count the number/dollar amount of transactions that occur outside normal business hours or at the end/beginning of the period.

Diagnostic—detect correlations and patterns of interest and compare them to a benchmark
Example CAATs: Z-score—outlier detection; t-Tests—a statistical test used to determine if there is a significant difference between the means of two groups, or two datasets; Benford’s law—identifies transactions or users with nontypical activity based on the distribution of digits; Drill-down—explores the details behind the values; Exact and fuzzy matching—joins tables and identifies plausible relationships; Sequence check—detects gaps in records and duplicate entries; Stratification—groups data by categories; Clustering—groups records by non-obvious similarities.
Example audit procedures: Analysis of new accounts reveals that an agent has an unusual number of new accounts opened for customers who have been inactive for more than 12 months. An auditor assigns an expected Benford’s value to purchase transactions, then averages them by employee to identify employees with unusually large purchases. An auditor filters out transactions that are below a materiality threshold.

Predictive—identify common attributes or patterns that may be used to identify similar activity
Example CAATs: Regression—predicts specific dependent values based on independent variable inputs; Classification—predicts a category for a record; Probability—uses a rank score to evaluate the strength of classification; Sentiment analysis—evaluates text for positive or negative sentiment to predict positive or negative outcomes.
Example audit procedures: Analysis of new accounts opened for customers who have been inactive for more than 12 months collects data that are common to new account opening, such as account type, demographics, and employee incentives. Predict the probability of bankruptcy and the ability to continue as a going concern for a client. Predict the probability of fraudulent financial statements. Assess and predict management bias, given sentiment of conference call transcripts.

Prescriptive—recommend action based on previously observed actions
Example CAATs: What-if analysis—decision support systems; Applied statistics—predicts a specific outcome or class; Artificial intelligence—uses observations of past actions to predict future actions for similar events.
Example audit procedures: Analysis determines procedures to follow when new accounts are opened for inactive customers, such as requiring approval.


While many of these analyses can be performed using Excel, most CAATs are built on
generalized audit software (GAS), such as IDEA, ACL, or TeamMate Analytics. The GAS
software has two main advantages over traditional spreadsheet software. First, it enables
analysis of very large datasets. Second, it automates several common analytical routines, so
an auditor can click a few buttons to get to the results rather than writing a complex set of
formulas. GAS is also scriptable and enables auditors to record or program common analy-
ses that may be reused on future engagements.

Address and Refine Results


The models selected by the auditors will generate various results. A sample selection may
give auditors a list of high-risk transactions to evaluate. A segregation of duties analysis may
generate a list of users with too much access. In every case, the auditors should develop
procedures in the audit plan for handling these lists, exceptions, and anomalies. The pro-
cess may be to evaluate documentation related to the sample, review employees engaging in
risky activity, or simply notify the audit committee of irregular behavior.

Communicate Insights
Many analytics can be adapted to create an audit dashboard for measuring risk in transac-
tions or exceptions to control rules, particularly if the firm has adopted continuous auditing.
The primary output of CAATs is evidence that may be used to test management asser-
tions about the processes, controls, and data quality. This evidence is included in the audit
workpapers.

Track Outcomes
The detection and resolution of audit exceptions may be a valuable measure of the effi-
ciency and effectiveness of the internal audit function itself. Additional analytics may track
the number of exceptions over time and the time taken to report and resolve the issues.
For the CAATs involved, a periodic validation process should occur to ensure that they
continue to function as expected.

PROGRESS CHECK
1. Using Exhibit 6-2 as a guide, compare and contrast descriptive and diagnostic
analytics. How might these be used in an audit?
2. In a continuous audit, how would a dashboard help to communicate audit findings
and spur a response?

LO 6-2 DESCRIPTIVE ANALYTICS

Explain basic descriptive analytic techniques used in auditing.

Now that you’ve been given an overview of the types of CAATs and analytics that are
commonly used in an audit, we’ll dive a little deeper into how these analytics work and what
they generate. Remember that descriptive analytics are useful for sorting and summarizing
data to create a baseline for more advanced analytics. These analytics enable auditors to set
a baseline or point of reference for their evaluation. For example, if an auditor can identify
the median value of a series of transactions, they can make a judgment as to how much
higher the larger transactions are and whether they represent outliers or exceptions.
In this and the next few sections, we’ll present some examples of procedures that audi-
tors commonly use to evaluate enterprise data.


Aging of Accounts Receivable


Aging of accounts receivable and accounts payable helps determine the likelihood that a
balance will be paid. This substantive test of account balances evaluates the date of an order
and groups it into buckets based on how old it is, typically in 0–30, 31–60, 61–90, and >90
days, or similar. See Exhibit 6-3 for an example. Extremely old accounts that haven’t been
resolved or written off should be flagged for follow-up by the auditor. It could mean that
(1) the data are bad, (2) a process is broken, (3) there’s a reason someone is holding that
account open, or (4) it was simply never resolved.

EXHIBIT 6-3 Aging of Accounts Receivable

            0–30     31–60    61–90      >90
Total    154,322    74,539   42,220   16,900

There are many ways to calculate aging in Excel, including the use of pivot tables. If you
have a simple list of accounts and balances, you can calculate a simple age of accounts
with a procedure like the sketch below.
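For instance, here is a minimal pandas version of the same bucketing idea; the invoice dates, amounts, and as-of date are hypothetical stand-ins for a real receivables extract:

```python
# A minimal aging-of-receivables sketch; all data and column names are hypothetical.
import pandas as pd

ar = pd.DataFrame({
    "invoice_date": pd.to_datetime(["2023-01-05", "2022-12-01", "2022-11-01", "2022-08-01"]),
    "amount": [1500.00, 820.50, 430.00, 99.99],
})

as_of = pd.Timestamp("2023-01-15")  # evaluation date for the aging
ar["days_outstanding"] = (as_of - ar["invoice_date"]).dt.days

# Assign each balance to the standard aging buckets from Exhibit 6-3.
ar["bucket"] = pd.cut(
    ar["days_outstanding"],
    bins=[-1, 30, 60, 90, float("inf")],
    labels=["0-30", "31-60", "61-90", ">90"],
)

# Total receivables in each bucket; very old balances warrant follow-up.
print(ar.groupby("bucket", observed=False)["amount"].sum())
```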

Sorting
Sometimes, simply viewing the largest or smallest values can provide meaningful insight.
Sorting in ascending order shows the smallest number values first. Sorting in descending
order shows the largest values first. The type of data that lends itself to sorting is any numer-
ical, date, or text data of interest.

Summary Statistics
Summary statistics provide insight into the relative size of a number compared with the
population. The mean indicates the average value, while the median produces the middle
value when all the transactions are lined up in a row. The min shows the smallest value,
while the max shows the largest. Finally, a count tells how many records exist, while the
sum adds up the values to find a total. Once summary statistics are calculated, you have a
reference point for an individual record. Is the amount above or below average? What per-
centage of the total does a group of transactions make up? Excel’s Data Analysis Toolpak
offers an easy option to show summary statistics for underlying data.
The type of data that lends itself to calculating summary statistics is any numerical data,
such as a dollar amount of quantity. Summary statistics are limited for categorical data;
however, you can still calculate proportions and counts of groups in categorical data.
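As a simple illustration, the sketch below computes those same statistics over a hypothetical column of transaction amounts:

```python
# Summary statistics for a hypothetical set of transaction amounts.
import pandas as pd

amounts = pd.Series([120.00, 85.25, 3400.00, 410.10, 85.25, 970.00], name="amount")

print("count: ", amounts.count())
print("sum:   ", amounts.sum())
print("mean:  ", amounts.mean())
print("median:", amounts.median())
print("min:   ", amounts.min())
print("max:   ", amounts.max())

# Reference point: which individual records sit above the average?
print(amounts[amounts > amounts.mean()])
```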

Lab Connection
Lab 6-1 has you identify summary statistics, trend lines, and clustering to
explore your audit data.

Sampling
Sampling is useful when you have manual audit procedures, such as testing transaction
details or evaluating source documents. The idea is that if the sample is an appropriate
size, the features of the sample can be confidently generalized to the population. So, if the
sample has no errors (misstatement), then the population is unlikely to have errors as well.
Of course, sampling has its limitations. The confidence level is not a guarantee that you


won’t miss something critical like fraud. But it does limit the scope of the work the auditor
must perform.
To determine the appropriate sample size, there are three determinants: desired
confidence level, tolerable misstatement, and estimated misstatement.
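The mechanics of pulling a reproducible random sample are straightforward, as in the sketch below; the dataset is hypothetical, and the sample size of 25 is illustrative rather than derived from those determinants:

```python
# Random sample selection sketch; data and sample size are illustrative only.
import pandas as pd

transactions = pd.DataFrame({
    "transaction_id": range(1, 1001),
    "amount": [round(50 + (i * 7.3) % 900, 2) for i in range(1000)],
})

# A fixed random_state keeps the selection reproducible for the workpapers.
sample = transactions.sample(n=25, random_state=42)
print(sample.head())
```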

Lab Connection
Lab 6-4 has you pull a random sample of a large dataset.

PROGRESS CHECK
3. What type of descriptive analytics would you use to find negative numbers that
were entered in error?
4. How does the computation of summary statistics (mean, mode, median, max,
min, etc.) meet the definition of descriptive analytics?

LO 6-3 DIAGNOSTIC ANALYTICS

Define and describe diagnostic analytics that are used in auditing.

Diagnostic analytics provide more detail on not just the records, but also records or
groups of records that have some standout features that might differ from what might
be expected. They may be significantly larger than other values, may not match a pattern
within the population, or may be a little too similar to other records for an auditor’s liking.
Diagnostic analytics yield stronger accounting evidence and greater assurance by
uncovering the root causes of issues through techniques such as drill-down, identification
of anomalies and fraud, and data discovery by profiling, clustering, similarity matching, or
co-occurrence grouping. In essence, diagnostic analytics goes hand in hand with the primary
objective of financial statement audits: detecting misstatements.
Here we identify some common types of diagnostic analytics.

Box Plots and Quartiles


A key component of diagnostic analytics is to identify outliers or anomalies. The median
and quartile ranges provide a quick analysis of numerical data and a simple metric
for evaluating the spread of the data, focusing auditors’ attention on the data points on
the edges—the outliers. Box plots enable the visualization of quartile ranges.

Z-score
The Z-score, introduced in Chapter 3, assigns a value to a number based on how many stan-
dard deviations it stands from the mean, shown in Exhibit 6-4. By setting the mean to 0, you
can see how far a point of interest is above or below it. For example, a point with a Z-score
of 2.5 is two-and-a-half standard deviations above the mean. Because most values that come
from a large population tend to be normally distributed (frequently skewed toward smaller
values in the case of financial transactions), nearly all (99.7 percent) of the values should
be within plus-or-minus three standard deviations. If a value has a Z-score of 3.9, it is very
likely an anomaly or outlier that warrants scrutiny and potentially additional analysis.
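A minimal sketch of this outlier test follows; the transaction amounts are hypothetical, and the three-standard-deviation cutoff mirrors the discussion above:

```python
# Flag outliers by Z-score; the amounts are hypothetical.
import pandas as pd

# Thirty routine transactions plus one unusually large one.
amounts = pd.Series([100 + i % 7 for i in range(30)] + [450.0])

z = (amounts - amounts.mean()) / amounts.std()

# Values more than three standard deviations from the mean are flagged.
print(amounts[z.abs() > 3])
```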

EXHIBIT 6-4 Z-scores
The Z-score shows the relative position of a point of interest to the population. The Z table provides the percentage of data points that fall to the left of the point of interest (the shaded area under the curve). In this case, Z = 1.2, and the area to the left of Z is 0.88, or 88 percent of all theoretical data points.
Source: http://www.dmaictools.com/wp-content/uploads/2012/02/z-definition.jpg

t-Tests
A t-test is a statistic used to determine if there is a significant difference between the means
of two groups, or two datasets. A t-test allows the auditor to compare the average values of
the two datasets and determine if they came from the same population. The t-test takes a
sample from each of the two datasets and establishes the problem statement by assuming
a null hypothesis that the two means are equal. As part of the test, certain values are calcu-
lated, t-statistics are computed and compared against the standard values, and the assumed
null hypothesis is either rejected or unable to be rejected. In an audit setting, a t-test may be
used to determine if there is a difference in the mean travel and expense reports of certain
executives as compared to other executives. A t-test may also be used to test a hypothesis
regarding related party transactions, by testing whether related party transactions are statisti-
cally different in magnitude than similar transactions from independent parties.
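For illustration, the travel-and-expense comparison above might be scripted as the following two-sample t-test; the expense figures are hypothetical, and SciPy’s ttest_ind does the heavy lifting:

```python
# Two-sample t-test sketch; the expense figures are hypothetical.
from scipy import stats

exec_group = [1250, 1900, 2100, 1750, 2300, 1600]  # certain executives
other_execs = [900, 1100, 950, 1020, 980, 1050]    # comparison group

# Welch's t-test (equal_var=False) does not assume equal variances.
t_stat, p_value = stats.ttest_ind(exec_group, other_execs, equal_var=False)
print(t_stat, p_value)  # a small p-value rejects the null of equal means
```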

Data Analytics at Work

Do Auditors Need to Be Programmers?


The days when auditors could avoid the computer or technology are long
since gone. Because data specialists will often handle the programming
scripts to access the data, it is not necessary for auditors to be programmers.
However, auditors will need to be conversant on the subject of program-
ming so they can describe what analytics of a set of given transactions need
to accomplish, applying the critical thinking skills needed to improve the
audit, potentially leading to useful information for the client.
Ed Wilkins, a Deloitte partner who leads the company’s audit analytics,
notes, “That’s what we’re looking for [auditors to] get enough of that experi-
ence so they can sit back and critically think about that dataset. What stories
should be coming out of the data? How do I get that information, and now
be conversant with the data specialist who is now part of our core audit
team to be able to tell that story?”
In this chapter, we emphasize what analytics, particularly diagnostic ana-
lytics, are used most by auditors.
Source: “How Firms Are Delivering Value with Audit Data Analytics,” Journal of Accountancy,
January 21, 2020, https://www.journalofaccountancy.com/news/2020/jan/cpa-firm-value-audit-data-analytics-22751.html (accessed March 10, 2021).


Benford’s Law
Benford’s law states that when you have a large set of naturally occurring numbers, the
leading digit(s) is (are) more likely to be small. The economic intuition behind it is that
people are more likely to make $10, $100, or $1,000 purchases than $90, $900, or $9,000
purchases. According to Benford’s law, more numbers in a population of numbers start
with 1 than any other digit, followed by those that begin with 2, then 3, and so on (as shown
in Exhibit 6-5). This law has been shown in many settings, such as the amount of electricity
bills, street addresses, and GDP figures from around the world.

EXHIBIT 6-5 Benford’s Law
Benford’s law predicts the distribution of first digits. (Bar chart comparing the first-digit frequencies of purchase amounts and 2016 GDP figures with Benford’s predicted distribution for digits 1–9.)

In auditing, we can use Benford’s law to identify transactions or users with nontypical
activity based on the distribution of the first digits of the number. For example, assume that
purchases over $500 require manager approval. A cunning employee might try to make large
purchases that are just under the approval limit to avoid suspicion. She will even be clever and
make the numbers look random: $495, $463, $488, and so on. What she doesn’t realize is that
the frequency of the leading digit 4 is going to be much higher than it should be, shown in
Exhibit 6-6. Benford’s law can also detect random computer-generated numbers because those
will have equally distributed first digits. Adding additional leading digits refines the analysis.
EXHIBIT 6-6 Using Benford’s Law
Structured purchases may look normal, but they alter the distribution under Benford’s law. (Bar chart comparing the first-digit frequencies of purchases with Benford’s predicted distribution for digits 1–9.)


Data that lend themselves to applying Benford’s law tend to be large sets of numerical
data, such as monetary amounts or quantities.
Once an auditor uses Benford’s law to identify a potential issue, they can further ana-
lyze individual groupings of transactions, for example, by individual or location. An auditor
would append the expected probability of each transaction, for example, transactions start-
ing with 1 are assigned 0.301 or 30.1 percent. Then the auditor would calculate the average
estimated Benford’s law probability over the group of transactions. Given that the expected
average of all Benford’s law probabilities is 0.111 or 11.1 percent, individuals with an aver-
age estimated value over 11.1 percent would tend to have transactions with more leading
digits of 1, 2, 3, and so on. Conversely, individuals with an average estimated value below
11.1 percent would tend to have transactions with more leading digits of 7, 8, 9, and so on.
This analysis allows auditors to narrow down the individual employees, customers, depart-
ments, or locations that may be engaging in abnormal transactions.
While Benford’s law can apply to large sets of transactional data, it doesn’t hold for
purposefully assigned numbers, such as sequential numbers on checks, account numbers,
or other categorical data. However, auditors have found similar patterns when applying
Benford’s law to text, looking at the first letter of words used in financial statement com-
munication or email messages.
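Because the expected frequency of leading digit d under Benford’s law is log10(1 + 1/d), the comparison is easy to script. Below is a minimal sketch over a hypothetical list of purchase amounts:

```python
# Compare observed leading-digit frequencies with Benford's expectations.
# The purchase amounts are hypothetical.
import math
from collections import Counter

amounts = [495, 463, 488, 1200, 105, 318, 47, 1750, 92, 410, 133, 486]

leading = [int(str(abs(a))[0]) for a in amounts]
observed = Counter(leading)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)           # Benford's predicted frequency
    actual = observed.get(d, 0) / len(amounts)
    print(f"digit {d}: expected {expected:.1%}, observed {actual:.1%}")
```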

Lab Connection
Lab 6-2 and Lab 6-5 have you perform a Benford’s Law analysis to identify
outlier data.

Drill-Down
Most modern Data Analytics software allows auditors to drill down into specific values
by simply double-clicking a value. This lets you see the underlying transactions that gave
you the summary amount. For example, you might click the total sales amount in an income
statement to see the sales general ledger summarizing the daily totals. Click a daily amount
to see the individual transactions from that day.

Exact and Fuzzy Matching


Matching in CAAT is used to link records, join tables, and find potential issues. Auditors
use exact matching to join database tables with a foreign key from one table to the primary
key of another. In cases where the data are inconsistent or contain user-generated informa-
tion, such as addresses, exact matches may not be sufficient. For example, “234 Second
Avenue” and “234 Second Ave” are not the same value. To join tables on these values,
auditors will use a fuzzy match based on the similarity of the values. The auditor defines a
threshold, such as 50 percent, and if the values share enough common characters, they will
be matched. Increasing the threshold can reduce the number of false positives (i.e., fields
that are not the same, but are identified as such), but may also, at the same time, increase
the number of false negatives (i.e., true matches that weren’t identified).
Note that not all matches are the same. Using queries and other database management
tools, auditors may want only certain records, such as those that match or those that don’t
match. These matches require the use of certain join types. Inner Join will show only the
records from both tables that match and exclude everything that doesn’t match. Left Join
will show all records from the first table and only records from the second table that match.
Right Join will show all records from the second table and records from the first table that
match. Outer Join will show all nonmatching ones. Full Outer Join will show all records,


including matching and nonmatching ones. Fuzzy matching finds matches that may be
less than 100 percent matching by finding correspondences between portions of the text or
other entries.
Fuzzy matching data requires that you have two tables/sheets with a common attribute,
such as a primary key/foreign key, name, or address.
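A bare-bones version of a fuzzy match can be built with difflib from the Python standard library, as in the sketch below; the addresses and the 0.8 threshold are illustrative:

```python
# Fuzzy-match sketch using the standard library; the threshold is illustrative.
from difflib import SequenceMatcher

def is_fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
    # ratio() returns a similarity score between 0.0 and 1.0.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

print(is_fuzzy_match("234 Second Avenue", "234 Second Ave"))  # True
print(is_fuzzy_match("234 Second Avenue", "77 Main Street"))  # False
```

Raising the threshold makes the match stricter, trading false positives for false negatives, exactly as described above.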

Lab Connection
Lab 6-3 has you filter data to identify duplicate payments based on matching
values.

Sequence Check
Another substantive procedure is the sequence check. This is used to validate data integ-
rity and test the completeness assertion, making sure that all relevant transactions are
accounted for. Simply put, sequence checks are useful for finding gaps, such as a missing
check in the cash disbursements journal, or duplicate transactions, such as duplicate pay-
ments to vendors. This is a fairly simple procedure that can be deployed quickly and easily
with great success. Begin by sorting your data by identification number.
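A short sketch of the idea over hypothetical check numbers, sorting first and then testing for duplicates and gaps:

```python
# Sequence check sketch: find duplicate and missing check numbers.
import pandas as pd

checks = pd.Series([1001, 1002, 1004, 1005, 1005, 1007]).sort_values()

duplicates = checks[checks.duplicated()].tolist()
issued = set(checks)
gaps = [n for n in range(checks.min(), checks.max() + 1) if n not in issued]

print("Duplicates:", duplicates)  # [1005] -> possible duplicate payment
print("Gaps:", gaps)              # [1003, 1006] -> missing checks to explain
```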

Stratification and Clustering


There are several approaches to grouping transactions or individuals. In most cases, the
items can be grouped by similar characteristics or strata. With stratification, the auditor
identifies specific groups, such as geographic location or functional area, that can be used
to simplify their analysis. When similarities are less obvious, such as personal preference or
expressed behavior, clustering may be used to infer these groupings. Both stratification and
clustering are generally used for data exploration, rather than substantive testing. The iden-
tification of these groupings, whether obvious or not, helps narrow the scope of the audit
and focuses on risk. Clustering is discussed in depth in Chapter 3.

PROGRESS CHECK
5. A sequence check will help us to see if there is a duplicate payment to vendors.
Why is that important for the auditor to find?
6. Let’s say a company has nine divisions, and each division has a different check
number based on its division—so one starts with “1,” another with “2,” and so on.
Would Benford’s law work in this situation?

LO 6-4 ADVANCED PREDICTIVE AND PRESCRIPTIVE ANALYTICS IN AUDITING

Characterize the predictive and prescriptive analytics used in auditing.

Predictive and prescriptive analytics provide less deterministic output than the previous
analytics. This is because we’re moving away from deterministic values to more
probabilistic models, judging things like likelihood and possibility. Here we’ll briefly
discuss some applications of these different concepts, but we refer you back to Chapter 3
for background information.


Regression
Regression allows an auditor to predict a specific dependent value based on various inde-
pendent variable inputs. In other words, based on the level of various independent variables,
we can establish an expectation for the dependent variable and see if the actual outcome is
different from that predicted. In auditing, for example, we could evaluate overtime booked
for workers against productivity or the value of inventory shrinkage given environmental
factors.
Regression might also be used for the auditor to predict the level of their client’s allow-
ance for doubtful accounts receivable, given various macroeconomic variables and bor-
rower characteristics that might affect the client’s customers’ ability to pay amounts owed.
The auditor can then assess if the client’s allowance is different from the predicted amount.
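As a sketch of that allowance example, an auditor might fit a simple linear model on hypothetical drivers and then compare the client’s recorded allowance against the model’s expectation:

```python
# Regression sketch: build an expectation for the allowance for doubtful
# accounts. All figures and drivers below are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

# Drivers per period: [unemployment rate %, receivables aged > 90 days ($000)]
X = np.array([[4.1, 120], [4.5, 150], [5.0, 200], [5.8, 260], [6.2, 310]])
y = np.array([18, 22, 30, 41, 50])  # allowance recorded in prior periods ($000)

model = LinearRegression().fit(X, y)

# Expectation for the current period, compared against the client's estimate.
print(model.predict(np.array([[6.5, 340]])))
```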

Classification
Classification in auditing is going to be mainly focused on risk assessment. The predicted
classes may be low risk or high risk, where an individual transaction is classified in either
group. In the case of known fraud, auditors would classify those cases or transactions as
fraud/not fraud and develop a classification model that could predict whether similar trans-
actions might also be potentially fraudulent.
There is a longstanding classification method used to predict whether a company is
expected to go bankrupt or not. Altman’s Z is a calculated score that helps predict bank-
ruptcy and might be useful for auditors to evaluate a company’s ability to continue as a
going concern.2 Beneish (1999) also developed a classification model predicting firms
that have committed financial statement fraud, which might be used by auditors to help
detect fraud in their financial statement audits.3
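To make the Altman’s Z example concrete, the sketch below scores a company using the coefficients from Altman’s original 1968 model for publicly traded manufacturers; the input ratios are hypothetical:

```python
# Altman Z-score sketch with the original 1968 coefficients; the input
# ratios are hypothetical.
def altman_z(wc_ta, re_ta, ebit_ta, mve_tl, sales_ta):
    # wc_ta:    working capital / total assets
    # re_ta:    retained earnings / total assets
    # ebit_ta:  EBIT / total assets
    # mve_tl:   market value of equity / total liabilities
    # sales_ta: sales / total assets
    return (1.2 * wc_ta + 1.4 * re_ta + 3.3 * ebit_ta
            + 0.6 * mve_tl + 1.0 * sales_ta)

z = altman_z(0.12, 0.25, 0.08, 1.10, 1.35)
# Common cutoffs: below about 1.81 suggests distress; above about 2.99 is
# generally considered the safe zone.
print(round(z, 2))
```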
When using classification models, it is important to remember that large training sets
are needed to generate relatively accurate models. Initially, this requires significant manual
classification by the auditors or business process owner so that the model can be useful for
the audit.

Probability
When talking about classification, the strength of the class can be important to the auditor,
especially when trying to limit the scope (e.g., evaluate only the 10 riskiest transactions).
Classifiers that use a rank score can identify the strength of classification by measuring the
distance from the mean. That rank order focuses the auditor’s efforts on the items of poten-
tially greatest significance, where additional substantive testing might be needed.

Sentiment Analysis
By evaluating text of key financial documents (e.g., 10-K or annual report), an auditor could
see if positive or negative sentiment in the text is predictive of positive or negative out-
comes. Such analysis may reveal a bias from management. Is management trying to influ-
ence current or potential investors by the sentiment expressed in their financial statements,
management discussion, analysis (as part of their 10-K filing), or in conference calls? Is
management too optimistic or pessimistic because they have stock options or bonuses on
the line?

2. Edward Altman, “Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy,” The Journal of Finance 23, no. 4 (1968), pp. 589–609.
3. M. D. Beneish, “The Detection of Earnings Manipulation,” Financial Analysts Journal 55, no. 5 (1999), pp. 24–36.


Such sentiment analysis may affect an auditor’s assessment of audit risk or alert them to
management’s aggressiveness. There is more discussion on sentiment analysis in Chapter 8.

Applied Statistics
Additional mixed distributions and nontraditional statistics may also provide insight to the
auditor. For example, an audit of inventory may reveal errors in the amount recorded in
the system. The difference between the error amounts and the actual amounts may provide
some valuable insight into how significant or material the problem may be. Auditors can
plot the frequency distribution of errors and use Z-scores to home in on the cause of the
most significant or outlier errors.

Artificial Intelligence
As the audit team generates more data and takes specific action, the action itself can be
modeled in a way that allows an algorithm to predict expected behavior. Artificial intel-
ligence is designed around the idea that computers can learn about action or behavior from
the past and predict the course of action for the future. Assume that an experienced auditor
questions management about the estimate of allowance for doubtful accounts. The human
auditor evaluates a number of inputs, such as the estimate calculation, market factors,
and the possibility of income smoothing by management. Given these inputs, the auditor
decides to challenge management’s estimate. If the auditor consistently takes this action
and it is recorded by the computer, the computer learns from this action and makes a rec-
ommendation when a new inexperienced auditor faces a similar situation.
Decision support systems that accountants have relied upon for years (e.g., TurboTax)
are based on a formal set of rules and then updated based on what the user decides given
several choices. Artificial intelligence can be used as a helpful assistant to auditors and may
potentially be called upon to make judgment decisions itself.

Additional Analyses
The list of Data Analytics presented in this chapter is not exhaustive by any means. There
are many other approaches to identifying interesting patterns and anomalies in enterprise
data. Many ingenious auditors have developed automated scripts that can simplify several
of the audit tasks presented here. Excel add-ins like TeamMate Analytics provide many
different techniques that apply specifically to the audit of fixed assets, inventory, sales and
purchase transactions, and so on. Auditors will combine these tools with other techniques,
such as periodically testing the effectiveness of automated tools by adding erroneous or
fraudulent transactions, to enhance their audit process.

PROGRESS CHECK
7. Why would a bankruptcy prediction be considered classification? And why would
it be useful to auditors?
8. If sentiment analysis is used on a product advertisement, would you guess the
overall sentiment would be positive or negative?

Summary
This chapter discusses a number of analytical techniques that auditors use to gather insights
about controls and transaction data. (LO 6-1)
These include:
■ Descriptive analytics that are used to summarize and gain insight into the data. For
example, by analyzing the aging of accounts receivable through descriptive analytics an
auditor can determine if they are fairly stated. (LO 6-2)
■ Diagnostic analytics that identify patterns in the data that may not be immediately obvi-
ous. For example, Benford’s law allows you to look at a naturally occurring set of num-
bers and identify if there are potential anomalies. (LO 6-3)
■ Predictive analytics that look for common attributes of problematic data to help identify
similar events in the future. For example, using classification an auditor can better pre-
dict if a firm is more closely related to firms that have historically gone bankrupt versus
those that have not. This may influence the auditor’s decision of where to audit and
whether there are going concern issues. (LO 6-4)
■ Prescriptive analytics that provide decision support to auditors as they work to resolve
issues with the processes and controls. For example, decision support systems allow
auditors to employ rules to scenarios for more thorough and objective guidance in their
auditing. (LO 6-4)

Key Words
Benford’s law (292) The principle that in any large, randomly produced set of natural numbers, there is
an expected distribution of the first, or leading, digit with 1 being the most common, 2 the next most, and
down successively to the number 9.
computer-assisted audit techniques (CAATs) (286) Automated scripts that can be used to validate
data, test controls, and enable substantive testing of transaction details or account balances and generate
supporting evidence for the audit.
descriptive analytics (286) Procedures that summarize existing data to determine what has happened
in the past. Some examples include summary statistics (e.g., Count, Min, Max, Average, Median), distribu-
tions, and proportions.
diagnostic analytics (286) Procedures that explore the current data to determine why something has
happened the way it has, typically comparing the data to a benchmark. As an example, these allow users to
drill down in the data and see how they compare to a budget, a competitor, or trend.
fuzzy matching (287) Process that finds matches that may be less than 100 percent matching by finding
correspondences between portions of the text or other entries.
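As a rough illustration only (not a procedure from this chapter's labs), many database platforms ship simple phonetic matching functions. The T-SQL sketch below uses the built-in DIFFERENCE function, which scores how similarly two strings sound from 0 (no match) to 4 (strong match); the Vendors and Employees tables and their columns are hypothetical:

    -- Hypothetical tables/columns; flags vendor addresses that sound like employee addresses.
    SELECT v.Vendor_ID, v.Vendor_Address,
           e.Employee_ID, e.Employee_Address
    FROM Vendors v
    JOIN Employees e
        ON DIFFERENCE(v.Vendor_Address, e.Employee_Address) >= 3; -- near matches worth manual review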
predictive analytics (286) Procedures used to generate a model that can be used to determine what is
likely to happen in the future. Examples include regression analysis, forecasting, classification, and other
predictive modeling.
prescriptive analytics (287) Procedures that work to identify the best possible options given constraints
or changing conditions. These typically include developing more advanced machine learning and artificial
intelligence models to recommend a course of action, or optimize, based on constraints and/or changing
conditions.
process mining (286) Analysis technique of business processes used to diagnose problems and suggest
improvements where greater efficiency may be applied.
t-test (290) A statistical test used to determine if there is a significant difference between the means of
two groups, or two datasets.
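For reference, one common form of the statistic (the two-sample t-test allowing unequal variances) compares the sample means $\bar{x}_1$ and $\bar{x}_2$ relative to the sample variances $s_1^2, s_2^2$ and group sizes $n_1, n_2$:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$

Large absolute values of $t$ suggest the difference in means is unlikely to be due to chance alone.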

ANSWERS TO PROGRESS CHECKS
1. Descriptive analytics summarize activity by computing basic descriptive statistics such
as means, medians, minimums, maximums, and standard deviations. Diagnostic analytics
compare variables or data items to each other and try to find co-occurrence or correla-
tion to find patterns of interest. Both of these approaches look at historic data. An auditor
might use descriptive analytics to understand what they are auditing and diagnostic ana-
lytics to determine whether there is risk of misstatement based on the expected value or
why the numbers are the way they are.
2. Use of a dashboard to highlight and communicate findings will help identify alarms for
issues that are occurring on a real-time basis. This will allow issues to be addressed
immediately.
3. By computing minimum values or by sorting, you can find the lowest reported value and,
thus, potential negative numbers that might have been entered erroneously into the sys-
tem and require further investigation.
4. Descriptive analytics address the questions of “What has happened?” or “What is happen-
ing?” Summary statistics (mean, mode, median, max, min, etc.) give the auditor a view of
what has occurred that may facilitate further analysis.
5. Duplicate payments to vendors suggest that there is a gap in the internal controls around
payments. After the first payment was made, why did the accounting system allow a sec-
ond payment? Were both transactions authorized? Who signed the checks or authorized
payments? How can we prevent this from happening in the future?
6. Benford’s law works best on naturally occurring numbers. If the company dictates the first
number of its check sequence, Benford’s law will not work the same way and thus would
not be effective in finding potential issues with the check numbers.
7. Bankruptcy prediction predicts two conditions for a company: bankrupt or not bankrupt.
Thus, it would be considered a classification activity. Auditors are required to assess a cli-
ent’s ability to continue as a going concern and the bankruptcy prediction helps with that.
8. Most product advertisements are very positive in nature and would have positive
sentiment.

Multiple Choice Questions



1. (LO 6-1) Which of the following items would currently be out of the scope of Data Analytics?
a. Direct observation of processes
b. Evaluation of time stamps to evaluate workflow
c. Evaluation of phantom vendors
d. Duplicate payment of invoices
2. (LO 6-2) Which audit technique is used to test completeness?
a. Benford’s law
b. Sequence check
c. Summary statistics
d. Drill-down

3. (LO 6-3) Benford’s law suggests that the first digit of naturally occurring numerical data-
sets follow an expected distribution where:
a. the leading digit of 4 is more common than 3.
b. the leading digit of 9 is more common than 2.
c. the leading digit of 8 is more common than 9.
d. the leading digit of 6 is more common than 5.
4. (LO 6-1) The determinants for sample size include all of the following except:
a. confidence level.
b. tolerable misstatement.
c. potential risk of account.
d. estimated misstatement.
5. (LO 6-1) CAATs are automated scripts that can be used to validate data, test controls,
and enable substantive testing of transaction details or account balances and generate
supporting evidence for the audit. What does CAAT stand for?
a. Computer-aided audit techniques
b. Computer-assisted audit techniques
c. Computerized audit and accounting techniques
d. Computerized audit aids and tests
6. (LO 6-2, 6-3, 6-4) Which type of audit analytics might be used to find hidden patterns or
variables linked to abnormal behavior?
a. Prescriptive analytics
b. Predictive analytics
c. Diagnostic analytics
d. Descriptive analytics
7. (LO 6-3) What describes finding correspondences between at least two types of text or
entries that may not match perfectly?
a. Incomplete linkages
b. Algorithmic matching
c. Fuzzy matching
d. Incomplete matching
8. (LO 6-4) Which testing approach would be used to predict whether certain cases should
be evaluated as having fraud or no fraud?
a. Classification
b. Probability
c. Sentiment analysis
d. Artificial intelligence
9. (LO 6-4) Which testing approach would be useful in assessing the value of inventory
shrinkage given multiple environmental factors?
a. Probability
b. Sentiment analysis
c. Regression
d. Applied statistics
10. (LO 6-3) What type of analysis would help auditors find missing checks?
a. Sequence check
b. Benford’s law analysis
c. Fuzzy matching
d. Decision support systems

Discussion and Analysis

1. (LO 6-1) How do nature, extent, and timing of audit procedures help us identify when to
apply Data Analytics to the audit process?
2. (LO 6-1) When do you believe Data Analytics will add value to the audit process? How
can it most help?
3. (LO 6-3) Using Table 6-2 as a guide, compare and contrast predictive and prescriptive
analytics. How might these be used in an audit? Or a continuous audit?
4. (LO 6-4) Prescriptive analytics rely on models based on past actions to suggest recom-
mended actions for new, similar situations. For example, auditors might review manag-
ers’ approval of new credit applications for inactive customers. If auditors know the
variables and values that were common among past approvals and denials, they could
compare the action recommended by the model with the response of the manager.
How else might these prescriptive analytics help auditors assess risk or test audit issues?
5. (LO 6-2) One type of descriptive analytics is simply sorting data. Why is seeing extreme
values helpful (minimums, maximums, counts, etc.) in evaluating accuracy and com-
pleteness and in potentially finding errors and fraud and the like?

Problems

1. (LO 6-1) Match the analytics type (descriptive, diagnostic, predictive, or prescriptive) to
each analytics technique.

Analytics Technique Analytics Type


1. Classification using Altman’s Z to assess probability of bankruptcy
2. Benford’s law
3. Sentiment analysis
4. Fuzzy matching
5. Summary statistics
6. Drill-down techniques to understand underlying detail
7. What-if analysis

2. (LO 6-1) Match the analytics type (descriptive, diagnostic, predictive, or prescriptive) to
each analytics technique.

Analytics Technique Analytics Type


1. Classification
2. Sorting
3. Regression
4. t-Tests
5. Sequence check
6. Artificial intelligence

3. (LO 6-3) Match the diagnostic analytics questions to the following diagnostic analytics
techniques.
• Z-score
• t-Test
• Benford’s law
• Drill-down

• Fuzzy matching
• Sequence check

Diagnostic Analytics Questions Diagnostic Analytics Technique


1. Which checks are missing in our records?
2. Which transaction amounts are potentially
outliers?
3. Is the mean of one group of transactions
different than another group?
4. Is the address of the vendor similar to the
address of any of our employees?
5. What individual transactions are summa-
rized in the total?
6. Are the numbers in these transactions
consistent with expectation?

4. (LO 6-3) Match the analytics question to the following predictive and prescriptive ana-
lytics techniques.
• Classification
• Regression
• Probability
• Sentiment analysis
• What-if analysis
• Artificial Intelligence

Analytics Question Predictive or Prescriptive? Analysis ­Technique


1. Which companies are predicted to go
bankrupt and which are not?
2. Out of the group classified to go
bankrupt, which ones have the highest
likelihood of going bankrupt?
3. How can an auditor leverage algorithms
to quickly identify and understand areas
of risk?
4. Does management exhibit bias in the
management, discussion, and analysis
section of the financial statements?
5. If I cannot rely on internal controls, what
additional substantive testing do I need
to perform?

5. (LO 6-2) One type of descriptive analytics is age analysis, which shows how old open
accounts receivable and accounts payable are. How would age analysis be useful in the
following situations? Select whether these situations would facilitate Account manage-
ment, Continuous auditing, or Manual auditing.

Situation Type
1. Auditors receive alerts when aging buckets
go outside a set range.
2. Auditor tests the aging values and deter-
mines whether they are appropriate.
3. Owners determine whether certain ac-
counts should be written off.

6. (LO 6-2) Analysis: Why are auditors particularly interested in the aging of accounts
receivable? How does this analysis help evaluate management judgment on collectabil-
ity of receivables? Would a dashboard item reflecting this aging be useful in a continu-
ous audit?
7. (LO 6-1) Analysis: One of the benefits of Data Analytics is the ability to see and test the
full population. In that case, why is sampling still used, and how is it useful?
8. (LO 6-3) Analysis: How is a Z-score greater than 3.0 (or less than –3.0) useful in finding extreme
values? What type of analysis should we do when we find extreme or outlier values?
9. (LO 6-3) What are some patterns that could be found using diagnostic analysis?

Select all that apply


1. Missing transactions
2. Employees structuring payments to get around approval limits
3. Employees creating fictitious vendors
4. A predicted fraudulent transaction
5. The amount of inventory shrinkage
6. The bias of an annual report
7. Duplicate transactions

10. (LO 6-2, 6-3) In a certain company, one accountant records most of the adjusting jour-
nal entries at the end of the month. What type of analysis could be used to identify that
this happens, as well as the cumulative size of the transactions the accountant records?

Select all that apply


1. Classification
2. Fuzzy matching
3. Probability
4. Regression
5. Drill-down
6. Benford’s law
7. Z-score
8. Sequence check
9. Stratification

11. (LO 6-3) Which distributions would you recommend be tested using Benford’s law?

Select all that apply


1. Accounts payable transactions
2. Vendor numbers
3. Sales transactions
4. Employee numbers
5. Cash disbursements

12. (LO 6-3) Analysis: What would a Benford’s law evaluation of sales transaction amounts
potentially show? What would a test of vendor numbers or employee numbers show?
Anything different from a test of invoice or check numbers? Are there any cases where
Benford’s law wouldn’t work?

13. (LO 6-4) Which of the following methods illustrate the use of artificial intelligence to
evaluate the allowance for doubtful accounts?

Select all that apply


1. Calculating a percentage of accounts receivable
2. Calculating a percentage of sales revenue
3. Calculating a value based on similar companies’ allowance accounts
4. Calculating an outcome based on past auditors’ decisions

14. (LO 6-4) Analysis: How could artificial intelligence be used to help with the evalua-
tion of the estimate for the allowance for doubtful accounts? Could past allowances be
tested for their predictive ability that might be able to help set allowances in the current
period?
15. (LO 6-4) Analysis: How do you think sentiment analysis of the 10-K might assess the
level of bias (positive or negative) of the annual reports? If management is too positive
about the results of the company, can that be viewed as being neutral or impartial?

LABS

Lab 6-1 Evaluate Trends and Outliers—Oklahoma


Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As an internal auditor, you are tasked with evaluating audit objectives
and identifying areas of high risk. This involves identifying potential exposure to fraudulent
purchases and analyzing trends and outliers in behavior. In this case you are looking specifi-
cally at the purchasing transactions for the State of Oklahoma by business unit.
Data: Lab 6-1 OK PCard ADS.xlsx - 29MB Zip / 30MB Excel

Lab 6-1 Example Output


By the end of this lab, you will create a dashboard that will let you explore purchasing data
as an auditor. While your results will include different data values, your work should look
similar to this:

Microsoft | Power BI Desktop



LAB 6-1M Example of PCard Audit Dashboard in Microsoft Power BI Desktop

Tableau | Desktop



LAB 6-1T Example PCard Audit Dashboard in Tableau Desktop

Lab 6-1 Part 1 Identify Purchase Trends over Time


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 6-1 [Your name] [Your email address].docx.
The first step of your analysis will have you summarize purchases by supplier group to
identify large contracts and negative values, and by date to identify transactions that take
place outside of normal weekdays. Each of these analyses helps determine risk.

Microsoft | Power BI Desktop

1. Create a new workbook in Power BI Desktop and load your data:


a. Click Home > Connect to Data > Excel.
b. Locate the Lab 6-1 OK PCard ADS.xlsx file on your computer and click
Open.
c. Choose Lab_6-1_OK_PCard_Data and click Load.


2. Go to Page 1 and create the following visualizations:


a. Rename Page 1 tab to Dashboard.
b. Add three slicers to act as filters and move them to the right side of the
screen. In the Fields pane, drag the following values to the appropriate
fields, then click the drop-down menu next to each to set the summary
measure (e.g., Sum, Average, Count):
1. Field: Business_Unit_Description
a. Click the menu in the top corner of the slicer and choose Dropdown.
b. Choose DEPARTMENT OF TRANSPORTATION from the
­Business_Unit_Description list.
2. Field: Purchase_Order_Date > Date Hierarchy > Month
a. Click the menu in the top corner of the slicer and choose
Dropdown.
b. Choose August from the Month list.
3. Field: Purchase_Order_Date > Date Hierarchy > Day
a. Click the menu in the top corner of the slicer and choose
Between.
b. Enter 1 to 31 in the Day text boxes or drag the sliders.
c. Add a stacked bar chart of Total Purchases by Supplier Group sorted in
descending order by purchase amount and move it to the top-left corner
of the page:
1. X-axis: Purchase_Order_Amount_Local > Sum
2. Y-Axis: Supplier_Group
3. Tooltips: Purchase_Order_ID > Count
4. Click the Format visual (paintbrush) icon:
a. Data labels: On
b. Bars > Colors: Click FX, enter the following, and click OK.
i. Based on field: Purchase_Order_ID
ii. Summarization: Count
iii. Check Add a middle color.
iv. Minimum: Lowest value > Orange
v. Center: Custom: 10 > Yellow
vi. Maximum: Highest value > Blue
vii. Click OK to return to your Dashboard
5. Take a screenshot (label it 6-1MA).
d. Add a line chart of Total Purchases by Day showing the purchase
amount and quantity for each day of the filtered month and move it to
the top-right corner of the page to the left of your slicers:
1. X-Axis: Purchase_Order_Date > Date Hierarchy > Day
2. Y-Axis: Purchase_Order_Amount_Local > Sum
3. Secondary Y-Axis: Purchase_Order_ID > Count
e. Add a table with the Purchase Details showing the transaction details and
move it to the bottom of your page so it fills the full width of the page:


1. Columns:
a. Purchase_Order_ID
b. Purchase_Order_Date > Click the drop-down and select Purchase_Order_Date.
c. Entered_By
d. Supplier_Account_Name
e. Supplier_Group
f. Purchase_Order_Amount_Local
f. Finally, use the Format visual (paintbrush) icon to give each visual a
friendly Title, X-axis title, Y-axis title, and Legend Name.
3. Take a screenshot (label it 6-1MB).
4. When you are finished answering the lab questions, continue to the next part.
Save your file as Lab 6-1 OK PCard Audit Dashboard.pbix.

Tableau | Desktop

1. Open Tableau Desktop and load your data:


a. Click Connect to a File and choose Excel.
b. Navigate to your Lab 6-1 OK PCard ADS.xlsx and click Open.
2. On the Data Source tab, click Update Now to preview your data and verify
that it loaded correctly.
3. Starting on Sheet1, create the following visualizations (each on a separate
sheet):
a. Create a distribution of Total Purchases by Supplier Group sorted in
descending order by purchase amount:
1. Columns: SUM(Purchase Order Amount Local)
2. Rows: Supplier Group > Sort descending
3. Marks > Label: Purchase Order Amount Local > Sum
4. Marks > Color: Purchase Order ID > Measure > Count
a. Click Add all members, if prompted.
b. Right-click and choose Measure > Count.
c. Click the CNT(Purchase Order ID) color card menu and choose
Edit Colors. . .
i. Palette: Orange-Blue Diverging
ii. Check Use Full Color Range.
iii. Click Advanced >>.
iv. Check Center and enter 10, then click OK.
5. Filter: Business Unit Description
a. Click Use all and click OK.
b. Right-click the Business Unit Description filter pill and choose
Show filter.

c. Click the Business Unit Description filter card menu and choose
Single value (dropdown).
d. Click the Business Unit Description filter card menu and choose
Apply to Worksheets > All Using this Data Source.
e. Choose DEPARTMENT OF TRANSPORTATION from the Busi-
ness Unit Description filter list.
6. Filter: Purchase Order Date > Month / Year, click Next.
a. Click Use all and click OK.
b. Right-click the MY(Purchase Order Date) filter pill and choose
Show filter.
c. Click the MY(Purchase Order Date) filter card menu and choose
Single value (dropdown).
d. Click the MY(Purchase Order Date) filter card menu and choose
Apply to Worksheets > All Using this Data Source.
e. Choose August 2019 from the MY(Purchase Order Date)
filter list.
7. Take a screenshot (label it 6-1TA).
b. Create a dual line chart with the Total Purchases by Day:
1. Columns: DAY(Purchase Order Date)
2. Rows: SUM(Purchase Order Amount Local); CNT(Purchase
Order ID)
3. Show the Business Unit Description and Purchase Order Date filter
cards.
4. If your visual defaults to two separate line charts, change the
visualization to a Dual Line Chart in the Show Me tab.
c. Create a table with the Purchase Details showing the transaction
details:
1. Rows:
a. Purchase_Order_ID
b. Purchase_Order_Date > Attribute
c. Entered_By
d. Supplier_Account_Name
e. Supplier_Group
2. Marks > Text:
a. Purchase_Order_Amount_Local
4. Create a new Dashboard tab called Purchase Trends tab.
a. Drag each of the three visualizations you created into your dashboard
from the pane on the left. Place Total Purchase by Supplier Group in
the top-left corner, Total Purchases by Day in the top-right corner, and
Purchase Details along the entire bottom.
5. Take a screenshot (label it 6-1TB) of the All tables sheet.
6. When you are finished answering the lab questions, continue to the next
part. Save your file as Lab 6-1 OK PCard Audit Dashboard.twb.


Lab 6-1 Part 1 Objective Questions (LO 6-1, 6-2)


OQ1. Click the top supplier group in the first visualization to filter the data. What is
the total amount of purchases from that group?
OQ2. Click the top supplier group in the first visualization to filter the data. How
many purchase orders were made to that group?
OQ3. Click out of the top supplier to reset the filter, then click the day with the fewest
transactions on the line chart. How many transactions take place on that day?
OQ4. Move your cursor over the data point on the day with the fewest transactions on
the line chart. What is (are) the supplier name(s)? Clear your selection when
you are finished.

Lab 6-1 Part 1 Analysis Questions (LO 6-1, 6-2)


AQ1. As an auditor would you be more likely to find risk in supplier groups with large
total amounts and large number of orders or large total amounts and small num-
ber of orders? Why?
AQ2. Look at the line chart. What patterns do you notice in the transactions? Why
would auditors be interested in the dips, or days with low transactions?
AQ3. Why is it useful to link filters across multiple visualizations on a dashboard?
AQ4. How does showing a table of values in addition to charts and graphs help an
auditor interpret the data?

Lab 6-1 Part 2 Cluster Analysis of Borrowers


In this analysis, we want to identify the individuals who are making the biggest purchases
by amount and the largest number of purchases by count. A cluster analysis allows us to
identify natural groupings of individual cardholders and focus on their transactions.
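Both tools’ clustering features are built on k-means-style partitioning: each point $x$ (here, a cardholder plotted by purchase count and total purchase amount) is assigned to one of $k$ clusters $S_1, \ldots, S_k$ so as to minimize the within-cluster sum of squared distances to the cluster centroids $\mu_i$:

$$\min_{S_1, \ldots, S_k} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2.$$

Small, isolated clusters in the scatter plot therefore correspond to outlying individuals worth a closer look.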

Microsoft | Power BI Desktop

1. Open the Lab 6-1 OK PCard Audit Dashboard.pbix file from Part 1 if it’s not
already open.
2. On your dashboard, resize the top two visualizations so you have space for
a new third visual on the top-right corner of your page (click each card and
drag the edges or bottom-right corner to resize).
a. Add a scatter chart to your report to show the total purchases and
orders by individual:
1. Click the Scatter Chart visualization to add it to your report.
2. In the Fields pane, drag the following values to the appropriate
fields, then click the drop-down menu next to each to set the
summary measure (e.g., Sum, Average, Count):
a. X-axis: Purchase_Order_ID > Count
b. Y-axis: Purchase_Order_Amount_Local > Sum
c. Values: Entered_By

b. Now show clusters in your data:
1. Click the three dots in the top-right corner of your scatter chart
visualization and choose Automatically Find Clusters.
2. Enter the following parameters and click OK.
a. Number of clusters: 6
c. Click on any of the dots to filter the data on the report.
d. Finally, use the Format (paint roller) icon to give each visual a friendly
Title, X-axis title, Y-axis title, and Legend Name.
e. Take a screenshot (label it 6-1MC) of your report.
3. When you have finished answering the questions, close Power BI Desktop
and save your workbook as Lab 6-1 OK PCard Audit Dashboard.pbix.

Tableau | Desktop

1. Open the Lab 6-1 OK PCard Audit Dashboard.twb file from Part 1 if it’s not
already open.
2. Create a new worksheet with a scatter plot to show the groupings of individuals by purchase order count and total purchase amount:
a. Rename the sheet Purchase Clusters.
b. Columns: Purchase_Order_ID > Measure > Count
c. Rows: Purchase Order Amount Local > Measure > Sum
d. Marks > Detail: Entered By
3. Finally, add clusters to your model:
a. Click the Analytics tab on the left side of the screen.
b. Drag Cluster onto your scatter plot.
c. Set the number of clusters to 6 and close the pane.
4. Return to your dashboard and add your new clusters page to the top-right
corner and resize the other two visualizations so there are now three across
the top. Tip: Change the dashboard size from Fixed Size to Automatic if it
feels cramped on your screen.
5. Click the Purchase Clusters visual on your dashboard and click the funnel
icon to Use as Filter.
6. Click on one of the individuals to filter the data on the remaining visuals.
7. Take a screenshot (label it 6-1TC) of your chart and data.
8. Answer the lab questions, and then close Tableau. Save your worksheet as 6-1
OK PCard Audit Dashboard.twb.

Lab 6-1 Part 2 Objective Questions (LO 6-1, 6-3)
OQ1. In the clusters chart, click the individual (entered by) with the highest purchase
amount. What is the individual’s name?
OQ2. In the clusters chart, click the individual with the highest purchase amount.
What category or supplier group received the most purchases?
OQ3. In the clusters chart, click the individual with the highest number of
­transactions. What is the individual’s name?
OQ4. In the clusters chart, click the individual with the highest number of trans-
actions. What is the category or supplier group with the highest number of
transactions?

Lab 6-1 Part 2 Analysis Questions (LO 6-1, 6-3)


AQ1. As an auditor, why would transactions by the individuals with highest purchase
amounts and highest number of transactions represent potential risk?
AQ2. How does cluster analysis help you easily identify those outliers?

Lab 6-1 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 6-2 Diagnostic Analytics Using Benford’s Law—Oklahoma
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As an internal auditor, you are tasked with evaluating audit objectives
and identifying patterns of behavior. A popular tool for understanding patterns in data is
called Benford’s law, or the distribution of first digits. Not only can this analysis identify
data that don’t match the expected behavior, it can also show who might be making these types
of transactions, so you as an auditor know where to focus your attention.
Data: Lab 6-2 OK PCard ADS.zip - 19MB Zip / 20MB Excel

Lab 6-2 Example Output


By the end of this lab, you will create a dashboard that will let you explore purchasing data
as an auditor. While your results will include different data values, your work should look
similar to this:

Microsoft | Power BI Desktop



LAB 6-2M Example PCard Benford Dashboard in Microsoft Power BI Desktop

Tableau | Desktop



LAB 6-2T Example PCard Benford Dashboard in Tableau Desktop

Lab 6-2 Applying Benford’s Law to Purchase
Transactions and Individuals
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 6-2 [Your name] [Your email address].docx.
Start by pulling out the first digits and calculating the frequency distribution. Benford’s
law has an expected distribution of values as a percentage of the total. For example, values
starting with a 1 are estimated to occur about 30 percent of the time. We’ll compare the
frequency distribution of actual transactions to the expected distribution in this lab.
Taking Benford’s law a step further, we can calculate the average expected value by indi-
vidual to determine whether they are likely to have a lot of transactions that start with 7, 8,
and 9, or whether they meet the standard distribution. The overall average of the expected
Benford’s law values is 11.11 percent, assuming an even distribution of transactions start-
ing with 1 through 9. If the average expected value across an individual’s transactions is higher than 11 percent, they
will tend to have more 1, 2, and 3 leading digits. If the individual average is lower than 11
percent, they will tend to have more 7, 8, and 9 leading digits. In this dashboard we’ll look
specifically at those individuals with low averages.
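If you would like to sanity-check this logic outside of Power BI or Tableau, the following T-SQL sketch performs the same two calculations; the PCard table name is hypothetical, while the column names follow the lab’s dataset:

    -- Hypothetical table name; leading digit and average expected Benford value per person.
    WITH Digits AS (
        SELECT Entered_By,
               -- ABS() strips minus signs; multiplying by 100 handles amounts under one dollar
               LEFT(CAST(CAST(ABS(Purchase_Order_Amount_Local) * 100 AS BIGINT) AS VARCHAR(20)), 1) AS Leading_Digit
        FROM PCard
        WHERE ABS(Purchase_Order_Amount_Local) >= 0.01
    )
    SELECT Entered_By,
           COUNT(*) AS Transaction_Count,
           -- values near 0.1111 match an even digit spread; lower values mean more 7, 8, and 9 digits
           AVG(LOG10(1 + 1.0 / CAST(Leading_Digit AS FLOAT))) AS Avg_Benford_Expected
    FROM Digits
    GROUP BY Entered_By
    ORDER BY Avg_Benford_Expected;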

Microsoft | Power BI Desktop

1. Create a new workbook in Power BI Desktop and load your data:


a. Click Home > Connect to Data > Excel.
b. Locate the Lab 6-2 OK PCard ADS.xlsx file on your computer and click
Open.
c. Choose Lab_6-2_OK_PCard_ADS and click Load.
2. Before you begin, you must add some new measures using Power Query:
a. Click Home > Transform Data.
b. Click Add Column > Custom Column:
1. New column name: Leading_Digit
2. Custom column formula: = Number.FromText(Text.Start(Text.From(Number.Abs([Purchase_Order_Amount_Local])*100),1))
3. Note: This formula first finds the absolute value to remove any
minus signs using the Number.Abs function and multiplies it by
100 to account for any transactions less than one dollar. It then converts
the amount from number to text using the Text.From function so
it can grab the first digit using the Text.Start function. Finally,
it converts it back to a number using the Number.FromText
­function.
4. Click OK.
c. Click Add Column > Custom Column:
1. New column name: Benford_Expected
2. Custom column formula: = Number.Log10(1+(1/[Leading_Digit]))
Note: This formula calculates the Benford Expected value using the
Number.Log10 function.
3. Click OK.


d. Set the correct data types for your new columns (Transform > Data Type):
1. Leading_Digit > Whole Number
2. Benford_Expected > Percentage
3. Purchase_Order_Amount_Local > Decimal
e. Click Home > Close & Apply to return to Power BI.
3. Go to Page 1 (rename it Dashboard) and create the following visualizations:
a. Add a slicer to act as a filter and move it to the right side of the screen:
1. Field: Business_Unit_Description
a. Click the menu in the top corner of the slicer and choose Dropdown.
b. Choose DEPARTMENT OF TRANSPORTATION from the Busi-
ness_Unit_Description list.
b. Add a table with the Purchase Details showing the transaction details and
move it to the bottom of your page so it fills the full width of the page:
1. Columns:
a. Purchase_Order_ID
b. Purchase_Order_Date > Purchase_Order_Date
c. Entered_By
d. Supplier_Account_Name
e. Supplier_Group
f. Purchase_Order_Amount_Local
c. Add a line and clustered column chart for your Benford’s law analysis
and have it fill the remaining space at the top of the page:
1. X-axis: Leading_Digit
2. Click the three dots on the card, then Sort by > Leading Digit, then
Sort Ascending.
3. Column Y-axis: Purchase_Order_Amount_Local > Count > Show
value as > Percent of Grand Total
4. Line Y-axis: Benford_Expected > Minimum
5. Click the Format visual (paintbrush) icon:
a. Visual > X-axis: Type > Categorical
b. Visual > Lines > Shapes Stepped > On
d. Take a screenshot (label it 6-2MA).
e. Resize the Benford’s analysis chart so it fills the top-left corner of the page.
f. Click in the blank space and add a new Stacked Bar Chart in the top-
right corner:
1. Y-axis: Entered_By
2. X-axis: Benford_Expected > Average
3. Tooltips: Purchase_Order_ID > Count
4. Sort > Ascending
g. Take a screenshot (label it 6-2MB) of your dashboard.
h. For each visual, click the Format visual (paintbrush) icon and add a
friendly title.
4. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 6-2 OK PCard Benford.pbix.
Tableau | Desktop

1. Create a new workbook in Tableau Desktop and load your data:


a. Click Connect to Data > Microsoft Excel.
b. Locate the Lab 6-2 OK PCard ADS.xlsx file on your computer and click
Open. Since there is only one table, it will automatically be loaded into
the Tableau data model. If not, drag OK PCard ADS to the workspace.
c. Click Sheet 1.
2. Before you begin, you must add some new calculated fields:
a. Click Analysis > Create Calculated Field. . .
1. Name: Leading Digit
2. Formula: INT(LEFT(STR(ABS([Purchase Order Amount
­Local])*100),1))
3. Note: This formula first finds the absolute value to remove any
minus signs using the ABS function and multiplies it by 100 to
account for any transactions less than one dollar. It then converts the
amount from number to text using the STR function so it can grab
the first digit using the LEFT function. Finally, it converts it back to
a number using the INT function.
4. Click OK.
b. Click Analysis > Create Calculated Field. . .
1. Name: Benford Expected
2. Formula: LOG(1+(1/[Leading Digit]))
3. Note: This formula calculates the Benford Expected value using the
Log function.
4. Click OK.
3. Go to Sheet 1 (rename it Benford) and create the following visualizations:
a. Columns: Leading Digit > Dimension > Discrete
b. Rows: Purchase_Order_Amount_Local > Measure > Count, then >
Quick Table Calculation > Percent of Total
c. Marks > Detail: Benford Expected > Measure > Minimum
d. Filter: Business Unit Description
1. Click Use all and click OK.
2. Right-click the Business Unit Description filter pill and choose Show
filter.
3. Click the Business Unit Description filter card menu and choose
Single value (dropdown).
4. Click the Business Unit Description filter card menu and choose
Apply to Worksheets > All Using this Data Source.
5. Choose DEPARTMENT OF TRANSPORTATION from the Business
Unit Description filter list.
e. Add a reference line with the Benford Expected value:
1. Go to the Analytics tab and drag Reference Line to the visual and
choose Cell.

a. Value: MIN(Benford Expected)
b. Click OK.
f. Take a screenshot (label it 6-2TA).
4. Click Worksheet > New worksheet to add a second visual showing the aver-
age Benford Expected by individual. Name the sheet Benford Individual:
a. Columns: Benford Expected > Measure > Average
b. Rows: Entered By > Sort Ascending
c. Marks > Text: Purchase Order ID > Measure > Count
5. Create a final worksheet showing the transaction details. Name the sheet
Purchase Details:
a. Rows:
1. Purchase_Order_ID
2. Purchase_Order_Date > Attribute
3. Entered_By
4. Supplier_Account_Name
5. Supplier_Group
b. Marks > Text: Purchase_Order_Amount_Local
c. Click the Analytics tab, then drag Totals to your table as Column Grand
Totals.
6. Create a new Dashboard tab called Benfords Analysis.
a. Drag each of the three visualizations you created above into your
dashboard from the pane on the left. Place Benford in the top-left
corner, Benford Average in the top-right corner, and Purchase Details
along the entire bottom.
b. Click the Benford visual and click the Filter icon to set Use as filter.
c. Click the Benford Average visual and click the Filter icon to set Use as
filter.
7. Take a screenshot (label it 6-2TB) of your dashboard.
8. When you are finished answering the lab questions, close your workbook
and save your file as Lab 6-2 OK PCard Audit Dashboard.twb.

Lab 6-2 Objective Questions (LO 6-1, 6-3)


OQ1. Look at the Benford visual in the top-left corner of your dashboard. Which digit
appears much greater than its expected value?
OQ2. Click the bar for the leading digit 4 to filter the transactions. What is the grand
total of all transactions that begin with a 4 (rounded to the nearest dollar)?
OQ3. Hover over the individuals in the Benford Average visualization (you could also
create a new visualization to show the Benford average and Purchase Order
count by user). What is the name of the individual with the smallest Benford
Average who also has more than 10 transactions?
OQ4. Hover over the individuals in the Benford Average visualization. What is the
average Benford Expected value for the person (round to two decimal places)
with the smallest Benford Average who also has more than 10 transactions?

OQ5. Hover over the individuals in the Benford Average visualization. Click the bar
for the first individual with more than 10 transactions to filter the dashboard.
What digit do most of that individual’s transactions begin with?

Lab 6-2 Analysis Questions (LO 6-1, 6-3)


AQ1. What does calculating the Benford Average tell us about individuals?
AQ2. What can we learn by looking at an individual’s transactions through the lens of
Benford’s law?
AQ3. As an auditor, why would we be concerned with a high volume of transactions
that exceed the Benford Expected value?

Lab 6-2 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 6-3 Finding Duplicate Payments—Sláinte


Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As an internal auditor, you will find that some analyses are simple
checks for internal controls. For example, a supplier might submit multiple copies of the
same invoice to remind your company to make a payment. If the cash disbursement clerk is
not checking to see if the invoice has previously been paid, this would result in a duplicate
payment. Identifying duplicate transactions can help reveal these weaknesses and allow the
company to recover extra payments from its suppliers. That is the task you have now with
Sláinte.
Data: Lab 6-3 Slainte Payments.zip - 42KB Zip / 45KB Excel

Lab 6-3 Example Output


By the end of this lab, you will create a table that will let you explore duplicate purchase
data as an auditor. While your results will include different data values, your work should
look similar to this:

Microsoft | Power BI Desktop



LAB 6-3M Example Duplicate Table in Microsoft Power BI Desktop

Tableau | Desktop



LAB 6-3T Example Duplicate Table in Tableau Desktop

Lab 6-3 Identify Matching Fields for Duplicates


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 6-3 [Your name] [Your email address].docx.
Companies occasionally make duplicate payments to suppliers due to lack of internal
controls, errors, or fraud. In this lab you will analyze payment transactions to collect evi-
dence about whether duplicate payments have been made to suppliers. Because the payment
amount by itself may appear more than once, such as in the case of recurring transactions,
matches should include additional fields, such as the date and invoice number. This helps
filter out false positive results, or duplicate values that are legitimate transactions.
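Before diving into the tool-specific steps, note that the core duplicate test reduces to a grouped count. A minimal T-SQL sketch (the Slainte_Payments table name is hypothetical; the columns mirror the lab’s fields) would be:

    -- Payments sharing both an invoice reference and an amount are flagged as potential duplicates.
    SELECT Invoice_Reference,
           Payment_Amount,
           COUNT(*) AS Duplicate_Count
    FROM Slainte_Payments
    GROUP BY Invoice_Reference, Payment_Amount
    HAVING COUNT(*) > 1;

Adding more fields to the GROUP BY (such as the payment date) tightens the match and reduces false positives, exactly as described above.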

Microsoft | Power BI Desktop

1. Create a new workbook in Power BI Desktop and load your data:


a. Click Home > Get Data > Excel.
b. Locate the Lab 6-3 Slainte Payments.xlsx file on your computer and
click Open.
c. Check the Slainte_Payment sheet, then click Transform.


d. In Power Query, change the data types of the following fields to Text
(click the column, then click Transform > Data Type > Text; if prompt-
ed, click Replace Current):
1. Payment ID
2. Payment Account
3. Invoice Reference
4. Prepared By
5. Approved By
e. While you’re still in Power Query, add two new columns to determine
whether a transaction is a duplicate or not:
1. Click Add Column > Custom Column to combine the invoice number
and payment amount into a single value:
a. New column name: Invoice_Payment_Combo
b. Custom column formula: = [Invoice_Reference]&Text.From([Payment_Amount])
c. Note: This formula converts the payment amount to text ­using
the Text.From function, then combines it with the Invoice
­Reference to create a single value.
d. Click OK.
2. Click Transform > Group By to identify the count of duplicates:
a. Click Advanced.
b. Field grouping: Invoice_Payment_Combo
c. New column name: Duplicate_Count
d. Operation: Count Rows
e. Click Add aggregation.
f. New column name: Detail
g. Operation: All Rows
h. Click OK.
3. Finally, expand the Detail table. Click the expand (double arrow)
icon in the Detail column header.
4. Uncheck Use original column name as prefix and click OK.
f. Click Home > Close & Apply to return to Power BI.
2. Go to Page 1 (rename it Duplicates) and create a new table:
a. Columns:
1. Payment ID
2. Payment Date > Payment_Date
3. Prepared By
4. Invoice Reference
5. Payment Amount
6. Duplicate_Count
3. Take a screenshot (label it 6-3MA).

4. Now add a slicer to filter your values to only show the duplicates:
a. Click on the blank part of the page.
b. Click Slicer.
c. Field: Duplicate_Count > Select the type of slicer drop down menu > List
d. Check all values EXCEPT 1 in the slicer card.
5. Take a screenshot (label it 6-3MB).
6. Answer the lab questions and then close Power BI. Save your file as Lab 6-3
Slainte Duplicates.pbix.

Tableau | Desktop

1. Create a new workbook in Tableau Desktop and load your data:


a. Click Connect to Data > Microsoft Excel.
b. Locate the Lab 6-3 Slainte Payments.xlsx file on your computer and
click Open.
c. Click Sheet 1 to go to your worksheet and rename it Duplicates.
d. Change the data types of the following fields to String (click the # next
to each field to change data type) and right-click each field and choose
Convert to Dimension or drag it to the dimensions list above.
1. Payment ID
2. Approved By
3. Invoice Reference
4. Payment Account
5. Prepared By
2. Before you begin, you must add a new calculated field to detect your dupli-
cate payment:
a. Click Analysis > Create Calculated Field. . .
1. Name: Duplicate?
2. Formula: IF {FIXED STR([Payment Amount])+[Invoice Reference]: COUNT([Payment ID])} > 1 THEN 'Yes' ELSE 'No' END
3. Note: This formula first converts the values from numbers to text using
the STR function, then combines the payment amount and invoice into
one value. It will count the unique entries and if the count is greater
than 1, you have a duplicate payment and the value returned will be
Yes. For entries with only one count, the value returned will be No.
4. Click OK.
3. Create the following table. Note: If the values don’t appear as a table, check
to make sure that all of the row fields have been converted to the string and
dimension data types (see Step 1d).
a. Rows:
1. Payment ID
2. Payment Date > Attribute

3. Prepared By
4. Invoice Reference
5. Duplicate?
b. Marks > Text: Payment Amount
4. Take a screenshot (label it 6-3TA).
5. Now filter your values to only show the duplicates:
a. Right-click the Duplicate? field and choose Show Filter.
b. Uncheck No in the filter pane.
6. Take a screenshot (label it 6-3TB).
7. Answer the lab questions and then close Tableau. Save your file as Lab 6-3
Slainte Duplicates.twb.

Lab 6-3 Objective Questions (LO 6-1, 6-2)


OQ1. How many duplicate records did you locate?
OQ2. What is (are) the invoice number(s) of the duplicate payments?
OQ3. How much money could the company recover from the supplier from these
duplicate transactions?
OQ4. What data items do you need to combine to be able to find duplicate payments?

Lab 6-3 Analysis Questions (LO 6-1, 6-2)


AQ1. Before computerization or Data Analytics, how would companies find that they
had made duplicate payments?
AQ2. What data items do you need to be able to find duplicate payments?
AQ3. Would the date of the duplicate payments usually be the same or different?

Lab 6-3 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 6-4 Comprehensive Case: Sampling—Dillard’s


Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: In this lab you will learn how to create random samples of your data. As
you learned in the chapter, sometimes it is necessary to sample data when the population
data are very large (as they are with our Dillard’s database!). The way previous labs have
sampled the Dillard’s data is with just a few days out of a month, but that isn’t as representa-
tive of a sample as it could be. In the Microsoft track, you will learn how to create a SQL
query that will pull a random set of records from your dataset. In the Tableau track, you
will learn how to use Tableau Prep to randomly sample data. While you could also use the
SQL query in Tableau Prep, it is worth seeing how Tableau Prep provides an alternative to
writing the code for random samples.

Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.

Lab 6-4 Example Output


By the end of this lab, you will create a sample of transactions from sales data. While your
results will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop

LAB 6-4M Example of a Random Sample in Microsoft Power BI Desktop

Tableau | Prep + Desktop



LAB 6-4T Example of a Random Sample in Tableau Desktop

Lab 6-4 Sampling Analysis


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 6-4 [Your name] [Your email address].docx.

Microsoft | Power BI Desktop

1. Create a new project in Power BI Desktop.


2. In the Home ribbon, click Get Data > SQL Server.
3. Enter the following and click OK:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Data Connectivity: Import
d. Expand Advanced Options and input the following query:
SELECT TOP 10000 *
FROM TRANSACT
ORDER BY NEWID()
1. A note about the NEWID() function: This is a function that works with
data stored in SQL Server and it adds a 32-character unique, non-
sequential alphanumeric identifier to each record. Because of the
way this is created and then ordered, the query can be quite slow to
return results (maybe over 10 minutes!).
2. An alternative to the NEWID() query exists that runs much faster but
does not provide truly randomized results. When you are just testing
your data and do not require a fully random sample, this can be a
good option when gathering a subset of data from very large tables:
SELECT TOP 10000 *
FROM TRANSACT
TABLESAMPLE (12000 ROWS)
3. Either query method could work with much more complex queries,
as well:
• Instead of using the wildcard in the SELECT clause to select all
of the columns, you could indicate specific attributes (just be
sure to still include TOP and whatever number of rows you want
to include in your sample first).
• You can include joins to pull data from more than one table.
• You can also include aggregate clauses and filtering clauses.
• Regardless of how much complexity you add to your query, be
sure to end the query with the clause ORDER BY NEWID().
• If you want to experiment with other queries, we recommend
that you use a smaller table (like STORE) because your queries
will run much faster. Just change the sample amount to a smaller
number, such as 25.
e. If prompted to enter credentials, keep the default of “Use my current
credentials” and click Connect.
f. If prompted with an Encryption Support warning, click OK to move past it.
g. The query may take several minutes to load. Once it has loaded, click
Transform Data to open the Power Query Editor.


4. In the Power Query Editor, you can view how your sample is distributed.
a. From the View tab in the ribbon, place a check mark next to Column
distribution and Column profile.
b. Click through the columns to see summary statistics for each column.
c. Click the TRAN_DATE column and take a screenshot (label it
6-4MA).
5. From the Home tab in the ribbon, click Close & Apply to create a report.
This step may also take several minutes to run.
6. Create a line chart showing the total sales revenue by day:
a. In the Visualizations pane, click Line chart.
b. Drag TRAN_DATE to the X-axis, then click the drop-down to change it
from the Date Hierarchy to TRAN_DATE.
c. Drag TRAN_AMT to the Y-axis.
7. Resize the visualization (or click Focus Mode) to see the line chart more
clearly.
8. Take a screenshot (label it 6-4MB).
9. Keep in mind that, because this is a random sample, no two samples will
be identical, so if you refresh your query, you will get a new batch of 10,000
records.

Tableau | Prep + Desktop

1. Open Tableau Desktop, then open Tableau Prep.


2. In Tableau Prep, click Connect to Data.
3. Choose Microsoft SQL Server in the Connect list.
4. Enter the following and click Sign In:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
5. Double-click TRANSACT to add the table to your flow.
6. In the Input Pane, click the Data Sample tab:
a. Select Fixed Number of rows and input 10000.
b. Select Random sample.
c. Take a screenshot (label it 6-4TA).
7. Add a Clean step to your flow. This may take several minutes.
a. Explore the distribution of the dataset for each attribute, in particular
TRAN_DATE.
8. Right-click the Clean Step and select Preview in Tableau Desktop.

9. In Tableau Desktop, create a line chart showing the total sales revenue by day:
a. Drag TRAN_DATE to the Columns.
b. Drag TRAN_AMT to the Rows.
c. Expand the date hierarchy in columns to Day (clicking the expand but-
ton through Quarter and Month until you get to Day).
d. In the Show Me menu, adjust your visualization from the lines (discrete)
to lines (continuous).
10. Take a screenshot (label it 6-4TB).

Lab 6-4 Analysis Questions (LO 6-2)


AQ1. In previous labs working with Dillard’s data, you have looked at data from
­specific days instead of pulling a random sample. When do you think it
is ­sufficient to look at data from specific days? When would that even be
preferred?
AQ2. In previous labs working with Dillard’s data, you have looked at data from
­specific days instead of pulling a random sample. When do you think it is
­preferred to pull a random sample?
AQ3. In the lab, you assessed the distribution of each attribute and created a
­visualization using the TRAN_DATE field. What was the range of dates
your random sample pulled in?
AQ4. Which days seem to have the highest revenue across the different years?
AQ5. If you visualize the count of TRANSACTION_ID, what do you notice about the
days with the highest revenue?
AQ6. If you filter on TRAN_TYPE to remove returns, what is the impact on the
sample data?

Lab 6-4 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 6-5 Comprehensive Case: Outlier Detection—Dillard’s


Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As you learned in the chapter text, Benford’s law states that when you
have a large set of naturally occurring numbers, the leading digit(s) is (are) more likely to be
small. Running a Benford’s law analysis on a set of numbers is a great way to quickly assess
your data for anomalies.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.

Lab 6-5 Example Output
By the end of this lab, you will create a sample of transactions from sales data. While your
results will include different data values, your work should look similar to this:

Microsoft | Excel + Power Query

LAB 6-5M Example Benford’s Analysis in Microsoft Excel + Power Query

Tableau | Desktop



LAB 6-5T Example Benford’s Analysis in Tableau Desktop

Lab 6-5 Part 1 Compare Actual Leading Digits to
Expected Leading Digits
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 6-5 [Your name] [Your email address].docx.
In Part 1 of this lab you will connect to the Dillard’s data and load them into either Excel
or Tableau Desktop. Then you will create a calculated field to isolate the leading digit from
every transaction. Finally, you will construct a chart to compare the leading digit distribu-
tion to that of the expected Benford’s law distribution.
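As an optional cross-check before building the comparison in Excel or Tableau, the two distributions can be computed in a single T-SQL query against the same data; this sketch reuses the lab’s TRANSACT filter:

    -- First-digit share of actual transactions vs. the Benford's law expectation.
    WITH Digits AS (
        SELECT LEFT(CAST(CAST(TRAN_AMT * 100 AS BIGINT) AS VARCHAR(20)), 1) AS Leading_Digit
        FROM TRANSACT
        WHERE TRAN_DATE BETWEEN '20160901' AND '20160905'
          AND TRAN_AMT > 0
    )
    SELECT Leading_Digit,
           COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS Actual_Share,  -- proportion of all transactions
           LOG10(1 + 1.0 / CAST(Leading_Digit AS FLOAT)) AS Benford_Expected
    FROM Digits
    GROUP BY Leading_Digit
    ORDER BY Leading_Digit;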

Microsoft | Excel + Power Query

1. From Microsoft Excel, click the Data tab on the ribbon.


2. Click Get Data > From Database > From SQL Server Database.
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Expand Advanced Options and input the following query:
SELECT *
FROM TRANSACT
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905' AND
TRAN_AMT > 0
d. Click OK, then Load to load the data directly into Excel. This may take
a few minutes because the dataset you are loading is large.
3. To isolate the leading digit, we need to add a column to the query data that
you have loaded into Excel.
a. In the first free column (Column P), type a header titled Leading Digit
and click Enter.
b. Excel has text functions that help in isolating a certain number of char-
acters in a referenced cell. You will use the =LEFT() function to refer-
ence the TRAN_AMT field and return the leading digit of each transac-
tion. The LEFT function has two arguments, the first of which is the
cell reference, and the second which indicates how many characters you
want returned. Because some of the transaction amounts are less than
a dollar and Benford’s law does not take into account the magnitude of
the number (e.g., 1 versus 100), we will also multiply each transaction
amount by 100.
1. Input =LEFT([TRAN_AMT]*100,1) into cell P2. The table will
automatically copy the function down the entirety of the column.
4. Take a screenshot (label it 6-5MA).
5. Next, you will create a PivotTable to gather a count of how many transactions
begin with each digit.
a. Ensure that your cursor is in one of the cells in the table. From the
Insert tab in the ribbon, select PivotTable.
b. Check that the new PivotTable is referencing your query data, then
click OK.

c. Drag and drop Leading Digit into the Rows, and then repeat that action
by dragging and dropping it into Values.
d. The values will likely default to a raw count of the number of transac-
tions associated with each leading digit. We’d rather view this as a
percentage of the grand total, so right-click one of the values in the
PivotTable, select Show Values As, and select % of Grand Total.
6. The next step is to compare these percentages to the expected values from
Benford’s law by first creating a new calculation column next to your
PivotTable, and then creating a Column Chart to visualize the comparison
between the expected and actual values.
a. In Cell C4 (the first free cell next to the count of the leading digits in
your PivotTable), create the logarithmic formula to calculate the Ben-
ford’s law value for each leading digit: =LOG(1+1/A4).
b. Copy the formula down the column to view the Benford’s law value for
each leading digit.
c. Copy and paste the values of the PivotTable and the new Benford’s values
to a different part of the spreadsheet. This will enable you to build a visu-
alization to compare the actual with the expected (Benford’s law) values.
d. Ensure that your cursor is in one of the cells in the new copied range of
data. From the Insert tab in the ribbon, select Recommended Charts and
then click the Clustered Column option.
7. After you answer the Objective and Analysis questions, continue to Lab 6-5
Part 2.

Tableau | Desktop

1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is; click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT *
FROM TRANSACT
WHERE TRAN_DATE BETWEEN '20160901' AND '20160905' AND
TRAN_AMT > 0
e. Click OK.
3. Click Sheet 1 to create your calculated fields and visualize the comparison
between actual and expected values.
a. To isolate the leading digit, click Analysis > Create Calculated Field.
Tableau has text functions that help in isolating a certain number of

characters in a referenced cell. For our purposes, we need to return
the first (or the left-most) digit of each transaction amount. You will
use a LEFT function to reference the TRAN_AMT field. The LEFT
function has two arguments, the first of which is the attribute refer-
ence, and the second which indicates how many characters you want
returned. Because some of the transaction amounts are less than a
dollar and Benford’s law does not take into account the magnitude of
the number (e.g., 1 versus 100), we will also multiply each transaction
amount by 100.
1. Title: Leading Digit
2. Calculation: left(str([TRAN_AMT]*100),1)
3. Click OK.
b. To create the Expected Benford’s Law values, click Analysis > Create
Calculated Field again to create a new field.
1. Title: Benford’s Law
2. Calculation: LOG(INT([Leading Digit])+1)-LOG(INT([Leading
Digit]))
3. Click OK.
c. To create the Column Chart:
1. Columns: Leading Digit and Benford’s Law
a. Right-click the SUM(Benford’s Law) pill and select Measure
(Sum) > Minimum.
2. Rows: Custom SQL Query (Count)
a. Right-click the CNT(Custom SQL Query) pill and select Quick
Table Calculation > Percent of Total.
3. From the Show Me tab, change the visualization to a Side by Side
Bar Chart.
4. Take a screenshot (label it 6-5TA).
5. Answer the lab questions, then continue to the next part.

Lab 6-5 Part 1 Objective Questions (LO 6-3)


OQ1. Is the distribution of actual leading digits similar to the distribution of the Ben-
ford’s law expected digits distribution?
OQ2. Which of the actual leading digits have more observations than the expected
Benford’s law digits?
OQ3. What is the percentage of total leading digits in the actual observations that
begin with 7?

Lab 6-5 Part 1 Analysis Questions (LO 6-3)


AQ1. By glancing at the bar charts, how do the two distributions (Expected and
Actual) compare in shape?
AQ2. Do any values stand out to you as being anomalies in the actual dataset, as com-
pared to the expected values?
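
Before moving to Part 2, it may help to see the Part 1 logic condensed. The short Python sketch below is only an illustration—the amounts are made-up stand-ins rather than Dillard’s data—but it mirrors both tracks above: multiply each amount by 100, take the left-most character as the leading digit, and compare the observed proportions to the Benford’s law value LOG(1 + 1/digit).

from collections import Counter
from math import log10

# Hypothetical stand-ins for the TRAN_AMT values returned by the query.
amounts = [4.52, 19.99, 1.25, 312.40, 88.00, 0.35, 12.10, 160.75]

# Multiply by 100 (as in the lab) so sub-dollar amounts still yield a
# nonzero leading digit, then keep the left-most character.
digits = [str(int(a * 100))[0] for a in amounts]
counts = Counter(digits)
n = len(digits)

for d in range(1, 10):
    actual = counts.get(str(d), 0) / n   # observed proportion
    expected = log10(1 + 1 / d)          # Benford's law expectation
    print(d, round(actual, 3), round(expected, 3))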

Lab 6-5 Part 2 Construct Fictitious Data and Assess It for
Outliers
The assumption behind using Benford’s law to screen for fraud is that falsified data will not conform to it.
To test this, we can add a randomly generated dataset that is based on the range of transac-
tion amounts in your query results.
For the Microsoft Track, you will return to your Query output of the Excel workbook
to generate random values. For the Tableau Track, you cannot currently generate random
values in Tableau Prep or Tableau Desktop, so we have provided a spreadsheet of random
numbers to work with.

Microsoft | Excel + Power Query

1. Return to your Query output in the Excel workbook (the spreadsheet will be
labeled Query1). The calculations in the next few steps may each take a few
minutes to run because you are working with a very large dataset.
2. In cell Q1, add a new column to your table by typing Random Digit and then
click Enter.
3. In cell Q2, enter =LEFT(RANDBETWEEN(1,100),1). This will generate a
random number between 1 and 100 for each row in the table and extract the
leading digit.
4. When you create a random number set in Excel, Excel will update the ran-
dom numbers every time you make a change to the spreadsheet. For this
reason, you need to replace the data in your new Random Digit column
with values.
a. Select and Copy all of the values in your new Random Digit column.
b. From the Home tab, click on the bottom half of the Paste button and
select Paste Values, then click Enter. This replaces the formula in each
cell with only the resulting values.
5. Repeat steps 5 and 6 from Part 1 of this lab to compare the actual values from
your randomly generated number set to the expected Benford’s law values.
a. When you create your PivotTable, you may need to refresh the data
to see both of your new columns. In the PivotTable Analyze tab, click
Refresh, then click into the PivotTable placeholder to see your field list.
b. Some of the randomly generated numbers may be small enough to
still have a leading digit of 0. If so, when you create your logarithmic
functions to view the Benford’s law expected values, skip the row for
the 0 values.
6. Take a screenshot of your range of data showing Expected and Actual val-
ues and the column chart (label it 6-5MB).

Tableau | Desktop

1. Open a new instance of Tableau Desktop and connect to the file Lab 6-5
Fictitious Data.xlsx.

2. Repeat step 3 in Part 1 of this lab to create the Benford’s Law Expected Val-
ues and compare them to the fictitious data in the Excel workbook.
a. The count variable from Part 1 (Custom SQL Query (Count)) will be
labeled Sheet 1 (Count) in this instance instead.
3. Take a screenshot (label it 6-5TB).
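
As a quick check on the intuition behind this part, the Python simulation below (standard library only; the draw is analogous to, not identical to, the Excel RANDBETWEEN column) tallies the leading digits of uniformly random integers. Each digit lands near a flat 10–12 percent of observations rather than following Benford’s logarithmic curve, which is exactly the kind of anomaly this lab is designed to surface.

import random
from collections import Counter
from math import log10

random.seed(42)  # fixed seed so the sketch is reproducible

# Leading digits of 100,000 uniform draws, the same idea as the
# lab's =LEFT(RANDBETWEEN(1,100),1) column.
digits = [str(random.randint(1, 100))[0] for _ in range(100_000)]
counts = Counter(digits)
n = len(digits)

for d in range(1, 10):
    simulated = counts[str(d)] / n   # "fictitious" data proportion
    expected = log10(1 + 1 / d)      # Benford's law expectation
    print(d, round(simulated, 3), round(expected, 3))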

Lab 6-5 Part 2 Analysis Questions (LO 6-3)


AQ1. How do the distributions of the fictitious data compare to the Benford’s law
expected values distribution?

Lab 6-5 Part 3 Test the Distributions for Statistical Significance
We can be more scientific in our analysis than glancing at a chart by using the Chi-Squared
Test. This statistical test indicates whether the data significantly conform to their expected
values, or if there is a significant difference between the observed (actual) values and the
expected (Benford’s law) values.
The hypotheses associated with Chi-Squared Tests are the following:
• H0: Expected proportions of leading digits are equal to the actual (observed) propor-
tion of leading digits.
• H1: Expected proportions of leading digits significantly differ from actual (observed)
proportions.
In naturally occurring datasets, we expect to retain (or fail to reject) the null hypothesis.
When the null hypothesis is rejected in naturally occurring datasets, this indicates that you
should further analyze the dataset to search for fraud or errors.
The Chi-Squared Test requires values greater than or equal to 1, so we need to trans-
form the Actual and Expected Values from decimal percentages into whole numbers.
It is not possible to run Chi-Squared Tests in Tableau, so there is only an Excel portion
for Part 3 of this lab.

Microsoft | Excel + Power Query

1. Modify your Expected/Actual table from Part 1 to include two additional
columns: Actual Probability and Expected Probability.
a. Actual Probability: In the column to the right of the Expected Values, add
in a calculation referencing the Actual Value and multiplying it by 100.
b. Expected Probability: Create a similar function as you did for actual
probability, but this time reference the Expected Value.
2. Excel has a built-in function to calculate the p-value associated with the Chi-
Squared Test. It is =CHISQ.TEST(actual_range, expected_range). Input
this function into your spreadsheet, selecting the actual probability values
(without the label if you have one) and the expected probability values (also
without the label).

3. Take a screenshot (label it 6-5MC) of the table and the p-value.
4. Repeat steps 1 and 2 on your fictitious dataset to calculate the Chi-Squared
Test for that set of data.
5. Take a screenshot (label it 6-5MD) of the table and the p-value.
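
If you prefer to verify the Excel result outside the spreadsheet, SciPy offers an equivalent test. The sketch below is a hedged illustration, not the lab solution: the observed percentages are placeholders (the expected column is simply each Benford’s law value times 100), and scipy.stats.chisquare stands in for =CHISQ.TEST, returning the p-value you would compare to alpha.

from scipy.stats import chisquare  # assumes SciPy is installed

# Placeholder leading-digit distributions for digits 1 through 9,
# already scaled from decimals to whole numbers (the lab's
# multiply-by-100 step); both lists must sum to the same total.
actual_pct   = [30.5, 17.2, 12.8, 9.5, 8.1, 6.9, 5.6, 5.0, 4.4]
expected_pct = [30.1, 17.6, 12.5, 9.7, 7.9, 6.7, 5.8, 5.1, 4.6]

result = chisquare(f_obs=actual_pct, f_exp=expected_pct)
print(result.pvalue)  # a large p-value means we fail to reject H0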

Tableau
There is no Tableau track for Part 3.

Lab 6-5 Part 3 Objective Questions (LO 6-3)


OQ1. What is the p-value that you receive for the Dillard’s dataset (the result of the
Chi-Squared function)?
OQ2. Recall the rules associated with comparing your p-value to alpha. When your
p-value is less than alpha, you reject the null hypothesis. When p-values are large
(greater than alpha), you fail to reject the null hypothesis that they are the same
distribution. Using an alpha of 0.05, what is your decision based on the p-value
you receive from the actual data in the Dillard’s dataset from the Chi-Squared
Test?
OQ3. What does your decision (to reject or fail to reject the null) indicate about the
actual data in the Dillard’s dataset?
OQ4. Using an alpha of 0.05, what is your decision regarding the fictitious dataset?

Lab 6-5 Part 3 Analysis Questions (LO 6-3)


AQ1. Why do you think the results of your Chi-Squared Tests for the actual Dillard’s
data compared to the fictitious data are so different? What does this mean to
you when interpreting the results of Benford’s law analysis and the associated
Chi-Squared Tests?
AQ2. What other transactions or datasets lend themselves to running Benford’s law
analysis?

Lab 6-5 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Chapter 7
Managerial Analytics

A Look at This Chapter


This chapter explains how to apply Data Analytics to measure performance and answer managerial accounting
questions. By measuring past performance and comparing it to targeted goals, we are able to assess how well a
company is working toward a goal. Also, we can determine required adjustments to how decisions are made or
how business processes are run, if any.

A Look Back
In Chapter 6, we focused on substantive testing within the audit setting. We highlighted discussion of the audit plan
and how account balances are checked. We also highlighted the use of statistical analysis to find errors or fraud in the
audit setting. In addition, we discussed the use of clustering to detect outliers and the use of Benford’s analysis.

A Look Ahead
In Chapter 8, we will focus on how to access and analyze financial statement data. Through analysis of ratios and
trends we identify how companies appear to stakeholders. We also discuss how to analyze financial performance, and
how visualizations help find insight into the data. Finally, we discuss the use of text mining to analyze the sentiment
in financial reporting data.

For years, Kenya Red Cross had attempted to refine its strategy and align its daily activities with its overall strategic goals.
Its annual strategic planning meetings with external consultants always resulted in the consultants presenting a
new strategy that the organization had little buy-in to, and the Red Cross never
felt confident in what was developed or what it would mean for its future. When Kenya Red Cross went through a Data
Analytics–backed Balanced Scorecard planning process for the first time, though, it immediately felt like its organization’s
mission and vision were involved in the strategic planning and that “strategy” was no longer so vague. The Balanced
Scorecard approach helped the Kenya Red Cross align its goals into measurable metrics. The organization prided itself
on being “first in and last out” but hadn’t actively measured its success in that goal, nor had the organization fully ana-
lyzed how being the first in and last out of disaster scenarios affected other goals and areas of its organization.
Using Data Analytics to refine its strategy and assign measurable performance metrics to its goals, Kenya Red
Cross felt confident that its everyday activities were linked to measurable goals that would help the organization
reach its goals and maintain a strong positive reputation and impact through its service. Exhibit 7-1 gives an illustra-
tion of the Balanced Scorecard at the Kenya Red Cross.

EXHIBIT 7-1 The Kenya Red Cross Balanced Scorecard


Source: Reprinted with permission from Balanced Scorecard Institute, a Strategy Management Group company. Copyright 2008–2017.

OBJECTIVES
After reading this chapter, you should be able to:

LO 7-1 Explain how the IMPACT model applies to management accounting problems.
LO 7-2 Explain typical descriptive and diagnostic analytics in management accounting.
LO 7-3 Evaluate the use of KPIs as part of a Balanced Scorecard.
LO 7-4 Assess the underlying quality of data used in dashboards as part of
management accounting analytics.
LO 7-5 Understand how to address and refine results to arrive at useful
information provided to management and other decision makers.


LO 7-1 Explain how the IMPACT model applies to management accounting problems.

APPLICATION OF THE IMPACT MODEL TO MANAGEMENT ACCOUNTING QUESTIONS

In the past six chapters, you learned how to apply the IMPACT model to data analysis projects in general and, specifically, to internal and external auditing. The same accounting information generated in the financial reporting system can also be used to address management’s questions. Together with operational and performance measurement data, we can better determine the gaps between actual company performance and targeted strategic objectives, and condense the results into easily digestible and useful digital dashboards, providing precisely the information needed to help make operational decisions that support a company’s strategic direction.

This chapter teaches how to apply Data Analytics to measure performance. More specifically, we measure past performance and compare it to targeted goals to assess how well a company is working toward a goal. In addition, we can determine required adjustments to how decisions are made or how business processes are run, if any.
Management accounting is one of the primary areas where Data Analytics helps the
decision-making process. Indeed, the role of the management accountant is to specify
management questions and then find and analyze data to address those questions. From
assigning costs to jobs, processes, and activities; to understanding cost behavior and rel-
evant costs in decisions; to carrying out forecasting and performance evaluation, manag-
ers rely on real-time data and reporting (via dashboards and other means) to evaluate
the effectiveness of their strategies. These data help with the planning, management, and
controlling of firm resources.
Similar to the use of the IMPACT model to address auditing issues, the IMPACT model
also applies to management accounting. We now explain the management accounting ques-
tions, sources of available data, potential analytics applied, and the means of communicat-
ing and tracking the results of the analysis through use of the IMPACT model.

Identify the Questions


Management accounting questions are diverse and vary depending on what problem man-
agement is facing or what performance issue the company is trying to monitor, including
the following:
• What percentage of the airline company’s departures were on time this past month?
• What was the segment margin for the West Coast and Midwest regions last quarter?
• Which products are the most profitable for the company? How much did Job #304
cost?
• Why is segment margin higher on the West Coast than in the Midwest?
• Why did our rate of production defects go down this month compared to last
month?
• What is driving the price variance and labor rate variance?
• What is the level of expected sales in the next month, quarter, and year to help plan our
production?
• Do we extend credit or not to customers based on customer characteristics (credit
score, payment history, existing debt, etc.)?
• Should the company own or lease its headquarters office building?
• Should the company make its products or outsource to other producers?
• What is the level of sales that will allow us to break even?
• How can revenues be maximized (or costs be minimized) if there is a trade war with
China?


Data Analytics at Work

Maximizing Profits Using Data Analytics


Blogger Robert Hernandez argues that there are seven data science skills
that will change the accounting career. Two of those skills seem to be par-
ticularly relevant to management accounting:
Revenue Analytics: Arguably, the quickest way to build profits is through
smarter pricing and sales channel optimization. Knowing what data to use
and how to analyze the data to find and optimize inefficiencies in a compa-
ny’s pricing structure is an increasingly invaluable skill set that management
accountants need to develop.
Optimizing Costs and Revenues: The end game of management accoun-
tants is to analyze and find the set of decisions that are the most optimal to
achieving long-term profitability, whether the solution has to do with increas-
ing revenue or decreasing costs or both. As the domain experts on the P&L
(income statement), management accountants should be able to direct man-
agers on how to creatively solve the puzzle of achieving higher profits.

Source: “The 7 Data Science Skills That Will Change the Accounting Career,” Origin World,
March 27, 2020, https://www.originworld.com/2020/03/27/7-data-science-skills-that-will-
change-accounting-career/, (accessed January 2, 2021).

Master Data
The data to address management accounting questions include both data from the financial
reporting system and data from (internal) operational systems, including manufacturing
(production), human resource, and supply chain data.

Perform the Test Plan


Exhibit 7-2 provides examples of the potential management accounting questions address-
able using Data Analytics as well as Data Analytic techniques used by analytics type
(descriptive, diagnostic, predictive, or prescriptive analytics).
Descriptive analytics mostly summarize performance in current and prior years, includ-
ing an examination of costs from job order and/or process costing systems. Another
descriptive measure is the use of key performance indicators (KPIs) to detail current and
past performance. We provide an example of these KPIs later in the chapter.
Diagnostic analytics detect correlations and patterns of interest and compare them to
a benchmark. In management accounting, computation of price, rate, usage, quantity, and
overhead variances are examples of differences from benchmarks (such as standard or bud-
geted costs), often highlighted by the use of conditional formatting. Another way diagnostic
analytics might be employed would be to use regression analysis to estimate cost behavior
(such as fixed, variable, and mixed costs).
Predictive analytics might be employed to forecast sales and future performance to help
plan production—or to predict which customers will pay back credit granted and which ones
will not, to decide who should receive credit and who should not. Perhaps a combination
of diagnostic and predictive analytics is used to find appropriate cost drivers for allocated
indirect or overhead costs.


EXHIBIT 7-2 Potential Management Accounting Questions and Data Analytics Techniques by Analytics Type

Analytics Type: Descriptive—summarize activity or master data based on certain attributes to address questions of this type: What happened? What is happening?
Questions Addressed:
• What percentage of the airline company’s departures were on time this past month?
• What was the segment margin for the West Coast and Midwest regions last quarter?
• Which products are the most profitable for the company? How much did Job #304 cost?
Data Analytics Techniques Used:
• Summary statistics (Sums, Totals, Averages, Medians, Bar Charts, Histograms, etc.)
• Crosstabulations of performance (using PivotTables)
• Key performance indicators (KPIs) tracking performance
• Computation of job order costing and/or process costing
• Clustering suppliers, customers, processes, locations

Analytics Type: Diagnostic—detect correlations and patterns of interest and compare them to a benchmark to address questions of this type: Why did it happen? What are the reasons for past results? Can we explain why it happened?
Questions Addressed:
• Why is segment margin higher on the West Coast than in the Midwest?
• What is driving the price variance and labor rate variance?
• Why did our rate of production defects go down this month compared to last month?
Data Analytics Techniques Used:
• Comparison of KPIs to expectations
• Price, rate, usage, quantity, and overhead variance analysis
• Conditional formatting
• Regression analysis estimating cost behavior
• Correlations

Analytics Type: Predictive—identify common attributes or patterns that may be used to forecast similar activity to address the following questions: Will it happen in the future? What is the probability something will happen? Is it forecastable?
Questions Addressed:
• What is the level of expected sales in the next month, quarter, and year to help plan our production?
• Do we extend credit or not to customers based on customer characteristics (credit score, payment history, existing debt, etc.)?
Data Analytics Techniques Used:
• Sales forecasting (time series; competitor and industry performance)
• Macroeconomic forecasts
• Regression
• Classification of indirect costs (What is the appropriate cost driver to allocate overhead?)

Analytics Type: Prescriptive—recommend action based on previously observed actions to address questions of this type: What should we do based on what we expect will happen? How do we optimize our performance based on potential constraints?
Questions Addressed:
• Should the company lease or own its headquarters office building?
• Should the company make its products or outsource to other producers?
• What is the level of sales that will allow us to break even?
• How can revenues be maximized (or costs be minimized) if there is a trade war with China?
Data Analytics Techniques Used:
• What-if analysis (marginal analysis)
• Goal-seek analysis
• Cash-flow (capital budgeting) analysis
• Sensitivity analysis
Cost drivers that have been used in the past might also be useful for allocating
costs in the future.
Prescriptive analytics might be employed to perform marginal what-if analysis determin-
ing whether to own or lease a building. Goal-seek analysis might be employed to determine
break-even levels. Cash-flow analysis (or capital budgeting) might be used to decide which
investments will pay off using net present value or internal rate of return calculations.
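
As one concrete illustration of these prescriptive techniques, the Python sketch below computes a project’s net present value at several assumed costs of capital—a simple what-if pass over a single input. All cash flows and rates are hypothetical.

# Hypothetical project: a 100,000 outlay followed by four annual inflows.
cash_flows = [-100_000, 30_000, 35_000, 40_000, 45_000]  # years 0 through 4

def npv(rate, flows):
    # Discount each cash flow back to today and sum the results.
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))

for rate in (0.08, 0.10, 0.12):
    print(f"Cost of capital {rate:.0%}: NPV = {npv(rate, cash_flows):,.0f}")

If the NPV stays positive across the plausible range of discount rates, the decision is robust to that assumption—the same idea as the sensitivity analysis discussed in the next section.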

Address and Refine Results


After performing the test plan, certain findings will be investigated further and certain assumptions
tested, potentially using new data and performing new analysis. Sensitivity analysis might
be used to test how the outputs change when the underlying assumptions/estimates used
in the inputs vary. For example, if the estimated cost of capital used in cash-flow analysis
changes, we can see whether the new results might affect the decision. After
addressing and refining results, we are ready to report the findings to management.

Communicate Insights and Track Outcomes


Dashboards are often useful to track KPIs, to consistently communicate the most important
metrics for the management accounting function. We provide an example of these KPIs
later in the chapter.
In the remainder of the chapter, we’ll emphasize certain aspects of the IMPACT model
and how it is applied to the management accounting area.

LO 7-2 Explain typical descriptive and diagnostic analytics in management accounting.

IDENTIFYING MANAGEMENT ACCOUNTING QUESTIONS

Here are a few examples of typical descriptive and diagnostic analytics in management accounting.

Relevant Costs
Most management decisions rely on the correct classification of costs and on determining which
costs are relevant. Aggregating, say, the total cost to produce an item versus the total cost
to purchase it in a make-or-buy or outsourcing decision is an appropriate use of descrip-
tive analytics, as is determining whether capacity exists to accept a special order or to process a product further.
Relevant costs relate to relevant data, similar to the scope of an audit. Managers under-
stand that companies are collecting a lot of data, and there is a push to find patterns in
the data that help identify opportunities to connect with customers and better evaluate
performance. However, not all data are relevant to the decision-making process. The more
relevant data that are available to inform the decision and include in the relevant costs, the
more confident management can be of the answer. Of course, there is always a trade-off
between the cost of collecting that information and the incremental value of the analysis. Be
careful not to treat the sunk cost of data that have already been collected as relevant, while
still considering the opportunity cost of not utilizing data to make profitable business decisions.
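
To make the relevant-cost idea concrete, here is a minimal make-or-buy sketch; every number is hypothetical. Only costs that differ between the alternatives enter the comparison, while sunk costs and unavoidable allocated overhead are deliberately left out.

units = 10_000
make_variable_cost_per_unit = 7.50  # direct materials, labor, variable overhead
avoidable_fixed_costs = 12_000      # fixed costs eliminated only if we buy
buy_price_per_unit = 8.40

# Unavoidable allocated overhead is excluded: it is identical either way,
# so it cannot change the decision.
cost_to_make = units * make_variable_cost_per_unit + avoidable_fixed_costs
cost_to_buy = units * buy_price_per_unit

print("Relevant cost to make:", cost_to_make)  # 87,000
print("Relevant cost to buy: ", cost_to_buy)   # 84,000
print("Decision:", "Buy" if cost_to_buy < cost_to_make else "Make")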

Key Performance Indicators and Variance Analysis


Because data are increasingly available and affordable for companies to access and
store, and because the growth in technology has created robust and affordable business
intelligence tools, data and information are becoming the key components for decision
making, replacing limited analysis and complementing management’s intuition. Specifi-
cally, various measures and metrics are defined, compiled from the data, and used for
decision making. Performance metrics are, rather simply, any number used to measure
performance at a company. The amount of inventory on hand is a metric, and that met-
ric gains meaning when compared to a baseline (e.g., how much inventory was on hand
yesterday?). A specific type of performance metric is a key performance indicator (KPI).
Just like any performance metric, a KPI should help managers keep track of performance
and strategic objectives, but the KPIs are performance metrics that stand out as the most
important—that is, “key” metrics that influence decision making and strategy. Nearly every
organization can use data to create the same performance metrics (although, of course,
with different results), but which metrics would be deemed KPIs depends upon each orga-
nization’s particular strategy.


Lab Connection
Lab 7-3 and Lab 7-4 have you compare sales performance with benchmarks
over time.

Variance analysis allows managers to evaluate the KPIs and how far they vary from
the expected outcome. For example, managers compare actual results to budgeted results
to determine whether a variance is favorable or unfavorable, similar to that shown in
Exhibit 7-3. The ability to use these types of bullet charts to not only identify the benchmark
but also to see the relative distance from the goal helps managers identify root causes of
the variance (e.g., the price we pay for a raw material or the increased volume of sales) and
drill down to determine the good performance to replicate and the poor performance to
eliminate.

EXHIBIT 7-3
Variance Analysis
Identifies Favorable and
Unfavorable Variances
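
A small worked example may help fix the mechanics behind a chart like Exhibit 7-3. The standards and actuals below are hypothetical; the sketch computes the classic direct materials price and quantity variances and flags each as favorable or unfavorable.

standard_price, standard_qty_per_unit = 4.00, 2.0  # per unit of output
actual_price, actual_qty_per_unit = 4.25, 1.9
units_produced = 5_000

actual_qty = actual_qty_per_unit * units_produced      # 9,500 in total
standard_qty = standard_qty_per_unit * units_produced  # 10,000 allowed

# Price variance = (AP - SP) x AQ; quantity variance = (AQ - SQ) x SP.
price_variance = (actual_price - standard_price) * actual_qty
quantity_variance = (actual_qty - standard_qty) * standard_price

def label(variance):
    # For costs, a positive variance means spending above the benchmark.
    return "Unfavorable" if variance > 0 else "Favorable"

print("Materials price variance:   ", price_variance, label(price_variance))
print("Materials quantity variance:", quantity_variance, label(quantity_variance))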

Lab Connection
Lab 7-1 has you calculate job costs and variances.

Cost Behavior
Managers must also understand what is driving costs and profits so they can plan for the future,
build budgets, and provide input for lean accounting processes. For example, they must
evaluate mixed costs to predict the portion of fixed and variable costs for a given period. Pre-
dictive analytics, such as regression analysis, might evaluate actual production volume and
total costs to estimate the mixed cost line equation, such as the one shown in Exhibit 7-4.


EXHIBIT 7-4
Regression Analysis of
Mixed Costs

This example was calculated using a scatter plot chart over a 12-month period in Excel.
The mixed costs can be interpreted as consisting of fixed costs of approximately $181,480
per month (the intercept) and variable costs of approximately $13.30 per unit produced.
The R2 value of 0.84 tells us that this line fits the data well: About 84 percent of the
variation in total cost is explained by production volume.
Regression and other predictive techniques help managers identify outliers, anomalies,
and poor performers so they can act accordingly. They also rely on more observations so
the prediction is much more accurate than other rudimentary accounting calculations, such
as the high-low method. These same trend analyses inform the master budget from sales to
cash and can be combined with sensitivity or what-if analyses to predict a range of values.
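
To see where an intercept and slope like those in Exhibit 7-4 come from, the sketch below fits an ordinary least squares line using only the Python standard library. The 12 monthly volumes and total costs are invented stand-ins, so the estimates will only land in the neighborhood of the chapter’s $181,480 fixed and $13.30 variable figures.

# Hypothetical 12 months of production volume (units) and total cost.
volumes = [1200, 1500, 1100, 1800, 2000, 1700,
           1300, 1900, 1600, 1400, 2100, 1000]
costs = [197_400, 201_500, 196_100, 205_700, 208_200, 204_000,
         198_800, 207_000, 202_700, 200_100, 209_600, 194_900]

n = len(volumes)
mean_x = sum(volumes) / n
mean_y = sum(costs) / n

# OLS slope = estimated variable cost per unit;
# intercept = estimated fixed cost per month.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(volumes, costs))
         / sum((x - mean_x) ** 2 for x in volumes))
intercept = mean_y - slope * mean_x

print(f"Total cost = {intercept:,.0f} + {slope:.2f} x units produced")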

PROGRESS CHECK
1. If a manager is trying to decide whether to discontinue a product or division, they
would look at the contribution margin of that object. What are some examples of
relevant data that would be useful in this calculation? Irrelevant data?
2. A bullet chart (as shown in Exhibit 7-3) uses a reference line to show actual per-
formance relative to a benchmark. What advantages does a bullet graph have
over a gauge, such as a fan with red, yellow, and green zones and a needle
pointing to the current value?

LO 7-3 Evaluate the use of KPIs as part of a Balanced Scorecard.

BALANCED SCORECARD AND KEY PERFORMANCE INDICATORS

As you will recall from Chapter 4, the most effective way to communicate the results of
any data analysis project is through data visualization. A project in which you are determin-
ing the right KPIs and communicating them to the appropriate stakeholders is no differ-
ent. One of the most common ways to communicate a variety of KPIs is through a digital
dashboard. A digital dashboard is an interactive report showing the most important met-
rics to help users understand how a company or an organization is performing. There are
many public digital dashboards available; for example, the Walton College of Business at the
University of Arkansas has an interactive dashboard to showcase enrollment, where students
are from, where students study abroad, student retention and graduation rates, and where
alumni work after graduation (https://walton.uark.edu/osie/reports/data-dashboard.php).


The public dashboard detailing student diversity at the Walton College can be used by
prospective students to learn more about the university and by the university itself to assess
how it is doing in meeting goals. If the university has a goal of increasing gender balance
in enrollment, for example, then monitoring the “Diverse Walton” metrics, pictured in
Exhibit 7-5, can help the university understand how it is doing at reaching that goal.

EXHIBIT 7-5
Walton College Digital
Dashboard—Diverse
Walton

Digital dashboards provide interesting information, but their value is maximized when
the metrics provided on the dashboard are used to affect decision making and action. One
iteration of a digital dashboard is the Balanced Scorecard. The Balanced Scorecard was
created by Robert S. Kaplan and David P. Norton in 1996 to help companies turn their
strategic goals into action by identifying the most important metrics to measure, as well as
identifying target goals to compare metrics against.
The Balanced Scorecard is comprised of four components: financial (or stewardship),
customer (or stakeholder), internal process, and organizational capacity (or learning and
growth). As depicted in Exhibit 7-6, the measures in each category affect other categories,
and all four should be directly related to the strategic objectives of an organization.
For each of the four components, objectives, measures, targets, and initiatives are iden-
tified. Objectives should be aligned with strategic goals of the organization, measures are
the KPIs that show how well the organization is doing at meeting its objective, and tar-
gets should be achievable goals toward which to move the metric. Initiatives should be the
actions that an organization can take to move its specified metrics in the direction of their
stated target goal. Exhibit 7-7 is an example of different objectives that an organization


EXHIBIT 7-6
Components of the Balanced Scorecard

EXHIBIT 7-7
An Example of a Balanced Scorecard
Reprinted with permission from Balanced Scorecard Institute, a Strategy Management Group Company. Copyright 2008–2017.

might identify for each component. You can see how certain objectives relate to other
objectives—for example, if the organization increases process efficiency (in the internal pro-
cess component row), that should help with the objective of lowering cost in the financial
component row.
Understanding how the four components interact to answer different types of questions
and meet different strategic goals is critical when it comes to identifying the right measures
to include in the dashboard, as well as using those measures to help with decision making.
Creating a Balanced Scorecard or any type of digital dashboard to present KPIs for decision
making follows the IMPACT model.


Lab Connection
Lab 7-2 has you create a Balanced Scorecard dashboard to evaluate four KPIs.

Bernard Marr identified the 75 KPIs, spanning the different components, that he considers
the most important for decision makers to know; these 75 KPIs are
compiled in Exhibit 7-8. In a Balanced Scorecard, each component should focus on 3 or 4
that he considers the most important for decision makers to know, and these 75 KPIs are
compiled in Exhibit 7-8. In a Balanced Scorecard, each component should focus on 3 or 4
KPIs. Including all 75 of these metrics in a given dashboard would be overwhelming and
difficult to manage, but depending on the strategy of the company and the initiatives that
are chosen as focal points, any of the KPIs in Exhibit 7-8 may be optimal for measuring
(and ultimately improving) performance.

EXHIBIT 7-8 Suggested KPIs That Every Manager Needs to Know1
Source: https://www.linkedin.com/pulse/20130905053105-64875646-the-75-kpis-every-manager-needs-to-know

Financial Performance KPIs
1. Net Profit
2. Net Profit Margin
3. Gross Profit Margin
4. Operating Profit Margin
5. EBITDA
6. Revenue Growth Rate
7. Total Shareholder Return (TSR)
8. Economic Value Added (EVA)
9. Return on Investment (ROI)
10. Return on Capital Employed (ROCE)
11. Return on Assets (ROA)
12. Return on Equity (ROE)
13. Debt-to-Equity (D/E) Ratio
14. Cash Conversion Cycle (CCC)
15. Working Capital Ratio
16. Operating Expense Ratio (OER)
17. CAPEX to Sales Ratio
18. Price-to-Earnings Ratio (P/E Ratio)

Customer KPIs
19. Net Promoter Score (NPS)
20. Customer Retention Rate
21. Customer Satisfaction Index
22. Customer Profitability Score
23. Customer Lifetime Value
24. Customer Turnover Rate
25. Customer Engagement
26. Customer Complaints

Operational KPIs
38. Six Sigma Level
39. Capacity Utilization Rate (CUR)
40. Process Waste Level
41. Order Fulfillment Cycle Time
42. Delivery in Full, on Time (DIFOT) Rate
43. Inventory Shrinkage Rate (ISR)
44. Project Schedule Variance (PSV)
45. Project Cost Variance (PCV)
46. Earned Value (EV) Metric
47. Innovation Pipeline Strength (IPS)
48. Return on Innovation Investment (ROI2)
49. Time to Market
50. First-Pass Yield (FPY)
51. Rework Level
52. Quality Index
53. Overall Equipment Effectiveness (OEE)
54. Process or Machine Downtime Level
55. First Contact Resolution (FCR)

Employee Performance KPIs
56. Human Capital Value Added (HCVA)
57. Revenue per Employee
58. Employee Satisfaction Index
59. Employee Engagement Level
60. Staff Advocacy Score
61. Employee Churn Rate
62. Average Employee Tenure
63. Absenteeism Bradford Factor
64. 360-Degree Feedback Score
65. Salary Competitiveness Ratio (SCR)
66. Time to Hire
67. Training Return on Investment

1 https://www.linkedin.com/pulse/20130905053105-64875646-the-75-kpis-every-manager-needs-to-know.


EXHIBIT 7-8 (Continued)

Marketing KPIs
27. Market Growth Rate
28. Market Share
29. Brand Equity
30. Cost per Lead
31. Conversion Rate
32. Search Engine Rankings (by keyword) and Click-Through Rate
33. Page Views and Bounce Rate
34. Customer Online Engagement Level
35. Online Share of Voice (OSOV)
36. Social Networking Footprint
37. Klout Score

Environmental and Social Sustainability KPIs
68. Carbon Footprint
69. Water Footprint
70. Energy Consumption
71. Saving Levels Due to Conservation and Improvement Efforts
72. Supply Chain Miles
73. Waste Reduction Rate
74. Waste Recycling Rate
75. Product Recycling Rate

The Balanced Scorecard is based around a company’s strategy. A well-defined mission,


vision, and set of values are integral in creating and maintaining a successful culture. In many
cases, when tradition appears to stifle an organization, the two concepts of culture and tradi-
tion must be separated. An established sense of purpose and a robust tradition of service can
serve as catalysts to facilitate successful organizational changes. A proper strategy for growth
considers what a firm does well and how it achieves it. With a proper strategy, an organiza-
tion is less likely to be hamstrung by a “this is how we’ve always done it” mentality.
If a strategy is already developed, or after the strategy has been fully defined, it needs
to be broken down into goals that can be measured. Identifying the pieces of the strat-
egy that can be measured is critical. Without tracking performance and measuring results,
the strategy is only symbolic. The adage “what gets measured, gets done” shows the moti-
vation behind aligning strategy statements with KPIs—people are more inclined to focus
their work and their projects on initiatives that are being paid attention to and measured.
Of course, simply measuring something doesn’t imply that anything will be done to improve
the measure—the attainable initiative attached to a metric, indicating how it can be improved,
is a key piece in ensuring that people will work to improve the measure.

PROGRESS CHECK
3. To illustrate what KPIs emphasize in “what gets measured, gets done,” Walmart
has a goal of a “zero waste future.”2 How does reporting Walmart’s waste recy-
cling rate help the organization figure out if it is getting closer to its goal? Do you
believe it helps the organization accomplish its goals?
4. How can management identify useful KPIs? How could Data Analytics help
with that?

LO 7-4 Assess the underlying quality of data used in dashboards as part of management accounting analytics.

MASTER THE DATA AND PERFORM THE TEST PLAN

Once the measures have been determined, the data that are necessary to showcase those measures need to be identified. You were first introduced to how to identify and obtain necessary data in Chapter 2 through the ETL (extract, transform, and load) process.

2 http://corporate.walmart.com/2016grr/enhancing-sustainability/moving-toward-a-zero-waste-future (accessed August 2017).

In addition to working through the same data request process that is detailed in Chapter 2,
there are two other questions to consider when obtaining data and evaluating their quality:
1. How often do the data get updated in the system? This will help you be aware of how
up-to-date your metrics are so that you interpret the changes over time appropriately.
2. Additionally, how often do you need to see updated data? Even if the data in the system are
updated on a near-real-time basis, it may not be necessary for you to have new updates
pushed to your scorecard that frequently. For example, if your team will assess their
progress only in a once-a-week meeting, there is no need to have a constantly updating
scorecard.
While the data for calculating KPIs are likely stored in the company’s enterprise system
or accounting information system, the digital dashboard containing the KPIs for data analy-
sis should be created in a data visualization tool, such as Power BI or Tableau. Loading the
data into these tools should be done with precision and should be validated to ensure the
data imported were complete and accurate.
Designing data visualizations and selecting the right way to express data (as whole
numbers, percentages, absolute values, etc.) were discussed in Chapter 4. Specifically for
digital dashboards, the format of your dashboard can follow the pattern of a Balanced
Scorecard with a strategy map, or it can take on a different format. Exhibit 7-9 shows a
template for building out the objectives, measures, targets, and initiatives into a Balanced
Scorecard format.

EXHIBIT 7-9 Balanced Scorecard Strategy Map Template
(In each circle of the strategy map, list the objectives here; the associated Measures, Targets, and Initiatives fill the columns to the right.)

Columns: Business Objectives and Strategy Map | Measures | Targets | Initiatives
Rows: Financial, Customer, Internal Processes, Organizational Capacity
Measures: (List 3–4 KPIs to support each different component)
Targets: (Use the arrows to express if the metric should increase or decrease to meet the goal, and then indicate by how much)
Initiatives: (List the initiatives that are in line with helping the organization meet the listed targets)

If the dashboard is not following the strategy map template, the most important KPIs
should be placed in the top-left corner, as our eyes are most naturally drawn to that part of
any page that we are reading.

Lab Connection
Lab 7-5 has you create a dashboard with advanced models to evaluate sales
performance.


PROGRESS CHECK
5. How often would you need to see the KPI of Waste Recycling Rate to know if you
are making progress? Any different for the KPI of ROA?
6. Why does the location of individual visuals on a dashboard matter?

LO 7-5 Understand how to address and refine results to arrive at useful information provided to management and other decision makers.

ADDRESS AND REFINE RESULTS

Once the dashboard is in use, an active communication plan should be implemented to ensure that the dashboard’s metrics are meeting the needs of the business and the users. If there are multiple audiences who use dashboards, then either different dashboards should be created, or the dashboard should provide different views and ways to filter the information so users can customize their experience and see exactly the metrics they need for decision making and monitoring. Because dashboards tend to be monitored on a daily (or even more frequent) basis, communication with all of the users is imperative to ensure that the identified metrics are appropriate and useful.
Some questions that would be helpful in determining how the dashboard could be
refined are the following:
1. Which metric are you using most frequently to help you make decisions?
2. Are you downloading the data to do any additional analysis after working with the dash-
board, and if so, can the dashboard be improved to save those extra steps?
3. Are there any metrics that you do not use? If so, why aren’t they helpful?
4. Are there any metrics that should be available on the dashboard to help you with deci-
sion making?
Checking in with the users will help to address any potential issues of missing or unnec-
essary data and refine the dashboard so that it is meeting the needs of the organization and
the users appropriately.
After the resulting dashboard has been refined and each user of the dashboard is receiv-
ing the right information for decision making, the dashboard should enter regular use across
the organization. Recall that the purpose of creating a digital dashboard is to communicate
how the organization is performing so decision makers can improve their judgment and
decisions and so workers can understand where to place their priority in their day-to-day
jobs and projects. Ensuring that all of the appropriate stakeholders continue to be involved
in using the dashboard and continually improving it is key to the success of the dashboard.
The creation of a Balanced Scorecard or any type of digital dashboard is iterative—just as
the entire IMPACT cycle should be iterative throughout any data analysis project—so it will
be imperative to check in with the users of the dashboard regularly to learn how to keep
improving it and its usefulness.

PROGRESS CHECK
7. Why are digital dashboards for KPIs an effective way to address and refine
results, as well as communicate insights and track outcomes?
8. Consider the opening vignette of the Kenya Red Cross. How do KPIs help the
organization prepare and carry out its goal of being the “first in and last out”?

Summary
■ Management accountants must use descriptive analytics to understand and direct activ-
ity, diagnostic analytics to compare with a benchmark and control costs, predictive ana-
lytics to plan for the future, and prescriptive analytics to guide their decision process.
(LO 7-1)
■ Relevant costs and data help inform decisions, variance analysis and bullet graphs help
determine where the company is, and regression helps managers understand and predict
costs. (LO 7-2)
■ Because data are increasingly available and affordable for companies to access and
store, and because the growth in technology has created robust and affordable business
intelligence tools, data and information are becoming the key components for decision
making, replacing gut response. (LO 7-2)
■ Performance metrics are defined, compiled from the data, and used for decision mak-
ing. A specific type of performance metric—the key performance indicator (KPI), a
“key” metric that influences decision making and strategy—is the most important.
(LO 7-3)
■ One of the most common ways to communicate a variety of KPIs is through a digi-
tal dashboard. A digital dashboard is an interactive report showing the most important
metrics to help users understand how a company or an organization is performing.
Its value is maximized when the metrics provided on the dashboard are used to affect
decision making and action. (LO 7-3)
■ One iteration of a digital dashboard is the Balanced Scorecard, which is used to help
companies turn their strategic goals into action by identifying the most important
metrics to measure, as well as identifying target goals to compare metrics against. The
Balanced Scorecard is comprised of four components: financial (or stewardship), cus-
tomer (or stakeholder), internal process, and organizational capacity (or learning and
growth). (LO 7-3, 7-4)
■ For each of the four components, objectives, measures, targets, and initiatives are iden-
tified. Objectives should be aligned with strategic goals of the organization, measures
are the KPIs that show how well the organization is doing at meeting its objective, and
targets should be achievable goals toward which to move the metric. Initiatives should be
the actions that an organization can take to move its specified metrics in the direction of
its stated target goal. (LO 7-3, 7-4)
■ Regardless of whether you are creating a Balanced Scorecard or another type of digital
dashboard to showcase performance metrics and KPIs, the IMPACT model should be
used to complete the project. (LO 7-3, 7-4, 7-5)

Key Words
Balanced Scorecard (342) A particular type of digital dashboard that is made up of strategic objectives,
as well as KPIs, target measures, and initiatives, to help the organization reach its target measures in line
with strategic goals.
digital dashboard (341) An interactive report showing the most important metrics to help users
understand how a company or an organization is performing. Often created using Excel or Tableau.
key performance indicator (KPI) (339) A particular type of performance metric that an organization
deems the most important and influential on decision making.
performance metric (339) Any calculation measuring how an organization is performing, particularly
when that measure is compared to a baseline.

ANSWERS TO PROGRESS CHECKS
1. The contribution margin includes the revenues and variable costs that are traceable to
that division or product. Those data would be relevant. Other relevant data may be the
types of customers and sentiment toward the product, products that are sold in conjunc-
tion with that product, or market size. Shared or allocated costs would not be relevant.
2. A bullet graph uses a small amount of space to evaluate a large number of metrics.
Gauges are more visually engaging and easier to understand, but waste a lot of space.
3. If waste reduction is an important goal for Walmart, having a KPI and, potentially, a digital
dashboard that reports how well the organization is doing will likely be useful in helping
accomplish its goal. Using a digital dashboard helps an organization to see if it is making
progress.
4. The KPIs that are the most helpful are those that are consistent with the company’s strategy
and measure how well the company is doing in meeting its goals. Data Analytics will help
gather and report the necessary data to report on the KPIs. The Data Analytics IMPACT
model introduced in Chapter 1—from identifying the question to tracking outcomes—will
be helpful in getting the necessary data.
5. The frequency of updating KPIs is always a good question. One determinant will be how
often the data get updated in the system, and the second determinant is how often the
data will be considered by those looking at the data. Whichever of those two determi-
nants takes longer is probably the correct frequency for updating KPIs.
6. The most important KPIs should be placed in the top-left corner because our eyes are
most naturally drawn to that part of any page that we are reading.
7. By identifying the KPIs that are most important to corporate strategy and finding the nec-
essary data to support them and then reporting on them in a digital dashboard, deci-
sion makers will have the necessary information to make effective decisions and track
outcomes.
8. As noted in the opening vignette, using Data Analytics to refine its strategy and assign
measurable performance metrics to its goals, Kenya Red Cross felt confident that its
everyday activities were linked to measurable goals that would help the organization
reach its goals and maintain a strong positive reputation and impact through its service.

Multiple Choice Questions



1. (LO 7-1) Which of the following would not be considered a prescriptive analytics
technique?
a. Sensitivity Analysis Evaluating Assumptions of Future Performance
b. Crosstabulation Analyzing Past Performance
c. Breakeven Level in Sales
d. Capital Budgeting
2. (LO 7-3) What would you consider to be an operational KPI?
a. Inventory Shrinkage Rate
b. Brand Equity
c. CAPEX to Sales Ratio
d. Revenue per Employee

3. (LO 7-3) What does KPI stand for?
a. Key performance index
b. Key performance indicator
c. Key paired index
d. Key paired indicator
4. (LO 7-4) The most important KPIs should be placed in the ________ corner of the page
even if we are not following a strategy map template.
a. bottom right
b. bottom left
c. top left
d. top right
5. (LO 7-4) According to the text, which of these are not helpful in refining a dashboard?
a. Which metric are you using most frequently to help you make decisions?
b. Are you downloading the data to do any additional analysis after working with the
dashboard, and if so, can the dashboard be improved to save those extra steps?
c. Are there any metrics that you do not use? If so, why aren’t they helpful?
d. Which data are the easiest to access or least costly to collect?
6. (LO 7-4) On a Balanced Scorecard, which is not included as a component?
a. Financial Performance
b. Customer/Stakeholder
c. Internal Process
d. Employee Capacity
7. (LO 7-4) Which of the following would be considered to be a diagnostic analytics tech-
nique in managerial accounting?
a. Summary Statistics
b. Computation of Job Order Costing
c. Price and Rate Variance Analysis
d. Sales Forecasts
8. (LO 7-3) What is defined as an interactive report showing the most important metrics to
help users understand how a company or an organization is performing?
a. KPI
b. Performance metric
c. Digital dashboard
d. Balanced Scorecard
9. (LO 7-1) What would you consider to be a prescriptive analytics technique in manage-
ment accounting?
a. Computation of KPIs
b. Capital Budgeting
c. Comparison of Actual Performance to Budgeted Performance
d. Cash Flow Forecasts
10. (LO 7-2) What would you consider to be a diagnostic analytics technique in manage-
ment accounting?
a. Computation of Rate Variances
b. Sales Forecasts based on Time Series Analysis
c. Breakeven Level in Sales
d. Computation of Product Sales in Prior Period

Discussion and Analysis

1. (LO 7-1) In the article “The 7 Data Science Skills That Will Change the Accounting
Career,” Robert Hernandez suggests that two critical skills are (1) revenue analytics and
(2) optimizing costs and revenues. Would these skills represent descriptive, diagnostic,
predictive, or prescriptive analytics? Why?
2. (LO 7-3, 7-4) We know that a Balanced Scorecard is comprised of four components:
financial (or stewardship), customer (or stakeholder), internal process, and organiza-
tional capacity (or learning and growth). What would you include in a dashboard for the
financial and customer components?
3. (LO 7-3, 7-4) We know that a Balanced Scorecard is comprised of four components:
financial (or stewardship), customer (or stakeholder), internal process, and organiza-
tional capacity (or learning and growth). What would you include in a dashboard for the
internal process and organizational capacity components? How do digital dashboards
make KPIs easier to track?
4. (LO 7-3) Amazon, in our opinion, has cared less about profitability in the short run but has
cared about gaining market share. Arguably, Amazon gains market share by taking care
of the customer. Given the 75 KPIs that every manager needs to know in Exhibit 7-8, what
would be a natural KPI for the customer aspect for Amazon?
5. (LO 7-3) For an accounting firm like PwC, how would the Balanced Scorecard help bal-
ance the desire to be profitable for its partners with keeping the focus on its customers?
6. (LO 7-3) For a company like Walmart, how would the Balanced Scorecard help balance
the desire to be profitable for its shareholders with continuing to develop organizational
capacity to compete with Amazon (and other online retailers)?
7. (LO 7-3) Why is Customer Retention Rate a great KPI for understanding Tesla’s
customers?
8. (LO 7-5) Assuming you have access to data that are updated in real time, are there situ-
ations when you would not want to update your digital dashboard in real time? Why or
why not?
9. (LO 7-2) In which of the four components of a Balanced Scorecard would you put the
Walton College’s diversity initiative? Why do you think this is important for a public insti-
tution of higher learning?

Problems

1. (LO 7-1, 7-2) Match the description of the management accounting question to the data
analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
Managerial Accounting Question Data Analytics Type
1. How much did Job #318 cost per unit?
2. What is driving the rate variance on manufacturing?
3. How can revenues be maximized (or costs be minimized) if there is
an increase in VAT tax in Ireland?
4. What is the appropriate cost driver?
5. What is the forecasted cash balance at the end of September?
6. What is the level of fixed and variable costs for the manufacture of
Product 317?

2. (LO 7-1, 7-2) Match the description of the management accounting technique to the
data analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics

Managerial Accounting Technique Data Analytics Type


1. Conditional formatting
2. Time series analysis
3. What-if analysis
4. Sensitivity analysis
5. Summary statistics
6. Capital budgeting
7. Computation of job order costing

3. (LO 7-3) Match the following KPIs to one of the following KPI types:
• Financial Performance
• Operational
• Customer
• Employee Performance
• Marketing
• Environmental and Social Sustainability

KPI KPI Type


1. Brand Equity
2. Project Cost Variance
3. Waste Reduction Rate
4. EBITDA
5. 360-Degree Feedback
6. Net Promoter Score
7. Time to Market

4. (LO 7-3) Match the following KPIs to one of the following KPI types:
• Financial Performance
• Operational
• Customer
• Employee Performance
• Marketing
• Environmental and Social Sustainability

KPI KPI Type


1. Debt-to-Equity Ratio
2. Water Footprint
3. Customer Satisfaction
4. Overall Equipment Effectiveness
5. Return on Capital Employed
6. Quality Index
7. Cost per Lead
8. Energy Consumption

5. (LO 7-3) Of the list of KPIs shown below, indicate which would be considered to be
financial performance KPIs, and which would not be.

KPI Financial Performance KPI?


1. Market Share
2. Net Income (Net Profit)
3. Cash Conversion Cycle (CCC)
4. Conversion Rate
5. Gross Profit Margin
6. Net Profit Margin
7. Customer Satisfaction Index
8. Working Capital Ratio

6. (LO 7-3) Analysis: From Exhibit 7-8, choose five financial performance KPIs to answer the
following three questions. This URL (https://www.linkedin.com/pulse/20130905053105-
64875646-the-75-kpis-every-manager-needs-to-know) provides background informa-
tion for each individual KPI that may be helpful in understanding the individual KPIs and
answering the questions.
6A. Identify the equation/relationship/data needed to calculate the KPI. If you need
data, how frequently would the data need to be incorporated to be most useful?
6B. Describe a simple visualization that would help a manager track the KPI.
6C. Identify a benchmark for the KPI from the Internet. Choose an industry and find the
average, if possible. This is for context only.
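For example, if Net Profit Margin were one of your chosen KPIs (shown here only as an illustration, not as one of your required answers): the calculation is Net Profit Margin = Net Income / Total Revenue; monthly or quarterly income statement data would be frequent enough to be useful; a simple line chart of the margin over time with a horizontal benchmark line would help a manager track it; and a published industry-average net margin would provide the benchmark context.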
7. (LO 7-3) Of the list of KPIs shown below, indicate which would be considered to be
employee performance KPIs, and which would not be.

KPI Employee Performance KPI?


1. Return on Assets
2. Time to Market
3. Revenue per Employee
4. Human Capital Value Added (HCVA)
5. Employee Satisfaction Index
6. Staff Advocacy Score
7. Klout Score
8. Employee Engagement Level

8. (LO 7-3) Analysis: From Exhibit 7-8, choose 10 employee performance KPIs
to answer the following three questions. This URL (https://www.linkedin.com/pulse/20130905053105-64875646-the-75-kpis-every-manager-needs-to-know)
provides background information for each individual KPI that may be helpful in under-
standing the individual KPIs and answering the questions.
8A. Identify the equation/relationship/data needed to calculate the KPI. How frequently
would it need to be incorporated to be most useful?
8B. Describe a simple visualization that would help a manager track the KPI.
8C. Identify a benchmark for the KPI from the Internet. Choose an industry and find the
average, if possible. This is for context only.
9. (LO 7-3) Of the list of KPIs shown below, indicate which would be considered to be
marketing performance KPIs, and which would not be.

KPI Marketing Performance KPI?


1. Conversion Rate
2. Cost per Lead
3. Page Views and Bounce Rate
4. Process Waste Level
5. Employee Satisfaction Index
6. Brand Equity
7. Customer Online Engagement Level
8. Order Fulfillment Cycle Time

10. (LO 7-3) Analysis: From Exhibit 7-8, choose 10 marketing KPIs to answer the following three questions. This URL (https://www.linkedin.com/pulse/20130905053105-64875646-the-75-kpis-every-manager-needs-to-know) provides background information
for each individual KPI that may be helpful in understanding the individual KPIs and
answering the questions.
10A. Identify the equation/relationship/data needed to calculate the KPI. How fre-
quently would it need to be incorporated to be most useful?
10B. Describe a simple visualization that would help a manager track the KPI.
10C. Identify a benchmark for the KPI from the Internet. Choose an industry and find
the average, if possible. This is for context only.
11. (LO 7-4) Analysis: How does Data Analytics help facilitate the use of the Balanced
Scorecard and tracking KPIs? Does it make the data more timely? Are you able to access more information more easily or quickly, and what other capabilities does it provide?
12. (LO 7-3) Analysis: If ROA is considered a key KPI for a company, what would be an
appropriate benchmark? The industry’s ROA? The average ROA for the company for the
past five years? The competitors’ ROA?
12A. How will you know if the company is making progress?
12B. How might Data Analytics help with this?
12C. How often would you need a measure of ROA? Monthly? Quarterly? Annually?
13. (LO 7-3) Analysis: If Time to Market is considered a key KPI for a company, what would
be an appropriate benchmark? The industry’s time to market? The average time to mar-
ket for the company for the past five years? The competitors’ time to market?
13A. How will you know if the company is making progress?
13B. How might Data Analytics help with this?
13C. How often would you need a measure of Time to Market? Monthly? Quarterly?
Annually?
14. (LO 7-3) Analysis: Why is Order Fulfillment Cycle Time an appropriate KPI for a com-
pany like Wayfair (which sells furniture online)? How long does Wayfair think customers
will be willing to wait if Amazon Prime promises items delivered to its customers in two
business days? Might this be an important basis for competition?


Lab 7-1 Evaluate Job Costs—Sláinte


Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Sláinte has branched out in recent years to offer custom-branded
microbrews for local bars that want to offer a distinctive tap and corporate events where
a custom brew can be used to celebrate a big milestone. Each of these custom orders fall
into one of two categories: Standard jobs include the brew and basic package design,
and Premium jobs include customizable brew and enhanced package design with custom
designs and upgraded foil labels. Sláinte’s management has begun tracking the costs asso-
ciated with each of these custom orders with job cost sheets that include materials, labor,
and overhead allocation. You have been tasked with helping evaluate the costs for these
custom jobs.
Data: Lab 7-1 Slainte Job Costs.zip - 26KB Zip / 30KB Excel

Lab 7-1 Example Output


By the end of this lab, you will create a dashboard that will let you explore job cost variance.
While your results will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop



LAB 7-1M Example Job Cost Dashboard in Microsoft Power BI Desktop


Tableau | Desktop



LAB 7-1T Example Job Cost Dashboard in Tableau Desktop

Lab 7-1 Part 1 Calculate Revenues and Costs


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 7-1 [Your name] [Your email address].docx.
All of the data revolve around the job cost records, but you are only given the compo-
nents, such as quantity and amount. Before you build your dashboard, you will need to
calculate the aggregate values for category costs, including the budgeted and actual direct
materials, direct labor, overhead costs, and profit.

Microsoft | Power BI Desktop

1. Open Power BI Desktop and connect to your data:


a. Click Home > Get Data > Excel.
b. Browse to find the Lab 7-1 Slainte Job Costs.xlsx file and click Open.
c. Check all of the tables and click Load.
d. Click Modeling > Manage relationships to verify that the tables loaded
correctly. For example, if you see an issue with the relationship between Job_Orders and Job_Rates, click the relationship, then click Edit and
change the Cross filter direction to Both.
2. Rename Sheet 1 to Job Composition.
3. Next, create calculated fields for the following category subtotals that you will
include in your graphs and tables. Click the Job_Orders table in the fields
list to make that the home of your new measures, then click Modeling > New
Measure. Enter each of the formulas below as a new measure. Note: If the
measure shows up in an unintended table, click the measure and change the
Home Table in the ribbon to Job_Orders.
a. Actual Revenue = SUM(Job_Orders[Job_Revenue])
b. Actual DM Cost = SUM(Material_Requisition[Material_Cost])
c. Actual DL Cost = SUM(Time_Record[Hours])*SUM(Job_Rates[Direct_Labor_Rate])
d. Actual OH Cost = SUM(Time_Record[Hours])*MIN(Job_Rates[Overhead_Rate])
e. Actual Profit = [Actual Revenue]-[Actual DM Cost]-[Actual DL Cost]-[Actual OH Cost]
f. Actual Profit Margin = [Actual Profit]/[Actual Revenue]
4. To enable benchmarks for comparison, add the following measures for the
budgeted amounts:
a. Budgeted DM Cost = SUM(Job_Orders[Job_Budgeted_DM_Cost])
b. Budgeted DL Cost = SUM(Job_Rates[Direct_Labor_Rate])*SUM(Job_Orders[Job_Budgeted_Hours])
c. Budgeted OH Cost = SUM(Job_Rates[Overhead_Rate])*SUM(Job_Orders[Job_Budgeted_Hours])
d. Budgeted Profit = [Actual Revenue]-[Budgeted DM Cost]-[Budgeted DL Cost]-[Budgeted OH Cost]
5. To enable the use of color on your graphs to show favorable and unfavorable
analyses, create some additional measures based on IF . . . THEN . . . logic.
To make unfavorable variances appear in orange use the color hex value
#F28E2B in the THEN part of the formula. Power BI will apply conditional
formatting with that color. To make favorable variances appear in blue use the
color hex value #4E79A7 in the ELSE part of the formula. Remember: More
cost is unfavorable and more profit is favorable, so pay attention to the signs.
TIP: Straight quotation marks must be used as shown below. Curly quotation
marks, also known as smart quotes, will not work in Power BI formulas.
a. Favorable DM = IF([Actual DM Cost]>[Budgeted DM Cost],"#F28E2B","#4E79A7")
b. Favorable DL = IF([Actual DL Cost]>[Budgeted DL Cost],"#F28E2B","#4E79A7")
c. Favorable OH = IF([Actual OH Cost]>[Budgeted OH Cost],"#F28E2B","#4E79A7")
d. Favorable Profit = IF([Actual Profit]<[Budgeted Profit],"#F28E2B","#4E79A7")
6. Finally, when evaluating the jobs individually, you should compare the profit
margin to a target. This requires two more measures:
a. Target Profit Margin = .20

b. Favorable Profit Margin = IF([Actual Profit Margin]<=[Target Profit Margin],"#F28E2B","#4E79A7")
7. Scroll your field list to show your new calculated values and take a
screenshot (label it 7-1MA). Note: Your report should still be blank at this
point.
8. Save your file as Lab 7-1 Slainte Job Costs.pbix. Answer the questions for
this part and then continue to the next part.
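Optional: The Favorable measures in step 5 flag only the direction of each variance. If you also want to report the size of each variance in dollars, measures such as the following could be created the same way (a sketch using the measure names defined above; these extra measures are not required by the lab):
DM Variance = [Budgeted DM Cost] - [Actual DM Cost]
DL Variance = [Budgeted DL Cost] - [Actual DL Cost]
OH Variance = [Budgeted OH Cost] - [Actual OH Cost]
A positive result is a favorable cost variance (actual cost below budget), mirroring the IF logic used for the colors.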

Tableau | Desktop

1. Open Tableau Desktop and connect to your data:


a. Click Connect to Data > Microsoft Excel.
b. Browse to find the Lab 7-1 Slainte Job Costs.xlsx file and click Open.
c. Drag the Job_Orders table to the data model panel, then connect the
Customers, Time_Record, Material_Requisition, and Job_Rates tables to the right of it.
d. Finally, drag the Employees table to the right of the Time_Record
table.
2. Rename Sheet 1 to Job Composition and edit the following data types:
a. For all Job No fields (3 total), right-click each and choose Convert to
Dimension.
3. Next, create calculated fields for the following category subtotals that you
will include in your graphs and tables. Click Analysis > Create Calculated
Field and create each of the following six fields (name: formula):
a. Actual Revenue: SUM([Job Revenue])
b. Actual DM Cost: SUM([Material Cost])
c. Actual DL Cost: SUM([Hours]*[Direct Labor Rate])
d. Actual OH Cost: SUM([Hours]*[Overhead Rate])
e. Actual Profit: [Actual Revenue]-[Actual DL Cost]-[Actual DM Cost]-[Actual OH Cost]
f. Actual Profit Margin: [Actual Profit]/[Actual Revenue]
4. To enable benchmarks for comparison, add the following calculated fields
for the budgeted amounts:
a. Budgeted DM Cost = SUM([Job Budgeted DM Cost])
b. Budgeted DL Cost = SUM([Job Budgeted Hours])*SUM([Direct Labor Rate])
c. Budgeted OH Cost = SUM([Job Budgeted Hours])*SUM([Overhead Rate])
d. Budgeted Profit = [Actual Revenue]-[Budgeted DL Cost]-[Budgeted DM Cost]-[Budgeted OH Cost]
5. To enable the use of color on your graphs to show favorable and unfavorable analyses, create some additional calculated fields based on IF . . . THEN . . . logic. Remember: More cost is unfavorable and more profit is favorable, so pay attention to the signs.
a. Favorable DM = IF(([Budgeted DM Cost]-[Actual DM Cost])>0) THEN 'Favorable' ELSE 'Unfavorable' END
b. Favorable DL = IF(([Budgeted DL Cost]-[Actual DL Cost])>0) THEN 'Favorable' ELSE 'Unfavorable' END
c. Favorable OH = IF(([Budgeted OH Cost]-[Actual OH Cost])>0) THEN 'Favorable' ELSE 'Unfavorable' END
d. Favorable Profit = IF(([Budgeted Profit]-[Actual Profit])<0) THEN 'Favorable' ELSE 'Unfavorable' END
6. Finally when evaluating the jobs individually, you should compare the profit
margin to a target. This requires a new parameter as well as one last calcu-
lated field:
a. In the data tab, click the drop-down arrow next to the search box and
choose Create Parameter. Enter the following and click OK.
1. Name: Target Profit Margin
2. Current Value: .20
b. Add one final calculated field: Favorable Profit Margin = IF([Actual Profit Margin]>=[Target Profit Margin]) THEN 'Favorable' ELSE 'Unfavorable' END
7. Scroll to the bottom of your field list to see your new calculated values and
take a screenshot (label it 7-1TA).
8. Save your file as Lab 7-1 Slainte Job Costs.twb. Answer the questions for this
part and then continue to the next part.
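Optional: As in the Power BI track, you could also report the size of each variance with one more calculated field, for example (a sketch, not a required lab step): DM Variance = [Budgeted DM Cost]-[Actual DM Cost]. Because both inputs are already aggregated, the result is a dollar variance that pairs naturally with the Favorable DM color field from step 5.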

Lab 7-1 Part 1 Objective Questions (LO 7-2)


OQ1. How much profit as a percentage of revenue does management hope to make
from each product?
OQ2. What is the calculation for the budgeted direct labor cost?
OQ3. When is a cost favorable?

Lab 7-1 Part 1 Analysis Questions (LO 7-2)


AQ1. Why do we need to set up calculated fields and measures before we can create a
visualization?
AQ2. What is the purpose of an IF . . . THEN . . . statement for this lab?
AQ3. Which other measures might be useful for analyzing job cost variance?

Lab 7-1 Part 2 Identify the Composition of Each Job's Revenue
With the calculations prepared, you are now ready to create some visualizations to help you
understand the different components that make up each job’s revenue. In this first visual,
you will show each job in a bar chart with the different cost elements, such as direct materi-
als, direct labor, overhead, and profit. This will help you visually explain what is driving the
profitability of each job.


Microsoft | Power BI Desktop

1. Open your Lab 7-1 Slainte Job Costs.pbix file created in Part 1 and go to the
Job Composition tab.
2. Add a new Stacked Bar Chart to your page and resize it so it fills the entire
page. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes in the Visualizations pane:
a. Y-axis: Job_Orders.Job_No, Job_Rates.Job_Type
b. X-axis (all from Job_Orders): Actual Profit, Actual DM Cost, Actual DL
Cost, Actual OH Cost
3. Note: At this point you will see the bar chart, but the values will be incorrect.
Does it make sense that half of the jobs have negative profit? Power BI will
try to add all of the values from linked tables, in this case the Direct_Labor_
Rate and Overhead_Rate, unless you add a field from that table and have it
appear on your visual. Fix the problem by clicking Expand all down one level
in the hierarchy (down arrow fork icon) in the top-right corner of your chart.
This will now show the Job_Type value next to the job number and use the
correct rates for your DL and OH calculations. You should now see that
most of the jobs show a positive profit.
4. Click the Format visual (paintbrush) icon to give your chart some friendly
titles:
a. General > Title > Text: Job Cost Composition
b. Visual > Y-axis > Title: Job Number and Type
c. Visual > X-axis > Title: Off
5. Take a screenshot (label it 7-1MB) of your Job Composition page.
6. Save your file as Lab 7-1 Slainte Job Costs.pbix. Answer the questions for
this part and then continue to the next part.

Tableau | Desktop

1. Open your Lab 7-1 Slainte Job Costs.twb file created in Part 1 and go to the
Job Composition tab.
2. Create a new Stacked Bar Chart. Drag the following fields from the Job_Orders and Job_Rates tables to their respective shelves:
a. Rows: Job_Orders.Job_No, Job_Rates.Job_Type
b. Columns: Measure Values (at the very bottom of the field list)
c. Marks > Color: Measure Names
d. Filters: Measure Values
1. Right-click the Measure Values filter and choose Edit Filter.
2. Click None to uncheck all of the values in this list.


3. Check the following and click OK:


a. Actual DL Cost
b. Actual DM Cost
c. Actual OH Cost
d. Actual Profit
e. Sort by: Value (Descending)
3. Take a screenshot (label it 7-1TB) of your Job Composition page.
4. Save your file as Lab 7-1 Slainte Job Costs.twb. Answer the questions for this
part and then continue to the next part.

Lab 7-1 Part 2 Objective Questions (LO 7-2)


OQ1. Which job has the highest revenue?
OQ2. Which job has negative profit?
OQ3. Which job has the highest profit?
OQ4. On average, do the Premium jobs make more profit or less profit than the Stan-
dard jobs?
OQ5. On average, what is the largest revenue component measure?

Lab 7-1 Part 2 Analysis Questions (LO 7-2)


AQ1. Why don’t we include revenue in this visualization?
AQ2. What additional information might a manager want to add to complement this
visual?

Lab 7-1 Part 3 Evaluate Variances in Different Cost Components
In Part 1, you calculated the budgeted values for each of the different cost components.
Now you are tasked with building a dashboard to understand how closely actual results
match the budgeted amounts and which components (DM, DL, or OH) are driving the
variance and thus affecting the profitability of each job. This will help Sláinte management
better control costs on future special orders.

Microsoft | Power BI Desktop

1. Open your Lab 7-1 Slainte Job Costs.pbix file from Part 2 and create a new
page called Job Cost Dashboard.
2. Add a new Matrix Table to your page and resize it so it fills the entire width
of your page and the top third.
a. Click the Build visual icon in the Visualizations pane and drag the fol-
lowing fields from the Job_Orders and Job_Rates tables:

361

ISTUDY ric44907_ch07_334-403.indd 361 02/11/23 01:04 PM


Rev. Confirming Pages

1. Columns: Job_Orders.Job_No
2. Values: Job_Rates.Job_Type, Job_Orders.Actual Revenue, Job_Orders.Actual DM Cost, Job_Orders.Actual DL Cost, Job_Orders.Actual OH Cost, Job_Orders.Actual Profit, Job_Orders.Actual Profit Margin
b. Now change the format so that attributes appear on rows instead of as
headers. Click the Format visual (paintbrush) icon in the Visualizations
pane and adjust the following:
1. Visual > Values > Options > Switch values to rows: On
c. Click the Format visual (paintbrush) icon to change the text color to
show favorable (blue) and unfavorable (orange) profit margin values:
1. Visual > Cell elements > Apply settings to: Actual Profit Margin >
Font color: On
2. Click Conditional formatting (fx), enter the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Favorable Profit Margin
3. The table should now show orange profit margin values for any value
below the 20% you defined in Part 1.
d. Take a screenshot (label it 7-1MC) of your Job Cost table.
3. Click on the blank part of the page and add a new Clustered Bar Chart below
your table. Resize it so it fits the top-left quarter of the remaining space.
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes in the Visualizations pane:
1. Y-axis: Job_Orders.Job_No
2. X-axis (all from Job_Orders): Actual DM Cost
3. Tooltips: Budgeted DM Cost
b. Click the Format visual (paintbrush) icon to give your chart some
friendly titles and color based on whether the value is favorable (blue) or
unfavorable (orange):
1. General > Title > Text: Direct Material Cost
2. Visual > Y-axis > Title: Job Number
3. Visual > X-axis > Title: Off
4. Visual > Bars > Colors > Conditional formatting (fx button), enter
the following and click OK:
a. Format by: Field value
b. What field should we base this on?: Favorable DM
c. Take a screenshot (label it 7-1MD) of your dashboard with the table
and bar chart.
4. Click on the blank part of the page and add a new Clustered Bar Chart to
the right of your Direct Material Cost chart. Resize it so it fits the top-right
quarter of the remaining space.
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes in the Visualizations pane:


1. Y-axis: Job_Orders.Job_No, Job_Rates.Job_Type


2. X-axis (all from Job_Orders): Actual DL Cost
3. Tooltips: Budgeted DL Cost
b. Click Expand all down one level in the hierarchy (down arrow fork icon)
in the top-right corner of your chart.
c. Click the Format visual (paintbrush) icon to give your chart some
friendly titles and color based on whether the value is favorable (blue) or
unfavorable (orange):
1. General > Title > Text: Direct Labor Cost
2. Visual > Y-axis > Title: Job Number
3. Visual > X-axis > Title: Off
4. Visual > Bars > Colors > Conditional formatting (fx button), enter
the following, and click OK:
a. Format by: Field value
b. What field should we base this on?: Favorable DL
5. Click on the blank part of the page and add a new Clustered Bar Chart below your Direct Material Cost chart. Resize it so it fits the bottom-left quarter of the remaining space.
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes in the Visualizations pane:
1. Y-axis: Job_Orders.Job_No, Job_Rates.Job_Type
2. X-axis (all from Job_Orders): Actual OH Cost
3. Tooltips: Budgeted OH Cost
b. Click Expand all down one level in the hierarchy (down arrow fork icon)
in the top-right corner of your chart.
c. Click the Format visual (paintbrush) icon to give your chart some
friendly titles and color based on whether the value is favorable (blue) or
unfavorable (orange):
1. General > Title > Text: Overhead Cost
2. Visual > Y-axis > Title: Job Number
3. Visual > X-axis > Title: Off
4. Visual > Bars > Colors > Conditional formatting (fx button), enter
the following, and click OK:
a. Format by: Field value
b. What field should we base this on?: Favorable OH
6. Click on the blank part of the page and add a new Clustered Bar Chart to the right of your Overhead Cost chart. Resize it so it fits the bottom-right quarter of the remaining space.
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes in the Visualizations pane:
1. Y-axis: Job_Orders.Job_No, Job_Rates.Job_Type
2. X-axis (all from Job_Orders): Actual Profit
3. Tooltips: Budgeted Profit
b. Click Expand all down one level in the hierarchy (down arrow fork icon)
in the top-right corner of your chart.


c. Click the Format visual (paintbrush) icon to give your chart some
friendly titles and color based on whether the value is favorable (blue) or
unfavorable (orange):
1. General > Title > Text: Profit
2. Visual > Y-axis > Title: Job Number
3. Visual > X-axis > Title: Off
4. Visual > Bars > Colors > Conditional formatting (fx button), enter
the following, and click OK:
a. Format by: Field value
b. What field should we base this on?: Favorable Profit
d. Take a screenshot (label it 7-1ME) of your completed dashboard.
7. Save your file as Lab 7-1 Slainte Job Costs.pbix. Answer the questions for
this part and then close Power BI Desktop.

Tableau | Desktop

1. Open your Lab 7-1 Slainte Job Costs.twb file from Part 2 and create a new worksheet called Job Cost.
2. Create a Summary Table to show your costs associated with each job:
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes:
1. Columns: Measure Names
2. Rows: Job_Orders.Job No, Job_Rates.Job Type
3. Marks > Color: Favorable Profit Margin
4. Marks > Text: Measure Values
5. Filters: Measure Values
a. Right-click the Measure Values filter and choose Edit Filter.
b. Click None to uncheck all of the values in this list.
c. Check the following and click OK:
i. Actual DL Cost
ii. Actual DM Cost
iii. Actual OH Cost
iv. Actual Profit
v. Actual Profit Margin
vi. Actual Revenue
6. To show the profit margin as a percentage, right-click the
AGG(Actual Profit Margin) pill in the Measure Values shelf and
choose Format number. Click Percentage and then click back onto
your sheet.
b. Click the drop-down menu in the top-right corner of the Favorable/Unfa-
vorable legend on the right side of the screen and choose Edit Colors.



1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
c. Take a screenshot (label it 7-1TC) of your Job Cost table.
3. Create a new worksheet called Direct Material Cost and add the following:
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes:
1. Columns: Actual DM Cost
2. Rows: Job_Orders.Job No
3. Marks > Detail: Budgeted DM Cost
4. Marks > Color: Favorable DM
b. Click the drop-down menu in the top-right corner of the Favorable/Unfa-
vorable legend on the right side of the screen and choose Edit Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
c. Optional: Add a reference line to show the budgeted cost as a bench-
mark value on your chart:
1. Click the Analytics tab.
2. Drag Reference Line to the Cell button on your chart.
3. Set the line Value to Budgeted DM Cost.
d. Take a screenshot (label it 7-1TD) of your Direct Materials Cost
worksheet.
4. Create a new worksheet called Direct Labor Cost and add the following:
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes:
1. Columns: Actual DL Cost
2. Rows: Job_Orders.Job No
3. Marks > Detail: Budgeted DL Cost
4. Marks > Color: Favorable DL
b. Click the drop-down menu in the top-right corner of the Favorable/Unfa-
vorable legend on the right side of the screen and choose Edit Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
c. Optional: Add a reference line to show the budgeted cost as a bench-
mark value on your chart:
1. Click the Analytics tab.
2. Drag Reference Line to the Cell button on your chart.
3. Set the line Value to Budgeted DL Cost.

5. Create a new worksheet called Overhead Cost and add the following:
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes:
1. Columns: Actual OH Cost
2. Rows: Job_Orders.Job No
3. Marks > Detail: Budgeted OH Cost
4. Marks > Color: Favorable OH
b. Click the drop-down menu in the top-right corner of the Favorable/Unfa-
vorable legend on the right side of the screen and choose Edit Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
c. Optional: Add a reference line to show the budgeted cost as a bench-
mark value on your chart:
1. Click the Analytics tab.
2. Drag Reference Line to the Cell button on your chart.
3. Set the line Value to Budgeted OH Cost.
6. Create a new worksheet called Profit and add the following:
a. Drag the following fields from the Job_Orders and Job_Rates tables to
their respective boxes:
1. Columns: Actual Profit
2. Rows: Job_Orders.Job No
3. Marks > Detail: Budgeted Profit
4. Marks > Color: Favorable Profit
b. Click the drop-down menu in the top-right corner of the Favorable/Unfa-
vorable legend on the right side of the screen and choose Edit Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
c. Optional: Add a reference line to show the budgeted cost as a bench-
mark value on your chart:
1. Click the Analytics tab.
2. Drag Reference Line to the Cell button on your chart.
3. Set the line Value to Budgeted Profit.
7. Finally, create a new dashboard tab called Job Cost Dashboard and add your
charts from this part of the lab:
a. Change the size from Desktop Browser > Fixed Size to Automatic.
b. Drag the Direct Material Cost sheet to the dashboard.
c. Drag the Direct Labor Cost sheet to the right side of the dashboard.
d. Drag the Overhead Cost sheet to the bottom-left corner of the dash-
board.

e. Drag the Profit sheet to the bottom-right corner of the dashboard.
f. Drag the Job Cost sheet along the entire top of the dashboard and resize
it to remove extra space.
g. In the top-right corner of each sheet on your new dashboard, click Use
as Filter (funnel icon) to connect the visualizations so you can drill
down into the data.
h. Take a screenshot (label it 7-1TE) of your completed dashboard.
8. Save your file as Lab 7-1 Slainte Job Costs.twb. Answer the questions for this
part and then close Tableau Desktop.

Lab 7-1 Part 3 Objective Questions (LO 7-2)


OQ1. How many jobs underperformed on profit margin?
OQ2. How many jobs underperformed on budgeted profit?
OQ3. Is the job with the highest profit margin Premium or Standard?
OQ4. Are there more, fewer, or the same number of jobs with a favorable direct labor
cost and overhead cost?

Lab 7-1 Part 3 Analysis Questions (LO 7-2)


AQ1. What do you notice about the variances for direct labor cost and overhead cost?
What might explain the similarities or differences?
AQ2. Look at Job 3217 and comment on the cost and profit variances. What might
explain these variances?
AQ3. For the job(s) with negative profit, are all variances unfavorable? What is (are)
the biggest driver(s) for the loss?

Lab 7-1 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 7-2 Create a Balanced Scorecard Dashboard—Sláinte


Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As Sláinte’s management evaluates different aspects of their business,
they are increasingly interested in identifying areas where they can track performance and
quickly determine where improvement should be made. The managers have asked you to
develop a dashboard to help track different dimensions of the Balanced Scorecard includ-
ing Financial, Process, Growth, and Customer measures.
Data: Lab 7-2 Slainte Sales.zip - 128K Zip / 130K Excel

Lab 7-2 Example Output


By the end of this lab, you will create a dashboard that will let you explore four Balanced
Scorecard key performance indicators. While your results will include different data values,
your work should look similar to this:

Microsoft | Power BI Desktop



LAB 7-2M Example Balanced Scorecard Dashboard in Microsoft Power BI Desktop

Tableau | Desktop



LAB 7-2T Example Balanced Scorecard Dashboard in Tableau Desktop

Lab 7-2 Part 1 Identify KPI Targets and Colors
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 7-2 [Your name] [Your email address].docx.
The dashboard you will prepare in this lab requires a little preparation so you can define
key performance indicator targets and create some benchmarks for your evaluation. Once
you have these in place, you can create your visualizations. To simplify the process, here are
four KPIs that management has identified as high priorities:
Finance: Which products provide the highest amount of profit? The goal is 13 percent
return on sales. Use Profit ratio = Total profit/Total sales.
Process: How long does it take to ship our product to each state on average? Manage-
ment would like to see five days or less. Use Delivery time in days = Ship date − Order date.
Customers: Which customers spend the most on average? Management would like to
make sure those customers are satisfied. Average sales amount by average transaction count.
Employees: Who are our top-performing employees by sales each month? Rank the total
number of sales by employee.
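To make the first target concrete: if a product earns total profit of $1,300 on total sales of $10,000 in a month, its profit ratio is 1,300/10,000 = 0.13, exactly at the 13 percent goal, and anything below that should surface as unfavorable in the visuals you build below. (The dollar amounts are illustrative only.)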

Microsoft | Power BI Desktop

1. Open Power BI Desktop and connect to your data:


a. Click Home > Get Data > Excel.
b. Browse to find the Lab 7-2 Slainte Sales.xlsx file and click Open.
c. Check all of the tables and click Load. Refer to Appendix K for the data
dictionary and UML diagram.
d. Click the Model button in the toolbar on the left to view the UML
diagram and drag Employee_Listing.Employee_ID to Sales_Order.Sales_
Employee_ID.
e. Click the Report button in the toolbar on the left to return to your
report.
2. Rename Sheet 1 to Balanced Scorecard.
3. Next, create calculated fields for the following items that you will need
for your evaluation in your graphs and tables. The Profit Ratio is simply
profit divided by sales. The Delivery Time Days measure calculates the number of
days between the order date and the ship date for each order. The Rank
identifies the top salespeople based on sales. Click the Sales_Order_Lines
table in the fields list to make that the home of your new measures, then
click Modeling > New Measure. Enter each of the formulas below as a new
measure. Note: If the measure shows up in an unintended table, click the
measure and change the Home Table in the ribbon to Sales_Order_Lines.
a. Profit Ratio = SUM(Sales_Order_Lines[Product_Line_Profit])/SUM(Sales_Order_Lines[Product_Line_Total_Revenue])
b. Delivery Time Days = DATEDIFF(MIN(Sales_Order[Sales_Order_Date]), MIN(Sales_Order[Ship_Date]), DAY)
c. Rank = RANKX(ALL(Employee_Listing),CALCULATE(SUM(Sales_Order_Lines[Product_Line_Total_Revenue])))

4. To enable KPI targets to use as benchmarks for comparison, add the fol-
lowing measures. This will set your expected return on sales to 13 percent,
delivery days to 5, top salespeople to show the first top-ranking individual,
and the target average revenue to $600. You can always edit these measures
later to change the benchmarks:
a. KPI Target Return on Sales = .13
b. KPI Target Delivery Days = 5
c. KPI Target Top Salespeople = 1
d. KPI Target Average Revenue = 600
5. To enable the use of color on your graphs to show KPIs that exceed the target,
create some additional calculated fields based on IF . . . THEN . . . logic. To
make values that miss the target appear in orange (#F28E2B) use the color
hex value in the THEN part of the formula. Power BI will apply conditional
formatting with that color. To make values that exceed the target variances
appear in blue (#4E79A7) use the color hex value in the ELSE part of the
formula. Remember: Sometimes smaller values are better than larger values
so watch the signs.
a. Actual vs Target Return on Sales = IF([Profit Ratio]>=[KPI Target Return on Sales],"#4e79a7","#f28e2b")
b. Actual vs Target Delivery Days = IF([Delivery Time Days]<=[KPI Target Delivery Days],"#4e79a7","#f28e2b")
c. Actual vs Target Seller = IF([Rank]=[KPI Target Top Salespeople],"#4e79a7","#f28e2b")
d. Actual vs Target Average Sale = IF(AVERAGE(Sales_Order_Lines[Product_Line_Total_Revenue])>=[KPI Target Average Revenue],"#4e79a7","#f28e2b")
6. Scroll your field list to show your new calculated values and take a
screenshot (label it 7-2MA). Note: Your report should still be blank at this
point.
7. Save your file as Lab 7-2 Slainte Balanced Scorecard.pbix. Answer the ques-
tions for this part and then continue to the next part.

Tableau | Desktop

1. Open Tableau Desktop and connect to your data:


a. Click Connect to Data > Microsoft Excel.
b. Browse to find the Lab 7-2 Slainte Sales.xlsx file and click Open.
c. Drag the Sales_Order table to the data model panel, then connect the
Customer_Master_Listing, Employee_Listing (match Sales_Order.Sales
Employee ID with Employee_Listing.Employee ID), and Sales_Order_
Lines tables to the right of it. Refer to Appendix K for the data dictionary
and UML diagram.
d. Finally, drag Finished_Good_Products to the right of Sales_Order_
Lines.

2. Rename Sheet 1 to Finance - Return on Sales.
3. Next, create calculated fields for the following category subtotals that you
will include in your graphs and tables. The Profit Ratio is simply profit
divided by sales. The Delivery Time Days field calculates the number of days
between the order date and the ship date for each order. The Rank identifies
the top salespeople based on sales. Click Analysis > Create Calculated Field
for each of the following (name: formula):
a. Profit Ratio: SUM([Product Line Profit])/SUM([Product Line Total Revenue])
b. Delivery Time Days: DATEDIFF('day', [Sales Order Date], [Ship Date])
c. Rank: INDEX()
4. To enable KPI targets to use as benchmarks for comparison, add the follow-
ing measures. Click the down arrow at the top of the data tab and choose
Create Parameter. Enter the name and the current value below and click
OK. This will set your expected return on sales to 13 percent, delivery days
to 5, top salespeople to show the first top-ranking individual, and the target
average revenue to $600. You can always edit these measures later to change
the benchmarks:
a. KPI Target Return on Sales = .13
b. KPI Target Delivery Days = 5
c. KPI Target Top Salespeople = 1
d. KPI Target Average Revenue = 600
5. To enable the use of color on your graphs to show favorable and unfavorable
analyses, create some additional calculated fields based on IF . . . THEN . . .
logic. Remember: Sometimes smaller values are better than larger values, so pay attention to the signs.
a. Actual vs Target Return on Sales = IF([Profit Ratio]>=[KPI Target Return on Sales]) THEN 'Favorable' ELSE 'Unfavorable' END
b. Actual vs Target Delivery Days = IF(AVG([Delivery Time Days])<=[KPI Target Delivery Days]) THEN 'Favorable' ELSE 'Unfavorable' END
c. Actual vs Target Seller = IF([Rank]=[KPI Target Top Salespeople]) THEN 'Favorable' ELSE 'Unfavorable' END
d. Actual vs Target Average Sale = IF(AVG([Product Line Total Revenue])>=[KPI Target Average Revenue]) THEN 'Favorable' ELSE 'Unfavorable' END
6. Scroll your field list to show your new calculated values and take a
screenshot (label it 7-2TA). Note: Your report should still be blank at this
point.
7. Save your file as Lab 7-2 Slainte Balanced Scorecard.twb. Answer the questions for this part and then continue to the next part.

Lab 7-2 Part 1 Objective Questions (LO 7-2, 7-3)


OQ1. What is management’s expected profit margin for each product?
OQ2. What is the average revenue threshold for our best customers?
OQ3. What color should you use to show favorable metrics?


Lab 7-2 Part 1 Analysis Questions (LO 7-2, 7-3)


AQ1. What KPIs would you consider using to evaluate sales financial performance?
AQ2. What KPIs would you consider using to evaluate process efficiency?
AQ3. What KPIs would you consider using to evaluate employee growth?
AQ4. What KPIs would you consider using to evaluate customer relationships?
AQ5. For each KPI, identify a benchmark value or KPI goal that you think manage-
ment might use.
AQ6. Using the available fields, identify some calculations or relationships that would support your KPIs from AQ1 to AQ4.
AQ7. Are there any KPIs you selected that don’t have supporting data fields?

Lab 7-2 Part 2 Balanced Scorecard Dashboard


Now you can begin putting together your dashboard. Management would like to evaluate
each of these dimensions by month to see how they are performing. This means you should
use a filter or slicer to enable quick selection of a given month.

Microsoft | Power BI Desktop

1. Open your Lab 7-2 Slainte Balanced Scorecard.pbix file created in Part 1 and
go to the Balanced Scorecard tab.
2. Add a new Slicer to your page and resize it so it fills a narrow column on the
far right side of the page.
a. Expand the Sales_Order table and check the Sales_Order_Date field to
add it to the slicer.
b. Check only 2020 > Qtr 1 > February 2020 in the slicer to filter the data.
3. Click on the blank part of the page and add a new Clustered Column Chart
to the page for your Finance visual. Resize it so that it fills the top-left quar-
ter of the remaining space on the page. Note: Be sure to click in the blank
space before adding a new element.
a. Drag the following fields from the Finished_Good_Products and Sales_
Order_Lines tables to their respective boxes:
1. X-axis: Finished_Good_Products.Product_Description
2. Y-axis: Profit Ratio
b. Click the Format visual (paintbrush) icon to clean up your chart and add
color to show whether the KPI benchmark has been met or not:
1. Visual > X-axis > Title: Off
2. Visual > Y-axis > Title: Profit Ratio
3. Visual > Columns > Colors > Conditional formatting (fx button),
enter the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Actual vs Target Return on Sales
4. General > Title > Text: Finance - Return on Sales


4. Click on the blank part of the page and add a new Filled Map to the page
for your Process visual. Resize it so that it fills the top-right quarter of the
remaining space on the page.
a. Drag the following fields from the Customer_Master_Listing and Sales_
Order_Lines tables to their respective boxes:
1. Location: Customer_Master_Listing.Customer_State
2. Tooltips: Delivery Time Days
b. Click the Format visual (paintbrush) icon to clean up your chart and add
color to show whether the KPI benchmark has been met or not:
1. Visual > Fill colors > Colors > Conditional formatting (fx button),
enter the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Actual vs Target Delivery
Days
2. General > Title > Text: Process - Delivery Time
c. Take a screenshot (label it 7-2MB) of your finance and process
visuals.
5. Click on the blank part of the page and add a new Clustered Bar Chart to the
page for your Growth visual. Resize it so that it fills the bottom-left quarter
of the remaining space on the page.
a. Drag the following fields from the Employee_Listing and Sales_Order_
Lines tables to their respective boxes:
1. Y-axis: Employee_Listing.Employee_First_Name and Employee_List-
ing.Employee_Last_Name
2. X-axis: Sales_Order_Lines.Product_Line_Total_Revenue
b. To show both names, click Expand all down one level in the hierarchy
(down arrow fork icon at the top of the chart).
c. Click the Format visual (paintbrush) icon to clean up your chart and add
color to show whether the KPI benchmark has been met or not:
1. Visual > Y-axis > Title: Salesperson
2. Visual > X-axis > Title: Sales Revenue
3. Visual > Bars > Colors > Conditional formatting (fx button), enter
the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Actual vs Target Seller
4. General > Title > Text: Growth - Top Salesperson
6. Click on the blank part of the page and add a new Scatter Chart to the page
for your Customer visual. Resize it so that it fills the bottom-right quarter of
the remaining space on the page.
a. Drag the following fields from the Customer_Master_Listing and Sales_
Order_Lines tables to their respective boxes:
1. Values: Customer_Master_Listing.Business_Name
2. X-axis: Sales_Order_Lines.Product_Line_Total_Revenue > Average
3. Y-axis: Sales_Order_Lines.Sales_Order_Quantity_Sold > Average


b. Click the Format visual (paintbrush) icon to clean up your chart and add
color to show whether the KPI benchmark has been met or not:
1. Visual > X-axis > Title: Average Order Revenue
2. Visual > Y-axis > Title: Average Order Quantity
3. Visual > Markers > Color > Conditional formatting (fx button),
enter the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Actual vs Target Average Sale
4. General > Title > Text: Customer - Best Customers
c. Take a screenshot (label it 7-2MC) of your completed dashboard.
7. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 7-2 Slainte Balanced Scorecard.pbix.

Tableau | Desktop

1. Open your Lab 7-2 Slainte Balanced Scorecard.twb file from Part 1 and go to
the Finance - Return on Sales sheet. Add the following:
a. Drag the following fields from the Finished_Good_Products and Sales_
Order_Lines tables to their respective boxes:
1. Columns: Finished_Good_Products.Product Description
2. Rows: Profit Ratio
3. Marks > Color: Actual vs Target Return on Sales
4. Marks > Detail: KPI Target Return on Sales
5. Sort by: Profit Ratio > Descending
6. Filters: Sales Order Date > Month / Year > February 2020 > OK
a. Right-click > Show filter.
b. Right-click > Apply to Worksheets > All Using This Data Source.
b. Click the drop-down menu in the top-right corner of the Favorable/Unfavorable legend on the right side of the screen and choose Edit Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
c. Optional: Add a reference line to show the KPI target as a benchmark value on your chart:
1. Click the Analytics tab.
2. Drag Reference Line to the Table button on your chart.
3. Set the line Value to KPI Target Return on Sales.
d. Take a screenshot (label it 7-2TB) of your Finance - Return on Sales worksheet.



2. Create a new worksheet called Process - Delivery Time and add the following:
a. Drag the following fields from the Customer_Master_Listing and Sales_
Order tables to their respective boxes. Note: The generated Longitude and Latitude attributes are found near the bottom of the field list:
1. Columns: Longitude (generated)
2. Rows: Latitude (generated)
3. Marks > Detail: Customer_Master_Listing.Customer State
4. Marks > Detail: Sales_Order.Delivery Time Days > Measure > Average
5. Marks > Color: Actual vs Target Delivery Days
6. Marks > Type: Map
b. Click the drop-down menu in the top-right corner of the Favorable/Unfa-
vorable legend on the right side of the screen and choose Edit Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
3. Create a new worksheet called Growth-Top Salespeople and add the
following:
a. Drag the following fields from the Employee_Listing and Sales_Order_
Lines tables to their respective boxes:
1. Rows: Employee_Listing.Employee_First Name, Employee_Listing.
Employee_Last Name
2. Columns: Sales_Order_Lines.Product Line Total Revenue
3. Marks > Color: Actual vs Target Seller
4. Sort by: Product Line Total Revenue > Descending
b. Note: At this point, the bars will show salespeople with favorable (blue) and unfavorable (orange) colors based on the Actual vs Target Seller field you created in Part 1. When you filter values on your dashboard later, the colors may change automatically to blue if there is only one value. Fix this problem by clicking the drop-down menu in the top-right corner of the Favorable/Unfavorable legend on the right side of the screen and choosing Edit Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
4. Create a new worksheet called Customer-Best Customers and add the
following:
a. Drag the following fields from the Customer_Master_Listing and Sales_
Order_Lines tables to their respective boxes:
1. Columns: Sales_Order_Lines.Product Line Total Revenue >
­Measure > Average
2. Rows: Sales_Order_Lines.Sales Order Quantity Sold > Measure >
Average

3. Marks > Detail: Customer_Master_Listing.Business Name
4. Marks > Color: Actual vs Target Average Sale
b. To center your chart, right-click both axes and choose Edit Axis, then
uncheck Include zero.
c. Click the drop-down menu in the top-right corner of the Favorable/Unfa-
vorable legend on the right side of the screen and choose Edit Colors.
1. Click Favorable and then click the Blue square.
2. Click Unfavorable and then click the Orange square.
3. Click OK to return to your table. This will force the color assignment
to each value for this chart.
5. Finally, create a new dashboard tab called Balanced Scorecard and add your
charts from this part of the lab:
a. Change the size from Desktop Browser > Fixed Size to Automatic.
b. Drag the Finance sheet to the dashboard.
c. Drag the Process sheet to the right side of the dashboard.
d. Drag the Growth sheet to the bottom-left corner of the dashboard.
e. Drag the Customer sheet to the bottom-right corner of the dashboard.
f. In the top-right corner of each sheet on your new dashboard, click Use
as Filter (funnel icon) to connect the visualizations so you can drill
down into the data.
6. Take a screenshot (label it 7-2TC) of your completed dashboard.
7. When you are finished answering the lab questions you may close Tableau
Desktop. Save your file as Lab 7-2 Slainte Balanced Scorecard.twb.

Lab 7-2 Part 2 Objective Questions (LO 7-2, 7-3)


OQ1. Which product(s) has (have) fallen below the profit goal of 13 percent in
February 2020?
OQ2. Which state takes the most time to ship to in February 2020?
OQ3. Which salesperson is leading the rest for overall sales in February 2020?
OQ4. Which customer is our best customer in February 2020?

Lab 7-2 Part 2 Analysis Questions (LO 7-2, 7-3)


AQ1. How does the balanced scorecard dashboard help management quickly identify
issues?
AQ2. What category of visualizations did you produce in this lab?
AQ3. What additional visualizations might you include on your dashboard?

Lab 7-2 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 7-3 Comprehensive Case: Analyze Time Series Data—Dillard's
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Time series analysis is a form of predictive analysis to forecast sales.
The goal is to identify trends, seasonality, or cycles in the data so that you can better plan
production.
In this lab, you will create two dashboards to identify trends and seasonality in Dillard’s
Sales data. The first dashboard will focus on monthly seasonality and the second dashboard
will focus on day of the week seasonality. Both dashboards should show the same general
trend.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.

Lab 7-3 Example Output


By the end of this lab, you will create two dashboards to explore time series data. While
your results will include different data values, your work should look similar to this:

Microsoft | Excel

LAB 7-3M Example Time Series Dashboard in Microsoft Excel

Tableau | Desktop



LAB 7-3T Example Time Series Dashboard in Tableau Desktop

Lab 7-3 Part 1 Create a Highlight Table and Cycle Plot (Tableau), Sparklines (Excel)
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 7-3 [Your name] [Your email address].docx.
Keep in mind that for the Microsoft Excel track you will need to zoom out in order to cap-
ture the entirety of your dashboard with the slicers included.
In this part of the lab you will connect to the Dillard’s database via a SQL query and
then create visualizations to assess Dillard’s sales performance across each month and year.
Because we are analyzing data from all three years stored in the Dillard’s database (2014,
2015, and a portion of 2016), we will be able to begin our analysis more quickly if we use a
SQL query instead of connecting to the entire tables. The query limits the amount of data
we’re working with to only the attributes necessary (State, Store, Transaction Amount for
Sales, and Transaction Date) and groups the Transaction Amount by day instead of pulling
in each individual transaction amount. The result of the analysis is the same as if we were to
connect to the entire tables, but it is faster.
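For reference, the query used in both tracks below is reproduced here with comments (the comments are ours, added for explanation) mapping each clause to that rationale:

SELECT TRAN_DATE, STATE, STORE.STORE,
    SUM(TRAN_AMT) AS AMOUNT -- one daily sales total per store, not individual transactions
FROM TRANSACT
INNER JOIN STORE
    ON STORE.STORE = TRANSACT.STORE -- joins in each store's STATE attribute
WHERE TRAN_TYPE = 'P' -- keeps only the transaction type analyzed in this lab
GROUP BY TRAN_DATE, STATE, STORE.STORE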

Microsoft | Excel

1. From Microsoft Excel, click the Data tab on the ribbon.


2. Click Get Data > From Database > From SQL Server Database.

a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Expand Advanced Options and input the following query:
SELECT TRAN_DATE, STATE, STORE.STORE, SUM(TRAN_AMT)
AS AMOUNT
FROM TRANSACT
INNER JOIN STORE
ON STORE.STORE = TRANSACT.STORE
WHERE TRAN_TYPE = 'P'
GROUP BY TRAN_DATE, STATE, STORE.STORE
3. Click OK, then Load to load the data directly into Excel. This may take a few
minutes because the dataset you are loading is large.
4. Once the data have loaded, you will add the data to the Power Pivot data
model. Power Pivot is an Excel add-in that allows us to supercharge our data
models and PivotTables by creating relationships in Excel (as we can in Power BI), improving our Conditional Formatting by creating KPIs with base and target measures, and creating date tables that work better with time series analysis, among many other things.
a. Enable Power Pivot by navigating to File > Options > Add-ins.
1. Change the dropdown for Manage: to COM Add-ins.
2. Click Go. . .
3. Place a check mark next to Microsoft Power Pivot for Excel and
click OK.
5. From the new Power Pivot tab on the ribbon, select Add to Data Model
(ensure your active cell is somewhere in the dataset you just loaded into
Excel). In the Power Pivot window, you will create a date table and then load
your data to a PivotTable:
a. Create the date table: Design tab > Date Table > New.
b. Load to PivotTable: Home tab > PivotTable.
c. Click OK in the Create PivotTable pop-up window to add the PivotTable
to a new worksheet.
6. Take a few moments to explore the new PivotTable field list. There are two
tables to expand.
a. Calendar is your new date table. It has a date hierarchy and also many
different date parts in the More Fields section. We will work with the
Date Hierarchy as well as the Year, Month, and Day of Week fields from
the More Fields section.
b. Query1 is the data you loaded from SQL Server. We will primarily work
with the AMOUNT, STATE, and STORE fields.
7. Expand the Calendar and More Fields section as well as the Query1 fields
and take a screenshot (label it 7-3MA).
8. Create a highlight table (PivotTable with conditional formatting) to visualize
previous month and prior month comparisons:
a. Create the PivotTable:
1. Columns: Calendar.Year (Expand More Fields to find Year)

2. Rows: Calendar.Month (Expand More Fields to find Month)
3. Values: Query1.AMOUNT
4. You will likely be prompted that relationships between tables need to
be created. Click Auto-Detect, then Close the Auto-Detect Relation-
ships window.
5. Remove the Grand Totals: PivotTable Design tab > Grand Totals >
Off for Rows and Columns.
b. Add in Conditional Formatting:
1. Select all of the numerical data in your PivotTable (not the year
labels or month labels).
2. From the Home tab in the ribbon, select Conditional Formatting >
Color Scales > Green-Yellow-Red Color Scale (the first option provided).
9. The gradient fill across month and across years provides you with a means
to quickly analyze how months have performed year-over-year and how each
month compares to the months before and after it.
10. Add Sparklines to your highlight table. Sparklines can help add an additional
visual to the highlight table to show the movement over time in a line chart
so that you and your audience do not have to rely only on the gradient colors.
a. Similar to your action for Conditional Formatting, select all of the numerical data in your PivotTable (not the year labels or month labels).
b. From the Insert tab in the ribbon, select Line (from the Sparkline section).
1. Data Range: auto-populated from your selection
2. Location Range: F5:F16 (the cells to the immediate right of your PivotTable values)
3. Click OK.
11. Take a screenshot of your highlight table with sparklines (label it 7-3MB).
12. Answer the lab questions, then continue to Part 2.

Tableau | Desktop

1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is; click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT TRAN_DATE, STATE, STORE.STORE, SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
INNER JOIN STORE
ON STORE.STORE = TRANSACT.STORE
WHERE TRAN_TYPE = 'P'
GROUP BY TRAN_DATE, STATE, STORE.STORE
e. Click OK.
3. Click Sheet 1 to create your highlight table. First, rename Sheet 1 to
Highlight Table.
a. Columns: Tran_Date
1. It should default to Years.
b. Rows: Tran_Date
1. Right-click the Tran_Date pill from the Rows shelf and select Month.
c. Drag Amount to Text (Marks shelf).
d. Select Highlight Table from the Show Me tab.
1. Tableau likely moved MONTH (TRAN_DATE) to the Columns
when you changed the visualization. Just drag it back down to Rows.
2. Tableau defaults the color scheme to Blue gradient. Change this
gradient to stoplight colors by clicking Color on the Marks shelf >
Edit Colors > from the drop-down select Red-Green-Gold Diverging
and click OK.
e. The gradient fill across month and across years provides you with a
means to quickly analyze how months have performed year-over-year and
how each month compares to the months before and after it.
4. Take a screenshot (label it 7-3TA).
5. Create a Cycle Plot. Cycle Plots communicate similar information as
highlight tables, but in a more visual way that does not rely on the gradient
color fill. Cycle Plots make it easy to see month-over-month distribution
and year-over-year distribution at once so that you can spot seasonality and
trends.
a. Right-click the Highlight Table sheet tab and select Duplicate. Rename
the new duplicate sheet Cycle Plot.
b. Adjust the visualization to a lines (discrete) chart from the Show Me tab.
c. Swap the order of the two pills in columns so that MONTH(Tran_Date)
is first and YEAR(tran_date) is second.
d. Right-click the amount Y-axis and select Add reference line.
e. Leave all of the defaults except for Label. Change this default to None
and click OK.
6. Take a screenshot (label it 7-3TB).
7. Answer the lab questions, then continue to Part 2.

Lab 7-3 Part 1 Objective Questions (LO 7-1, 7-5)


OQ1. Which month and year had the highest sales amount?
OQ2. What was the sales amount in July 2016?
OQ3. In which year did March have the lowest sales amount (round to the dollar)?

Lab 7-3 Part 1 Analysis Questions (LO 7-1, 7-5)
AQ1. Why do you think October 2016 has such a comparatively low sales amount?
AQ2. Which trends stand out the most to you? What next steps would be interesting
to take?

Lab 7-3 Part 2 Add Additional Visualizations and Filters (Tableau), Slicers (Excel)

In this part of the lab we will add two more visualizations to make it easier to interpret the
results: a multi-line chart to show trends for each year and a single line chart that shows
performance for each month. You will also add in two methods for filtering your data: state
and year. Filtering by state is useful to see how different states are performing, and filtering
for year will be particularly useful for excluding 2016 from our results. The data from 2016
skew the results for month performance because the data do not include all of the days in
October or any days in November or December. For the Tableau track, you will also add
your charts to a dashboard to make it easier to interpret all at once.
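
If you want to confirm just how incomplete the 2016 data are before filtering them out, you can run a quick check against the same TRANSACT table. The following query is a minimal, optional sketch (not a required lab step) that counts purchase transactions by year and month using the columns from Part 1; months with no rows, such as November and December 2016, will simply not appear in the output.

SELECT YEAR(TRAN_DATE) AS TRAN_YEAR,
       MONTH(TRAN_DATE) AS TRAN_MONTH,
       COUNT(*) AS TRAN_COUNT
FROM TRANSACT
WHERE TRAN_TYPE = 'P'
GROUP BY YEAR(TRAN_DATE), MONTH(TRAN_DATE)
ORDER BY TRAN_YEAR, TRAN_MONTH;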

Microsoft | Excel

1. Create a multi-line chart to show year trends:


a. Ensure your active cell is somewhere within your PivotTable (the
highlight table) from Part 1, and select PivotChart from the PivotTable
Analyze tab in the ribbon.
b. In the Insert Chart window, change the chart type to Line and click OK.
c. Reposition the chart to directly beneath the highlight table.
2. Create a line chart showing the monthly trend:
a. Ensure your active cell is not selecting the PivotChart you just created or
in the PivotTable, and select PivotChart from the Insert tab in the ribbon.
Note: It is important for your cursor to not be selecting the existing
PivotTable or PivotChart because you will be selecting different fields to
create your new PivotChart.
b. Click OK in the Create PivotChart window.
1. Axis (Category): Calendar.Month (Expand More Fields to find Month)
2. Values: Query1.AMOUNT
c. The PivotChart likely defaulted to a bar chart. Ensure that you are
selecting the new PivotChart, and select Change Chart Type from the
PivotChart Design tab in the ribbon.
d. In the Change Chart Type window, select Line and click OK.
e. Reposition the chart to the right of the multi-line chart.
3. Create slicers:
a. Ensure that you have selected one of your PivotCharts or have your
active cell in the PivotTable, and right-click Calendar.Year (in the More
Fields list) and select Add as Slicer.
b. Perform the same action to Query1.STATE.

c. Reposition both slicers so that they are to the right of the PivotTable.
d. For now, the slicers are only connected to the PivotTable or PivotChart
that you created them for. Adjust the setting on each slicer so that they
filter every part of your dashboard.
1. Right-click the heading of each slicer and select Report Connections. . .
2. Place a check mark next to each dashboard component and click OK.
3. Adjust the Year slicer so that you can select multiple years by pressing
the multi-select button (looks like a checklist next to the Filter icon).
4. Rename your worksheet by right-clicking the Sheet tab and selecting Rename.
Name it Months Dashboard.
5. Take a screenshot (label it 7-3MC) that includes all of your charts and the
two slicers.
6. Answer the lab questions, then continue to Part 3.

Tableau | Desktop

1. Create a multi-line chart to show year trends:


a. Right-click the Cycle Plot tab and select Duplicate.
b. Name your new tab Multi-Line Chart.
c. Move YEAR(Tran_Date) to Color on the Marks shelf.
2. Create a line chart showing the monthly trend:
a. Right-click the Multi-Line Chart tab and select Duplicate.
b. Name your new tab Single Line Chart.
c. Remove YEAR(Tran_Date) from the Marks shelf.
3. Create filters:
a. Right-click tran_date from the field pane and select Show Filter.
b. If you do not see the filter, close the Show Me pane by clicking Show Me.
c. Right-click the filter and select Apply to Worksheets > All Using This
Data Source.
d. Perform the same action to the State field.
4. Create a dashboard by clicking Dashboard > New and arrange your four
charts (Highlight table, Cycle Plot, Single Line Chart, and Multi-Line Chart)
on the dashboard.
5. Take a screenshot of your monthly dashboard (label it 7-3TC).
6. Answer the lab questions, then continue to Part 3.

Lab 7-3 Part 2 Objective Questions (LO 7-1, 7-5)


OQ1. Which month is consistently the lowest performing in terms of sales?
OQ2. Which year had the best March sales?
OQ3. Which month has the best sales performance in 2014 and 2015?

Lab 7-3 Part 2 Analysis Questions (LO 7-1, 7-5)
AQ1. Which month strikes you as the most interesting and why?
AQ2. What would you recommend that Dillard’s do regarding the past performance
exhibited in the Monthly dashboard?
AQ3. How do you think each chart (Highlight Table, Sparklines/Cycle Plot,
Multi-Line Chart, or Single Line Chart) provides different value to your analysis?

Lab 7-3 Part 3 Re-create the Monthly Dashboard as a Weekday Dashboard

In this lab you will duplicate the work you did for the Monthly dashboard by replacing each
instance of Month with Weekday. Weekday dashboards can help analysts determine cycles,
trends, and seasonality across days of the week in their data.
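
If it helps to see the weekday rollup outside of a PivotTable or Tableau worksheet, the following optional T-SQL sketch totals purchases by day of week. It assumes the same TRANSACT purchase data used in Part 1; DATEPART and DATENAME are standard SQL Server date functions.

SELECT DATEPART(WEEKDAY, TRAN_DATE) AS WEEKDAY_NUM,  -- 1-7, for sorting
       DATENAME(WEEKDAY, TRAN_DATE) AS WEEKDAY_NAME, -- Sunday, Monday, ...
       SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
WHERE TRAN_TYPE = 'P'
GROUP BY DATEPART(WEEKDAY, TRAN_DATE), DATENAME(WEEKDAY, TRAN_DATE)
ORDER BY WEEKDAY_NUM;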

Microsoft | Excel

1. Right-click your Months Dashboard tab and select Move or Copy. . .


2. Place a check mark in the box next to Create a copy and click OK.
3. Rename the new spreadsheet Days of the Week Dashboard.
4. Modify the PivotTable and multi-line chart:
a. Ensure your active cell is in the PivotTable, and remove Month from
Rows in the Field List. Replace it with Day of Week (in the Calendar >
More Fields section).
b. The multi-line chart will automatically update because it is connected to
the same data that you have in the PivotTable.
5. Modify the single line chart:
a. Select the single line chart and remove Month from the Axis
(Categories) section and replace it with Day of Week.
6. Take a screenshot (label it 7-3MD) of the Days of the Week Dashboard
sheet.
7. Answer the lab questions, then continue to Part 4.

Tableau | Desktop

1. Duplicate each sheet and rename them:


a. Duplicate Highlight Table sheet and rename it Highlight Table - Days.
b. Duplicate Cycle Plot sheet and rename it Cycle Plot - Days.
c. Duplicate Multi-Line Chart and rename it Multi-Line Chart - Days.
d. Duplicate Single Line Chart and rename it Single Line Chart - Days.

2. In each of your new sheets, locate where the MONTH(tran_date) pill is
located and replace it with Weekday by right-clicking MONTH(tran_date)
and selecting More > Weekday.
3. Take a screenshot (label it 7-3TD) of all of your updated sheets.
4. When you are finished answering the lab questions, you may close Tableau
and save your file as Lab 7-3 Time Series Dashboard.twb.

Lab 7-3 Part 3 Objective Questions (LO 7-1, 7-5)


OQ1. What is the general trend across weekdays (Sunday to Saturday)?
OQ2. Which year had the best Monday sales?
OQ3. Which weekday is consistently the best-selling day across each year?
OQ4. Which mid-week day (non-weekend day) has the best sales?

Lab 7-3 Part 3 Analysis Questions (LO 7-1, 7-5)


AQ1. Which day strikes you as the most interesting and why?
AQ2. What would you recommend that Dillard’s do regarding the past performance
exhibited in the Day of the Week Dashboard?
AQ3. What additional analysis would you include to dig deeper into these results
(adding additional data, filters, hierarchies, etc.)?

Lab 7-3 Part 4 Create KPIs in Excel to Compare Sales Performance to Prior and Previous Time Periods

The highlight tables and line chart analysis that you completed in the previous parts of the
lab are excellent for detecting trends and seasonality; however, they do not offer a precise way
to view how performance compares across prior and previous periods according to set goals
and target measures.
Power Pivot in Excel has an excellent feature for creating measures and KPIs to track
performance across time. You will create KPIs for prior period (2015 to 2014, for example)
and for previous period (September to August, for example), and then you will add drill-
down capabilities to assess how individual states and stores performed.
This part of the lab is on the Microsoft track only. In Lab 7-5, you will learn more about
how to work with date parts and measures in Tableau.
In this part of the lab, we will create a descriptive report with KPIs.
KPIs require three decisions:
1. Identify a base performance metric and create a measure for it. Measures can be
implicit or explicit.
• Implicit measures are measures created in a PivotTable—anytime you drag and
drop a field into the values section of the PivotTable, it becomes an implicit mea-
sure. Implicit measures are restricted to the value field settings’ standard aggrega-
tions (SUM, COUNT, MIN, MAX, DISTINCTCOUNT, or AVG). These implicit
measures cannot be used to create KPIs.
• Explicit measures can be created in the Power Pivot Data Model window or in the
Excel main window from the Measure dialog box in the Power Pivot tab on the
Excel ribbon.
2. Identify a target value to compare the measure to.


3. Create a KPI to signal performance of the measure in comparison to the baseline, and
determine the range of values that indicate poor performance, good performance, and
great performance.
To calculate the sum of each year’s sales transactions, we need to create three new measures.
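
The KPI logic you are about to build in DAX can also be expressed directly in SQL, which may make the comparison easier to see. The sketch below is optional and illustrative only, using the same TRANSACT data and the 98 percent/102 percent thresholds described in the steps that follow; it computes each year's sales, the prior year's sales via the LAG window function, and a stoplight-style status.

SELECT TRAN_YEAR,
       AMOUNT AS CURRENT_YEAR,
       LAG(AMOUNT) OVER (ORDER BY TRAN_YEAR) AS LAST_YEAR,
       CASE
           WHEN AMOUNT < 0.98 * LAG(AMOUNT) OVER (ORDER BY TRAN_YEAR) THEN 'Red'
           WHEN AMOUNT > 1.02 * LAG(AMOUNT) OVER (ORDER BY TRAN_YEAR) THEN 'Green'
           ELSE 'Yellow'  -- the first year has no prior year and defaults here
       END AS STATUS
FROM (SELECT YEAR(TRAN_DATE) AS TRAN_YEAR, SUM(TRAN_AMT) AS AMOUNT
      FROM TRANSACT
      WHERE TRAN_TYPE = 'P'
      GROUP BY YEAR(TRAN_DATE)) AS YEARLY
ORDER BY TRAN_YEAR;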

Microsoft | Excel

1. From the Insert tab in the ribbon, create a new PivotTable. This time, do
not place the PivotTable on the same worksheet; select New Worksheet and
click OK.
2. In the Power Pivot tab in the Excel ribbon, create two new measures:
a. If you have closed and re-opened your Excel workbook, you likely will
need to add the Power Pivot add-in back. Do so if you don’t see the
Power Pivot tab in the ribbon.
b. Click Measures > New Measure. . .
1. Table name: Query1 (this is the default)
2. Measure name: Current Year
3. Formula: =SUM([Amount])
4. Category: General
c. Click OK.
d. Click Measures > New Measure. . .
1. Table name: Query1
2. Measure name: Last Year
3. Formula: =CALCULATE(SUM([Amount]), SAMEPERIODLASTYEAR('Calendar'[Date]))
3. In the Power Pivot tab in the Excel ribbon, create a new KPI:
a. Click KPIs > New KPI. . .
1. KPI base field (value): Current Year
2. Measure: Last Year
3. Adjust the status thresholds so that:
a. Anything below 98 percent of last year’s sales (the target) is red.
b. Anything between 98 percent and 102 percent of the target is yellow.
c. Anything above 102 percent of the target is green.
b. Click OK.
4. Notice that the PivotTable has automatically placed your new measures and
your new KPI into the Values section. The Current Year and Last Year val-
ues will make more sense when we pull in the Date Hierarchy. The field
labeled “Current Year Status” is your KPI field, although it has defaulted to
showing a number (1) instead of a stoplight.
a. From your PivotTable field list, remove Current Year Status from the Values.
b. From your PivotTable field list, scroll down until you see the stoplight
Current Year values (they are within the Query1 fields), and place
­Status back in the Values.


5. This KPI will function only with the Date Hierarchy (not with the date parts).
a. Drag Date Hierarchy to the Rows of your PivotTable field list.
6. Take a screenshot (label it 7-3ME) of your PivotTable.
7. Currently, our KPI is comparing years to prior year and months to prior
month (such as September 2016 to September 2015). To compare months to
previous months (September 2016 to August 2016, for example), we will cre-
ate another pair of measures and a new KPI. (Even though the calculation
for current month is the same as the calculation for current year, we have to
create a new measure to use as the KPI’s base; each base measure can have
only one KPI assigned to it.)
8. In the Power Pivot tab in the Excel ribbon, create two new measures and a
new KPI:
a. If you have closed and re-opened your Excel workbook, you likely will
need to add the Power Pivot add-in back. Do so if you don't see the
Power Pivot tab in the ribbon.
b. Click Measures > New Measure. . .
1. Table name: Query1 (this is the default)
2. Measure name: Current Month
3. Formula: =SUM([Amount])
4. Category: General
c. Click OK.
d. Click Measures > New Measure. . .
1. Table name: Query1
2. Measure name: Previous Month
3. Formula: =CALCULATE(SUM([Amount]), PREVIOUSMONTH('Calendar'[Date]))
9. In the Power Pivot tab in the Excel ribbon, create a new KPI:
a. Click KPIs > New KPI. . .
1. KPI base field (value): Current Month
2. Measure: Previous Month
3. Adjust the status thresholds so that:
a. Anything below 98 percent of the previous month's sales (the target) is red.
b. Anything between 98 percent and 102 percent of the target is
yellow.
c. Anything above 102 percent of the target is green.
b. Click OK.
10. Add this KPI status to your PivotTable (if it automatically added in, remove
it along with the measures, and add the KPI Status back in to view the stop-
lights instead of 0, 1, and –1).
11. Take a screenshot (label it 7-3MF) of your PivotTable with the new Cur-
rent Month Status KPI included.
This report may be useful at a very high level, but it is too aggregated for
state-level and store-level analysis. Next, we will add in two slicers to help
filter the data based on state and store.


12. Create a slicer and resize it:


a. From the PivotTable Analyze tab in the Excel ribbon, click Slicer to
insert an interactive filter.
b. Place a check mark in the boxes next to State and Store to create the
slicers and click OK.
c. Click a slicer and go to Slicer Tools > Options in the ribbon. In the
Buttons group, find Columns and increase the number of columns
to 10.
d. Notice what happens as you select different states: Not only do the data
change to reflect the KPI status for the state that you selected, but the
stores that are associated with that state shift to the top of the store
slicer, making it easier to drill down.
13. We can ease drill-down capabilities even more by creating a hierarchy
between state and store in the Power Pivot Manager:
a. From the Power Pivot tab, click Manage to open the Power Pivot tool.
b. Switch to Diagram View (Home tab, far right).
c. Select both the State and the Store attributes from the Query table, then
right-click one of the attributes and click Create Hierarchy.
d. Rename the Hierarchy to Store and State Hierarchy.
e. Click File > Close to close the Power Pivot tool. The PivotTable field list
will refresh automatically.
14. In your PivotTable field list, drag and drop the Store and State Hierarchy to
the Rows (above the Date Hierarchy).
15. Finally, add the following measures to show the precise monthly and annual
changes for comparison with the KPI:
a. Monthly Change: =FORMAT(IFERROR([Current Month]/[Previous
Month],""),"Percent")
b. Annual Change: =FORMAT(IFERROR([Current Year]/[Last
Year],""),"Percent")
16. Take a screenshot (label it 7-3MG) of your PivotTable with the new hier-
archy included.
17. When you are finished answering the lab questions you may close Excel.
Save your file as Lab 7-3 Time Series Dashboard.xlsx.

Lab 7-3 Part 4 Objective Questions (LO 7-2, 7-3)


OQ1. What is the Current Month Status for December 2015 in Georgia (GA)?
OQ2. Remove all filtering (it may be easier to temporarily remove the Store and State
Hierarchy from the field list to answer this question). What is the overall Cur-
rent Month Status for September 2016?



OQ3. Remove all filtering (it may be easier to temporarily remove the Store and State
Hierarchy from the field list to answer this question). What is the overall Cur-
rent Year Status for 2015?

Lab 7-3 Part 4 Analysis Questions (LO 7-2, 7-3)


AQ1. How does the ability to drill down into the state and store data give manage-
ment critical information and help them identify issues that are occurring or
opportunities that might be available?
AQ2. We know that the Dillard’s database includes much more information than
what we loaded into Excel through the SQL query, including product hierar-
chy data. How would including product category or SKU data help Dillard’s
plan future promotions or future purchases? What would you recommend
that Dillard’s do regarding the past performance exhibited in the Day of
Week Dashboard?
AQ3. How does having KPI analysis in addition to time series analysis (what you cre-
ated in Parts 1 through 3 of this lab) help make the analysis more robust?

Lab 7-3 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 7-4 Comprehensive Case: Comparing Results to a Prior Period—Dillard's

Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As a Dillard’s division manager, you are curious about how different
stores are performing in your region compared to last year. Management has specified
a goal for stores to exceed an average transaction amount per location so they can
evaluate where additional resources may be deployed to get more customers to stores
or to consolidate locations. Our question for this lab is whether average sales transac-
tions from April 2016 exceed management’s goal and whether they are different (bet-
ter, worse, approximately the same) than the average sales from the same time period
in 2015.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.

Lab 7-4 Example Output


By the end of this lab, you will create a dashboard to explore sales data. While your results
will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop

Microsoft Power BI Desktop


LAB 7-4M Example KPI Analysis in Microsoft Power BI Desktop

Tableau | Desktop

Tableau Software, Inc. All rights reserved.


LAB 7-4T Example KPI Analysis in Tableau Desktop

Lab 7-4 Part 1 Set Up Parameters and Calculated Fields


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 7-4 [Your name] [Your email address].docx.
In this first part, you will do some preliminary work to set up parameter values that will
work as benchmarks and calculated fields to help filter and compute specific values.
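
As a point of reference before you build them in the tools, a parameter and its calculated fields behave like a declared variable and conditional aggregations in a query. This hedged T-SQL sketch (illustrative only, using the TRANSACT columns from this lab) mirrors the Select a Year parameter and the current-year and previous-year fields you will create:

DECLARE @SelectYear INT = 2016;  -- plays the role of the Select a Year parameter

SELECT STORE,
       AVG(CASE WHEN YEAR(TRAN_DATE) = @SelectYear THEN TRAN_AMT END) AS AVG_CURRENT_YEAR,
       AVG(CASE WHEN YEAR(TRAN_DATE) = @SelectYear - 1 THEN TRAN_AMT END) AS AVG_PREVIOUS_YEAR
FROM TRANSACT
GROUP BY STORE;  -- AVG ignores the NULLs produced for other years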


Microsoft | Power BI Desktop

1. Create a new workbook in Power BI Desktop and connect to your data:


a. Go to Home > Get Data > SQL Server.
b. Enter the following and click OK:
1. Server: essql1.walton.uark.edu
2. Database: WCOB_DILLARDS
3. Data Connectivity Mode: DirectQuery
c. Check the boxes next to TRANSACT and STORE and click Load.
Ensure that the tables join correctly by clicking Modeling > Manage
relationships.
2. Note: For this visualization, we will be making date comparisons. When
Power BI Desktop loads the Dillard’s data, it will not automatically detect
the date hierarchy for the TRAN_DATE attribute. Analysts commonly cre-
ate separate date tables to solve inconsistencies with the date. You should do
that now:
a. In the ribbon, click Modeling > New Table.
b. Enter the following formula and press Enter: DATE_TABLE =
CALENDAR("1/1/2014", "12/31/2016"). You will see a new
DATE_TABLE in your list complete with date hierarchy.
c. Now connect the date table to your data. Click the Model button in the
toolbar on the left and drag the DATE_TABLE.Date on top of
TRANSACT.TRAN_DATE.
d. Verify the relationship (it’s okay if the date formats don’t match exactly)
and click OK.
e. Click the Report button to return to your page.
3. Create Hierarchies to enable drilling down on your data by location and
department:
a. Drag STORE.STORE onto STORE.STATE to create a new hierarchy
and rename it Location (right-click > Rename).
4. Take a screenshot (label it 7-4MA) of your field list. Note that your page
will be blank at this time.
5. Create a slicer to filter your target year and month and a measure for your
global KPI target for average transaction amount.
a. To dynamically filter years:
1. Click on a blank part of your page and add a Slicer.
2. Field: DATE_TABLE > Date > Date Hierarchy > Year
3. In the slicer, click the drop-down arrow in the top-right corner and
choose List.
4. Check the box next to 2016.
5. Move the slicer to the top-right corner of the page and shrink it, so it
is just bigger than the data list.


b. To dynamically filter months:


1. Click on a blank part of your page and add a Slicer.
2. Field: DATE_TABLE > Date > Date Hierarchy > Month
3. In the slicer, click the drop-down arrow in the top-right corner and
choose List.
4. Check the box next to April.
5. Move the slicer to the top-right corner of the page just below the year
slicer and shrink it so it is just bigger than the data list.
c. Set a global KPI target for average transaction amount. Management has
determined that they would like to see stores with at least a $22 transac-
tion on average:
1. Click Modeling > New Measure.
2. Enter the following formula and press Enter:
a. KPI Target Avg Tran Amount = 22
6. Create calculated fields (click Modeling > New Measure) to select values for
the current period, previous period, and determine whether the values met
the goal or not.
a. TRAN_AMT_PREV_YEAR = CALCULATE(AVERAGE(TRANSACT
[TRAN_AMT]), DATEADD(DATE_TABLE[Date], -1, YEAR))
Note: This formula pulls the date and takes away 1 year, then averages
all of the transaction amounts together.
b. Current vs Target Avg Tran Amount =
IF(AVERAGE(TRANSACT[TRAN_AMT])>=[KPI Target Avg Tran
Amount],"#4e79a7","#f28e2b")
Note: This formula compares the transaction amount to the KPI target. If it
is higher, the data will appear blue. If it is lower, it will appear orange.
c. Current vs Prev Year Avg Tran Amount = IF(AVERAGE([TRAN_AMT])
> [TRAN_AMT_PREV_YEAR], "#4e79a7","#f28e2b")
Note: This formula compares the transaction amount to the previous
year amount. If it is higher, the data will appear blue. If it is lower, it will
appear orange.
7. Take a screenshot (label it 7-4MB) of your field list. Note that your page
will contain only the two slicers at this time.
8. After you answer the lab questions, save your work as Lab 7-4 Dillard’s Prior
Period.pbix and continue to Part 2.

Tableau | Desktop

1. Create a new workbook in Tableau Desktop and connect to your data:


a. Go to Connect > To a Server > Microsoft SQL Server.
b. Enter the following and click Sign In:
1. Server: essql1.walton.uark.edu
2. Database: WCOB_DILLARDS



c. Add the TRANSACT and STORE tables to the Data Source page.
Ensure that the tables join correctly (you can check the appropriate
relationships in Appendix J).
2. Click Sheet 1.
3. Create Hierarchies to enable drilling down on your data by location and
department:
a. Drag Store onto State to create a new hierarchy named Location.
b. Take a screenshot (label it 7-4TA).
4. Create Parameters (at the top of the Data tab, click the down-arrow and
choose Create Parameter) to set a target year and global KPI target for aver-
age transaction amount.
a. Set a target year:
1. Name: Select a Year
2. Data type: Integer
3. Allowable values: List
4. Values: 2014, 2015, 2016
5. Current value: 2016
b. Set a global KPI target for average transaction amount:
1. Name: KPI Target Avg Tran Amount
2. Data type: Integer
3. Current value: 22
5. Create calculated fields (click Analysis > Create Calculated Field) to select
values for the current period, previous period, and determine whether the
values met the goal or not.
a. Tran Amt Current Year: IF DATEPART('year', [Tran Date]) = [Select a Year]
THEN [Tran Amt] END
b. Tran Amt Previous Year: IF DATEPART('year', [Tran Date]) = ([Select
a Year]-1) THEN [Tran Amt] END
c. KPI Avg Current > Target: AVG([Tran Amt Current Year])>[KPI Target
Avg Tran Amount]
d. KPI Avg Current > Previous: AVG([Tran Amt Current Year]) >
AVG([Tran Amt Previous Year])
6. Take a screenshot (label it 7-4TB).
7. After you answer the lab questions, save your work as Lab 7-4 Dillard’s Prior
Period.twb and continue to Part 2.

Lab 7-4 Part 1 Objective Questions (LO 7-2, 7-3)


OQ1. What is the current year you are evaluating?
OQ2. What is the KPI target value for average sales?
OQ3. When comparing the current value to the previous value or the target KPI, what
is the expected output of the calculation?


Lab 7-4 Part 1 Analysis Questions (LO 7-2, 7-3)


AQ1. What is the purpose of creating hierarchies of date fields?
AQ2. What is the purpose of a parameter?
AQ3. Why do you need to create measures in Power BI and/or calculated fields in
Tableau?

Lab 7-4 Part 2 Compare Store Results to Global KPI Target

Now that you have the prerequisite fields and calculations in place, you're ready to
create your charts.
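
If you would like to sanity-check the chart against raw numbers, a short query can flag which stores beat the $22 target on average. This is an illustrative sketch only, assuming 2016 as the year being evaluated and the TRANSACT and STORE tables from Part 1:

SELECT S.STATE,
       T.STORE,
       AVG(T.TRAN_AMT) AS AVG_TRAN_AMT,
       CASE WHEN AVG(T.TRAN_AMT) >= 22 THEN 'Met target'
            ELSE 'Below target' END AS KPI_STATUS
FROM TRANSACT T
INNER JOIN STORE S ON S.STORE = T.STORE
WHERE YEAR(T.TRAN_DATE) = 2016
GROUP BY S.STATE, T.STORE
ORDER BY AVG_TRAN_AMT DESC;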

Microsoft | Power BI Desktop

1. Open your Lab 7-4 Dillard’s Prior Period.pbix from Part 1 if you closed it and
go to Page 1.
2. Name your page: Prior Period.
3. Create a Clustered Bar Chart showing store performance compared to the
KPI target average sales and resize it so it fills the left side of your page, then
drag the fields to their corresponding place:
a. Y-axis: Location
b. X-axis: TRANSACT.TRAN_AMT > Average
c. Tooltips: KPI Target Avg Tran Amount
d. Click the Format visual (paintbrush) icon and adjust the styles:
1. Visual > Y-axis > Title: Location
2. Visual > X-axis > Title: Average Transaction Amount
3. Visual > Bar > Color > Conditional formatting (fx button) or click
the vertical three dots next to Default color. Choose the following
and click OK:
a. Format style: Field value
b. What field should we base this on?: Current vs Target Avg Tran
Amount
4. General > Title > Text: KPI Average Tran Amount by Location
e. Take a screenshot (label it 7-4MC).
4. Now adjust your sheet to show filters and drill down into values:
a. In the right-hand corner of the clustered bar chart you just made, click
Expand all down one level in the hierarchy (forked arrow icon) to show
the store.
b. In the visualization click More Options (three dots) and choose Sort By
> STATE STORE. Then choose Sort Ascending.
5. Take a screenshot (label it 7-4MD) of the page.
6. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 7-4 Dillard’s Prior Period.pbix and continue to
Part 3.



Tableau | Desktop

1. Open your Lab 7-4 Dillard’s Prior Period.twb from Part 1 if you closed it and
go to Sheet 1.
2. Name your sheet: KPI Average Amount.
3. Create a bullet chart showing store performance compared to the KPI target
average sales. Note: A bullet chart shows the actual results as a bar and the
benchmark as a reference line.
a. Drag your Location hierarchy to the Rows shelf.
b. Drag Tran Amt Current Year to the Columns shelf. Change it to Average,
and sort your chart in descending order by average transaction amount.
c. Click the Analytics tab and drag Reference Line to your Table and
click OK.
1. Scope: Entire Table
2. Value: KPI Target Avg Tran Amount
d. Return to the Data tab and drag KPI Avg Current > Target to the Color
button in the Marks pane.
e. Take a screenshot (label it 7-4TC).
4. Now adjust your sheet to show filters and drill down into values:
a. Right-click the Select a Year parameter and click Show Parameter.
b. Drag Tran Date to the Filters pane and choose Months, then click Next.
Choose Select from List and check all months, then click OK.
c. In the Rows shelf, click the + next to State to expand the values to show
the average transaction amount by store.
d. Edit the colors in the panel on the right:
1. True: Blue
2. False: Orange
3. Null: Red
5. Take a screenshot (label it 7-4TD) of your expanded sheet.
6. After you answer the lab questions, save your work as Lab 7-4 Dillard’s Prior
Period.twb and continue to Part 3.

Lab 7-4 Part 2 Objective Questions (LO 7-2, 7-3)


OQ1. Which state(s) exceeded the KPI target for average transaction amount in 2016?
OQ2. How many stores exceeded the KPI target in 2016? Hint: Filter out stores that
don’t meet the threshold.
OQ3. How many stores exceeded the KPI target in 2015?
OQ4. How many stores exceeded the KPI target in March 2015?

Lab 7-4 Part 2 Analysis Questions (LO 7-2, 7-3)


AQ1. What might explain the change in the number of stores that met the KPI target
from 2015 to 2016?
AQ2. If you were the division manager, what might you do with this KPI information?


AQ3. What do you notice about the automatic color selections for the graph? What
colors would you choose to make the chart more readable?

Lab 7-4 Part 3 Compare Current Year Results to Previous Year Results

Comparing to a global goal may not be granular enough to gauge individual store perfor-
mance. Instead of a global goal, let’s see how individual stores’ actual results compare to
results from a previous period.
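
The comparison the chart performs can again be checked with plain SQL. This optional sketch computes each store's average transaction amount for 2016 and 2015 (the years assumed here) with conditional aggregation, mirroring the TRAN_AMT_PREV_YEAR measure from Part 1:

SELECT STORE,
       AVG(CASE WHEN YEAR(TRAN_DATE) = 2016 THEN TRAN_AMT END) AS AVG_CURRENT,
       AVG(CASE WHEN YEAR(TRAN_DATE) = 2015 THEN TRAN_AMT END) AS AVG_PRIOR,
       CASE WHEN AVG(CASE WHEN YEAR(TRAN_DATE) = 2016 THEN TRAN_AMT END)
               > AVG(CASE WHEN YEAR(TRAN_DATE) = 2015 THEN TRAN_AMT END)
            THEN 'Improved' ELSE 'Declined or missing' END AS VS_PRIOR_YEAR
FROM TRANSACT
GROUP BY STORE;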

Microsoft | Power BI Desktop

1. Open your Lab 7-4 Dillard’s Prior Period.pbix from Part 2 if you closed it.
2. Click the blank area on your Prior Period page and create a new Clustered
Bar Chart showing store performance compared to the previous year perfor-
mance. Resize it so it fills the right side of the page between the first chart
and the slicers:
a. Y-axis: Location
b. X-axis: TRANSACT.TRAN_AMT > Average
c. TRANSACT.TRAN_AMT_PREV_YEAR
d. Click the Format visual (paintbrush) icon and adjust the styles:
1. Visual > Y-axis > Title: Location
2. Visual > X-axis > Title: Average Transaction Amount
3. General > Title > Text: Average Tran Amount by Location Prior Period
e. Take a screenshot (label it 7-4ME).
3. Now adjust your sheet to show filters and drill down into values:
a. In the visualization, click Expand all down one level in the hierarchy
(forked arrow icon) to show the store.
b. In the visualization click More Options (three dots) and choose Sort By
> STATE STORE. Then choose Sort Ascending.
4. Take a screenshot (label it 7-4MF) of the page.
5. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 7-4 Dillard’s Prior Period.pbix.

Tableau | Desktop

1. Open your Lab 7-4 Dillard’s Prior Period.twb from Part 2 if you closed it and
create a new sheet. Go to Worksheet > New Worksheet.
2. Name your sheet: PP Average Amount.



3. Create a chart showing store performance compared to the previous year
performance:
a. Drag your Location hierarchy to the Rows shelf.
b. Drag Tran Amt Current Year to the Columns shelf. Change it to
Average, and sort your chart in descending order by average transaction
amount.
c. Drag Tran Amt Previous Year to the Details button in the Marks
pane. Change it to Average.
d. Return to the Data tab and drag KPI Avg Current > Previous to the
Color button in the Marks pane.
e. Click on Color on the Marks pane and change the bars to Opacity =
100% and border to black.
f. Click the Analytics tab and drag Distribution Band to your chart and
click OK.
1. Scope: Per Cell
2. Value:
a. Percentages: 60, 80, 100
b. Percent of: AVG(Tran Amt Previous Year)
3. Check: Fill Below
4. Fill: Stoplight
g. Take a screenshot (label it 7-4TE).
4. Now adjust your sheet to show filters and drill down into values:
a. Right-click the Select a Year parameter and click Show Parameter.
b. Drag Tran Date to the Filters pane and choose Months. Choose Select
from List and check all months, then click OK.
c. In the Rows shelf, click the + next to State to expand the values to show
the average transaction amount by store.
d. Edit the colors for the KPI Avg Current > Previous in the panel on the
right:
1. True: Blue
2. False: Orange
3. Null: Red
5. Take a screenshot (label it 7-4TF) of your expanded sheet.
6. After you answer the lab questions, save your work as Lab 7-4 Dillard’s Prior
Period.twb and exit Tableau.

Lab 7-4 Part 3 Objective Questions (LO 7-2, 7-3)


OQ1. How many stores failed to exceed the average transaction amount from the
previous year?
OQ2. How many stores are missing benchmark data from 2016?

Lab 7-4 Part 3 Analysis Questions (LO 7-2, 7-3)
AQ1. In Tableau, the benchmark colors show below 60 percent as red, below
80 percent as yellow, and below 100 percent as green. Why might a manager
consider 80–100 percent of the previous year's results acceptable?
AQ2. What additional available measures or calculated fields might you use to evalu-
ate store performance?

Lab 7-4 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 7-5 Comprehensive Case: Advanced Performance Models—Dillard's

Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Management at Dillard’s corporate office is looking at new ways to
incentivize managers at the various store locations both to boost low-performing stores and
to reward high-performing stores. The retail manager has asked you to evaluate store perfor-
mance over a two-week period and identify those stores.
Data: Dillard's sales data are available only on the University of Arkansas Remote
Desktop (waltonlab.uark.edu). See your instructor for login credentials.

Lab 7-5 Example Output


By the end of this lab, you will create a dashboard to explore sales data. While your results
will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop

Microsoft Power BI Desktop


LAB 7-5M Example Advanced Models Dashboard in Microsoft Power BI Desktop


Tableau | Desktop

Tableau Software, Inc. All rights reserved.


LAB 7-5T Example Advanced Models Dashboard in Tableau Desktop

Lab 7-5 Part 1 Cluster Analysis of High-Volume Stores


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 7-5 [Your name] [Your email address].docx.
In this analysis, we want to see which stores are doing well—in other words, stores with a
high volume of transactions and high average transaction price. Cluster analysis will group
the stores that share similar performance.
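
The two inputs to the cluster analysis, transaction volume and average transaction amount per store, can be computed directly before you let the tool group them. This optional sketch assumes the same two-week March 2015 window used below, excludes the online division (store 698), and assumes the transaction identifier column is named TRANSACTION_ID:

SELECT STORE,
       COUNT(TRANSACTION_ID) AS TRAN_VOLUME,  -- x-axis of the scatter plot
       AVG(TRAN_AMT) AS AVG_TRAN_AMT          -- y-axis of the scatter plot
FROM TRANSACT
WHERE TRAN_DATE BETWEEN '20150301' AND '20150314'
  AND STORE <> 698
GROUP BY STORE;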

Microsoft | Power BI Desktop

1. Load the following data from SQL Server:


a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
c. Data connectivity mode: Import
d. Advanced options:
1. SQL Statement: SELECT * FROM TRANSACT JOIN STORE ON
STORE.STORE = TRANSACT.STORE WHERE TRAN_DATE
BETWEEN '20150301' AND '20150314'
2. Rename your Page 1 tab to Store Clusters.
a. Visualization: Scatter Chart
b. In the Fields pane:
1. X-axis: TRANSACT.TRANSACTION ID > Count
2. Y-axis: TRANSACT.TRAN_AMT > Average
3. Values: TRANSACT.STORE (Note: Change the data type to Text.)

399

ISTUDY ric44907_ch07_334-403.indd 399 03/10/23 08:08 AM


c. You’ll notice an outlier in the top-right corner. In this case, it is the
online division of Dillard’s. Because we’re evaluating brick-and-mortar
stores, we want to exclude this one.
d. Right-click on the outlier (Store 698) and click Exclude.
e. To create a cluster, click the three dots in the top- or bottom-right corner
of the scatter plot visualization and choose Automatically Find Clusters.
f. Enter the following parameter and click OK.
1. Number of clusters: 12
g. Right-click on your scatter plot and choose Show data.
3. Take a screenshot (label it 7-5MA).
4. Answer the questions and continue to Part 2.

Tableau | Desktop

1. Load the following data from SQL Server (see Lab 1-5 for instructions):
a. Database: WCOB_DILLARDS
b. Tables: TRANSACT, STORE
c. Date range: 3/1/2015 to 3/14/2015
2. Create a new worksheet called Store Clusters:
a. Columns: TRANSACT.Transaction ID > Measure > Count
b. Rows: TRANSACT.Tran Amt > Measure > Average
c. Marks:
1. TRANSACT.Store > Color
d. Let the query run at this point.
e. You’ll notice an outlier in the top-right corner. In this case, it is the
online division of Dillard’s. Because we’re evaluating brick-and-mortar
stores, we want to exclude this one.
f. Right-click on the outlier (Store 698) and click Exclude.
g. To create clusters, click the Analytics tab and drag Cluster to the scatter
plot.
1. Number of clusters: 12
h. Right-click your scatter plot and choose View Data. . . Then drag the
data window down so you can see both the scatter plot and the data.
3. Take a screenshot (label it 7-5TA).
4. Answer the questions and continue to Part 2.

Lab 7-5 Part 1 Objective Questions (LO 7-2, LO 7-4)


OQ1. If management is interested in high-performing stores, where would they locate
that cluster on this graph?
OQ2. What is the store number of the highest-performing store by volume?

400

ISTUDY
Rev. Confirming Pages

OQ3. If management is interested in low-performing stores, where would they locate


the cluster on this graph?
OQ4. What is the store number of the lowest-performing store?
OQ5. What would happen to the cluster of high-performing stores if you had chosen
fewer clusters to analyze?

Lab 7-5 Part 1 Analysis Questions (LO 7-2, LO 7-4)


AQ1. Look at the cluster of high-performing stores. What do you think are some of
the drivers for their good performance?
AQ2. Look at the cluster of low-performing stores. What do you think are some of the
drivers for their poor performance?

Lab 7-5 Part 2 Stacked Bar Chart of Daily Store Performance

Now we should take the high-performing stores and evaluate their daily transaction amounts.
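
The stacked bar chart is essentially a store-by-day pivot of transaction amounts. An optional sketch of the underlying numbers, assuming the same two-week window loaded in Part 1:

SELECT STORE,
       TRAN_DATE,
       SUM(TRAN_AMT) AS DAILY_AMOUNT
FROM TRANSACT
WHERE TRAN_DATE BETWEEN '20150301' AND '20150314'
GROUP BY STORE, TRAN_DATE
ORDER BY STORE, TRAN_DATE;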

Microsoft | Power BI Desktop

1. Create a new visual called Sales By Date:


a. Visualization: Stacked Bar Chart
b. X-axis: TRANSACT.TRAN_AMT
c. Y-axis: TRANSACT.STORE
d. Legend: TRANSACT.TRAN_DATE > TRANSACT.TRAN_DATE > Date
Hierarchy > Day > Day
e. Filters: STORE
1. Filter type: Basic Filtering
2. Values: Select the high-performing stores from the cluster you
identified in Part 1.
f. Click More Options (three dots) on your chart and choose Sort by
TRAN_AMT and Sort descending.
2. Take a screenshot (label it 7-5MB).
3. Answer the questions and continue to Part 3.

Tableau | Desktop

1. Create a new sheet called Sales By Date:


a. Columns: TRANSACT.Tran Amt > Measure > Sum
b. Rows: TRANSACT.Store
c. Marks:


1. Type: Bar
2. TRANSACT.Tran Date > Color > Day
a. Discrete
b. Day (there are two Day options in the drop-down; choose the
top one, without a year)
3. TRANSACT.Tran Date > Label > Day
a. Sort: Descending
d. Let the query run at this point.
e. Now filter your results. Right-click outside the work area and click
Filters > Store.
f. Now let’s narrow in on the high-performing stores we identified in our
cluster analysis.
1. Uncheck All in the Store filter list.
2. Check the stores from the cluster you identified in Part 1.
g. Hover over Tran Amt in the chart and click the Sort Descending icon.
2. Take a screenshot (label it 7-5TB).
3. Answer the questions and continue to Part 3.

Lab 7-5 Part 2 Objective Questions (LO 7-2, LO 7-4)


OQ1. Which two days appear to have the highest transaction amounts?
OQ2. Which two days appear to have the lowest transaction amounts?
OQ3. Look up these dates on a calendar. What do you notice about the days with
higher transaction amounts?
OQ4. Given this pattern, would you expect the next day after this analysis to be higher
or lower than the last day?

Lab 7-5 Part 2 Analysis Questions (LO 7-2, LO 7-4)


AQ1. What additional data would you want to gather to help explain why sales
transactions decrease on the days with the lowest sales?
AQ2. How would you collect those additional data?

Lab 7-5 Part 3 Tree Map of Sales by Location


Now we’ll look at sales by state and city.
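
A treemap with a state-to-city drill-down corresponds to a grouped rollup of sales. This optional sketch uses T-SQL's ROLLUP to return both state subtotals and city-level detail in one result, again assuming the same two-week window:

SELECT S.STATE,
       S.CITY,
       SUM(T.TRAN_AMT) AS AMOUNT
FROM TRANSACT T
INNER JOIN STORE S ON S.STORE = T.STORE
WHERE T.TRAN_DATE BETWEEN '20150301' AND '20150314'
GROUP BY ROLLUP (S.STATE, S.CITY)  -- CITY rows roll up into STATE subtotals
ORDER BY S.STATE, S.CITY;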

Microsoft | Power BI Desktop

1. Create a new visual called Sales By Location:


a. Visualization: Treemap
b. Category: STORE.STATE
c. Values: TRANSACT.TRAN_AMT


2. To drill down to subcategories, we use Details in Power BI Desktop.


a. Details: STORE.CITY
3. Take a screenshot (label it 7-5MC).
4. When you are finished answering the lab questions, you may close Power BI
Desktop. Save your file as Lab 7-5 Dillard’s Advanced Models.pbix.

Tableau | Desktop

1. Create a new sheet called Sales By Location. Leave the Columns and Rows shelves empty.
2. Marks:
a. TRANSACT.Tran Amt > Size > SUM
b. TRANSACT.Tran Amt > Color > SUM
c. STORE.State > Label
3. To drill down to subcategories, we can create a hierarchy in Tableau.
a. In the attributes list, drag STORE.City onto STORE.State to create a
hierarchy and click OK.
b. Now in the Marks list, click the + next to State to show the cities in each
state.
4. Create a new dashboard and drag the three sheets onto the page.
5. Take a screenshot (label it 7-5TC).
6. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 7-5 Dillard’s Advanced Models.twb.

Lab 7-5 Part 3 Objective Questions (LO 7-2, LO 7-4)


OQ1. Which three states had the top sales volume for the two weeks of analysis?
OQ2. Which three states had the lowest sales volume for the two weeks of analysis?
OQ3. Which city has the highest volume of sales for the two weeks of analysis?
OQ4. If you exclude the city from the previous question from analysis, what sales rank
does Arkansas fall to?

Lab 7-5 Part 3 Analysis Questions (LO 7-2, LO 7-4)


AQ1. Compare and contrast the cluster model produced by Tableau Desktop with the
one produced by Power BI Desktop. What do you notice about the clusters?
AQ2. Compare and contrast the stacked bar chart produced by Tableau Desktop with
the one produced by Power BI Desktop. Which one is easier to interact with?
AQ3. Compare and contrast the treemap produced by Tableau Desktop with the one
produced by Power BI Desktop. Which one is easier to interpret?

Lab 7-5 Submit Your Screenshot Lab Document


Verify that you have captured all of your required screenshots and have answered any ques-
tions your instructor has assigned, then upload your screenshot lab document to Connect
or the location indicated by your instructor.



Chapter 8
Financial Statement Analytics

A Look at This Chapter


In this chapter, we focus on how to access and analyze financial statement data. We highlight the use of XBRL to
quickly and efficiently gain computer access to financial statement data while addressing the data quality and con-
sistency issues of XBRL data in the United States. Next, we discuss how ratios are used to analyze financial perfor-
mance. We also discuss the use of sparklines and other visualization tools to help users identify trends and points of
interest in the data. Finally, we discuss the use of text mining to analyze the sentiment in financial reporting data.

A Look Back
Chapter 7 focused on generating and evaluating key performance metrics that are used primarily in managerial
accounting. By measuring past performance and comparing it to targeted goals, we are able to assess how well a com-
pany is working toward a goal. Also, we can determine required adjustments to how decisions are made or how busi-
ness processes are run, if any.

A Look Ahead
In Chapter 9, we highlight the use of Data Analytics for the tax function. We also discuss the application of Data
Analytics to tax questions and look at tax data sources and how they may differ depending on the tax user (a tax
department, an accounting firm, or a regulatory body) and tax needs. We also investigate how visualizations are
useful components of tax analytics. Finally, we consider how data analysis might be used to assist in tax planning.

Sometimes the future is now. The StockSnips app uses sentiment analysis,
machine learning, and artificial intelligence to aggregate and analyze news
related to publicly traded companies on Nasdaq and the New York Stock
Exchange to “gain stock insights and track a company’s financial and business
operations.” The use of Data Analytics helps classify the news to help predict
revenue, earnings, and cash flows, which are in turn used to predict the stock
performance. What will Data Analytics do next?

S Narayan/Dinodia Photo/AGE Fotostock

OBJECTIVES
After reading this chapter, you should be able to:

LO 8-1 Understand different types of financial statement analysis that are performed.
LO 8-2 Explain how to create and read visualizations of financial statement data.
LO 8-3 Describe the value of text mining and sentiment analysis of financial
reporting.
LO 8-4 Describe how XBRL tags financial reporting data and how it facilitates
financial statement analysis.


FINANCIAL STATEMENT ANALYSIS

LO 8-1 Understand different types of financial statement analysis that are performed.

Financial statement analysis is used by investors, analysts, auditors, and other interested
stakeholders to review and evaluate a company's financial statements and financial
performance. Such analysis allows the stakeholder to gain an understanding of the financial
health of the company to allow more insightful and effective decision making.
and the Data Analytics techniques used by analytics type. Most financial statement users
will perform descriptive and diagnostic analytics to understand the firm and identify trends
and relationships among different accounts. Predictive analytics provide insight into the
future and help predict future sales, cash flows, and earnings. Prescriptive analytics are used
to help optimize decisions based on potential constraints, such as determining the intrinsic
value of a stock and providing data-driven insights into the decision to buy/sell/hold the
common stock of the company.

EXHIBIT 8-1
Financial Statement Analysis Questions and Data Analytics Techniques by Analytics Type

Analytics Type | Potential Financial Statement Analysis Questions Addressed | Data Analytics Techniques Used

Descriptive—summarize activity or master data based on certain attributes to address questions of this type: What happened? What is happening?
Questions addressed: What is the firm's debt relative to its assets in the past year? What is the company's cash flow from operating activities for each of the past three years? What are the profit margins during the current quarter? What are the component parts of return on equity?
Techniques used: Summary statistics (sums, totals, averages, medians, bar charts, histograms, etc.); ratio analysis; vertical, horizontal analysis.

Diagnostic—detect correlations and patterns of interest and compare them to a benchmark to address questions of this type: Why did it happen? What are the reasons for past results? Can we explain why it happened?
Questions addressed: How leveraged is the company compared to the industry or its competitors? Are the R&D investments last quarter associated with increased sales and earnings this quarter? Why did the collectability of receivables (amount our customers owe to the company) fall in the current quarter as compared to the prior quarters?
Techniques used: Performance comparisons to past, competitor, industry, stock market, overall economy; drill-down analytics to determine relations/patterns/linkages between variables; regression and correlation analysis.

Predictive—identify common attributes or patterns that may be used to forecast similar activity to address the following questions: Will it happen in the future? What is the probability something will happen? Is it forecastable?
Questions addressed: What are the expected sales, earnings, and cash flows over the next five years? How much would the company have earned if there had not been a business interruption (due to fire or natural disaster)?
Techniques used: Sales, earnings, and cash-flow forecasting using time series, analyst forecasts, competitor and industry performance, and macroeconomic forecasts; drill-down analytics to determine relations/patterns/linkages between variables; regression and correlation analysis.

Prescriptive—recommend action based on previously observed actions to address questions of this type: What should we do based on what we expect will happen? How do we optimize our performance based on potential constraints?
Questions addressed: What is the intrinsic value of the stock based on forecasted financial performance and uncertainty? How sensitive is the stock's intrinsic value to assumptions and estimates made?
Techniques used: Cash-flow analysis; calculation of net present value of future cash flows; sensitivity analysis.


Descriptive Financial Analytics


The primary objective of descriptive financial analytics is to summarize past performance
and address the question of what has happened. To do so, analysts may evaluate financial
statements by considering a series of ratio analyses to learn about the composition of cer-
tain accounts or to identify indicators of risk.
Ratio analysis is a tool used to evaluate relationships among different financial statement
items to help understand a company’s financial and operating performance. It tells us how
much of one account we get for each dollar of another. For example, the gross profit ratio
(gross profit/revenue) tells us what is left after deducting the cost of sales.
Financial ratio analysis is a key tool used by accounting, auditing, and finance profes-
sionals to assess the financial health of a business organization, to assess the reasonable-
ness of (past) reported financial results, and ultimately to help forecast future performance.
Analytical procedures, including ratio analysis, are also used in the audit area and are rec-
ognized as an essential component of both planning an audit and carrying out substantive
testing. AS 2305 states:

A basic premise underlying the application of analytical procedures is that plausible


relationships among data may reasonably be expected to exist and continue in the
absence of known conditions to the contrary.1

Knowledge of financial statement analysis using ratios is a component of several profes-


sional certifications, including the CPA (certified public accountant), CMA (certified man-
agement accountant), and CFA (chartered financial analyst) certifications, so it is clearly
critical for any accountant.

Vertical and Horizontal Analysis


One way analysts use ratio analysis is by preparing common size financial statements, simi-
lar to the one shown in Exhibit 8-2. Common size financial statements are a presentation of
the financial statements as a percentage of a base number to compare financial statements
of different-size companies. With common size financial statements we can perform a
vertical analysis, which is an analysis that shows the proportional value of accounts to a pri-
mary account, such as Revenue (total) on the income statement. In the following example,
we divide operating income by revenue to show that Apple earns about $0.21 for every dol-
lar in sales and Microsoft earns about $0.31 for every dollar in sales in 2020. This is their
profit margin. On the balance sheet we would use vertical analysis to identify the proportion
of assets, for example, dividing accounts receivable by total assets.

1 PCAOB, AS 2305, https://pcaobus.org/Standards/Auditing/Pages/AS2305.aspx.

EXHIBIT 8-2
Vertical Analysis of a Common Size Financial Statement

Microsoft Excel

Lab Connection
Lab 8-1 and Lab 8-2 have you perform vertical and horizontal analyses.

Ratio Analysis
For other indicators of financial health, there are four main types of ratios: liquidity, activity,
solvency (or financing), and profitability. In practice, these ratios may vary slightly depending
on which accounts the user decides to include or exclude.
Liquidity is the ability to satisfy the company’s short-term obligations using assets that
can be most readily converted into cash. Liquidity ratios help determine a company’s ability
to meet short-term obligations. Here are some common liquidity ratios:

Current ratio = Current assets/Current liabilities


Quick (acid test) ratio = (Current assets – Inventory)/Current liabilities
Working capital = Current assets – Current liabilities
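
For example, using purely hypothetical numbers: a company with current assets of $500,000 (including $200,000 of inventory) and current liabilities of $250,000 would have a current ratio of 2.0 ($500,000/$250,000), a quick ratio of 1.2 ($300,000/$250,000), and working capital of $250,000.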

Activity ratios are a computation of a firm’s operating efficiency. Operating efficiency is


often captured using turnover ratios, which reflect the flow of assets. Receivables, inventory,
and total asset turnover are all examples of activity ratios. Note that when you compare

income statement (duration) accounts with balance sheet (point in time) accounts, you
need to average the balance sheet accounts to match the period. Also, for turnover ratios,
analysts may use 365 days or round down to 360 days depending on preference.

Asset turnover ratio = Net sales/Average total assets


Receivable turnover ratio = Net sales/Average net accounts receivable
Average collection period ratio = 365/Receivables turnover
Inventory turnover ratio = Cost of goods sold/Average inventory
Average days in inventory ratio = 365/Inventory turnover
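
To illustrate the averaging, suppose (hypothetically) cost of goods sold is $400,000 for the year, beginning inventory is $90,000, and ending inventory is $110,000. Average inventory is ($90,000 + $110,000)/2 = $100,000, so inventory turnover is $400,000/$100,000 = 4.0 times and average days in inventory is 365/4.0, or about 91 days.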

We use solvency (sometimes called financing) ratios to help assess a company’s ability
to pay its debts and stay in business. In other words, we assess the company’s financial
risk—that is, the risk resulting from a company’s choice of financing the business using debt
or equity. Debt-to-equity, long-term debt-to-equity, and times interest earned ratios are also
useful in assessing the level of solvency.

Debt-to-equity ratio = Total liabilities/Shareholders’ equity


Times interest earned ratio = Income before interest and taxes/Interest expense

Profitability ratios are a common calculation when assessing a company. They are used
to provide information on the profitability of a company and its prospects for the future.

Profit margin on sales ratio = Net income/Net sales
Return on assets ratio = Net income/Average total assets
Return on equity ratio = Net income/Average shareholders’ equity

Profitability ratios are commonly associated with the DuPont ratio. The DuPont ratio
was developed by the DuPont Corporation to measure performance as a decomposition of
the return on equity ratio in this way:

Return on equity (ROE) = Profit margin × Asset turnover × Equity multiplier
= (Net income/Net sales) × (Net sales/Average total assets) × (Average total assets/Average shareholders’ equity)

The DuPont ratio decomposes return on equity into three different types of ratios: prof-
itability (profit margin), activity (asset turnover), and solvency (equity multiplier) ratios.
This allows a more complete evaluation of performance.
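A short sketch with hypothetical inputs shows why the decomposition works: net sales and average total assets cancel when the three ratios are multiplied, leaving net income over average equity.

# Hypothetical inputs (in millions); not data for any real company
net_income = 50_000
net_sales = 300_000
average_total_assets = 320_000
average_equity = 120_000

profit_margin = net_income / net_sales                      # about 0.167
asset_turnover = net_sales / average_total_assets           # about 0.938
equity_multiplier = average_total_assets / average_equity   # about 2.667

roe = profit_margin * asset_turnover * equity_multiplier
# The intermediate terms cancel, so roe equals net_income / average_equity
print(round(roe, 3), round(net_income / average_equity, 3))  # 0.417 0.417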

Lab Connection
Lab 8-3 has you analyze financial ratios.


Diagnostic Financial Analytics


Diagnostic financial analysis works to explain why something has occurred, typically by com-
paring a firm’s performance to a benchmark like a competitor, its industry, or another portfolio
of interest. One example of such a benchmark would be to calculate the mean rate of return or
working capital within an industry or set of competitors and compare to the focal company.
Is a debt-to-equity ratio value of 2.0 good or bad? Does an average days in inventory value of 60 days indicate a problem with obsolete inventory? Without a benchmark, these ratios give
us nothing more than a data point. We need to compare these to other descriptive statistics
to be able to make a judgment call.
For example, if the industry average debt-to-equity ratio is 1.0, meaning the average company has $1 in debt for every $1 in equity (a 50/50 split), comparing the company’s debt-to-equity ratio to that industry average would tell us that the company is quite a bit overleveraged if it has $2 in debt for every $1 in equity (a 66/33 split).
Benchmarks for financial statements can include direct competitors, industry averages,
or a company’s own past performance. If a competitor has an inventory turnover ratio of
40 days, our company’s inventory turnover of 60 days means we’re less efficient at getting our
product out the door. But if last period we had an inventory turnover of 65 days, our current
period’s 60 days could reveal improvement in inventory management since the last period.
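As a sketch of this kind of benchmarking, the comparison below checks a focal company's ratios against a made-up industry average; every value here is hypothetical.

# Hypothetical ratios for a focal company and an industry benchmark
focal = {"debt_to_equity": 2.0, "days_in_inventory": 60}
industry = {"debt_to_equity": 1.0, "days_in_inventory": 40}

for ratio, value in focal.items():
    benchmark = industry[ratio]
    direction = "above" if value > benchmark else "at or below"
    print(f"{ratio}: {value} is {direction} the industry benchmark of {benchmark}")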
Using these diagnostic analytics gives a relative sense of place for firm performance.
Refer to Exhibit 8-3 to see Microsoft’s three-year trend and also how the company com-
pares with Apple and Facebook in 2020 when converting revenue into profit.
Auditors will use ratio analysis to pinpoint potential audit issues by considering how a
company’s financial statements depart from industry performance, a close competitor, or
even the same company’s prior-year performance. Competitors might use ratio analysis to
understand the vulnerabilities of a competitor. Bond investors might use ratio analysis to
see if a bond covenant is violated (e.g., some bond contracts require a borrower to maintain
a current ratio above 1.0 to help ensure the loan can be paid off).

Predictive Financial Analytics


Predicting future performance of a company is the work of trading analysts, researchers,
the finance department, and managers responsible for budgeting. They will work to forecast
sales, earnings, and cash flows based on past ratio analysis, time series data, analyst forecasts,
competitor and industry performance, as well as macroeconomic projections.

EXHIBIT 8-3 Comparison of Ratios among Microsoft (MSFT), Apple (AAPL), and Facebook (FB) (Microsoft Excel)


EXHIBIT 8-4 Horizontal Analysis of a Common Size Financial Statement (Microsoft Excel)

A horizontal analysis is an analysis that shows the change of a value from one period to the next. This is sometimes called a trend analysis. When you have two or more periods, you calculate the proportional change in value from one period to the next, similar to a ratio analysis. In Exhibit 8-4, we take Revenue in 2020 and divide it by Revenue in 2019 to show that Microsoft’s 2020 revenue is 113.65 percent of its 2019 revenue, a 13.65 percent increase from one year to the next. Other horizontal analyses are provided for Apple as well. The key question is whether past trends can be expected to persist into the future, which would allow the analyst to forecast future performance.
Horizontal analysis can be used to calculate trends from one period to the next or
over time.

Change amount = Current year amount – Base year amount
Change percent = (Current year amount – Base year amount)/Base year amount

When you calculate the trend over a large period of time relative to a single base year,
you create an index. An index is a metric that shows how much any given subsequent year
has changed relative to the base year. The formula is the same as above, but we lock the
base year value when creating our formula, shown in Exhibit 8-5.
Using these trends and indices, we can better understand how a company performs over
time, calculate the average amount of change, and predict what the value is likely to be in
the next period.
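The sketch below applies the change and index formulas to a hypothetical five-year revenue series and then uses the average change as a naive forecast of the next period; the numbers are invented for illustration.

# Hypothetical revenue by year (in millions)
revenue = {2016: 85_000, 2017: 90_000, 2018: 110_000, 2019: 125_000, 2020: 143_000}
years = sorted(revenue)
base = revenue[years[0]]  # the locked base year for the index

for year in years:
    index = revenue[year] / base * 100  # index relative to the base year
    print(year, round(index, 1))

# Average year-over-year change percent, extrapolated one period ahead
changes = [(revenue[y] - revenue[p]) / revenue[p]
           for p, y in zip(years, years[1:])]
avg_change = sum(changes) / len(changes)
forecast_next = revenue[years[-1]] * (1 + avg_change)
print(round(avg_change * 100, 1), round(forecast_next))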


EXHIBIT 8-5 Index Showing Change in Value Relative to Base Year (Microsoft Excel)

Prescriptive Financial Analytics


Prescriptive analytics recommends what should be done based on what we expect will happen. In other words, it asks how we can optimize our performance given potential constraints.
In financial statement analysis, analysts may use forecasts of sales, earnings, and cash flows
to estimate the intrinsic value of a stock to help determine if the stock should be bought,
sold, or held.
A second type of prescriptive analytics in financial statement analysis would be the per-
formance of sensitivity analysis. For example, will the buy/sell/hold/ignore decision change
based on changing assumptions of future sales, earnings growth, or discounted cash flows?
How sensitive is our decision to those assumptions and forecasts?
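A minimal sketch of such a sensitivity analysis revalues a stock under different assumed growth and discount rates using a simple perpetuity-growth model and checks whether the buy/sell decision flips. The model choice, per-share cash flow, market price, and rates are all illustrative assumptions.

# Hypothetical inputs for a perpetuity-growth valuation of one share
next_year_cash_flow = 5.00   # assumed cash flow per share next year
market_price = 80.00         # assumed current market price per share

for growth in (0.02, 0.03, 0.04):          # assumed long-run growth rates
    for discount in (0.08, 0.09, 0.10):    # assumed discount rates
        intrinsic = next_year_cash_flow / (discount - growth)
        decision = "buy" if intrinsic > market_price else "sell or hold"
        print(f"g={growth:.0%} r={discount:.0%} value={intrinsic:6.2f} -> {decision}")

Running the loop shows the recommendation flipping as the assumptions move, which is exactly what the analyst wants to know: how fragile the decision is.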

Data Analytics at Work

Determining Stock Value Using Financial Statement Analysis


In the short term the market is a popularity contest; in the long term it is
a weighing machine.
—Quote attributable to both Warren Buffett and Benjamin Graham
Aswath Damodaran is a well-known finance professor at NYU who special-
izes in valuing equities. He is often called upon to estimate the value of com-
pany stock based on underlying sales and earnings fundamentals. As shown
in the following exhibit, he suggested that a valuation of $44 billion for Airbnb would already be richly priced. But due to frothy market valuations or potentially other explanations, the company was valued at more than $100 billion
by the end of the first day of its initial public offering. Sometimes, despite our
best efforts at valuing the company based on its fundamentals, the market is
a popularity contest in the short term (as noted by Warren Buffett and Ben-
jamin Graham). But eventually, in the longer term the stock price will have to
be justified by the company’s fundamentals such as sales, earnings, and cash
flows both in the present and into the future.
One component of financial statement analysis is to value a company’s
fundamentals such as sales, earnings, and cash flows, to be used to decide
if an investor should buy, sell, hold, or ignore a stock. The more data that can
be analyzed, the better the analyst will be able to make an informed decision.


Aswath Damodaran/Seeking Alpha

Source: https://seekingalpha.com/article/4392734-sharing-economy-come-home-ipo-of-airbnb and https://www.reuters.com/article/airbnb-ipo/airbnb-valuation-surges-past-100-billion-in-biggest-u-s-ipo-of-2020-idUSKBN28K261 (accessed April 7, 2021).

PROGRESS CHECK
1. Which ratios would a financial institution be most interested in when determining
whether to grant a loan to a business?
2. What would a horizontal trend tell you about a firm’s performance?

VISUALIZING FINANCIAL DATA

LO 8-2
Explain how to create and read visualizations of financial statement data.

Visualizations help to highlight key figures present in the financial data. Whether to describe the data or show the relative value in diagnosing points of interest, color and graphs show many different dimensions.

Showing Trends
Sparklines and trendlines are used to help financial statement users easily visualize the data
and give meaning to the underlying financial data. A sparkline is a small visual trendline
or bar chart that efficiently summarizes numbers or statistics in a single spreadsheet cell.
Because it generally can fit in a single cell within a spreadsheet, it can easily add to the data
without detracting from the tabular results.
For what types of reports or spreadsheets should sparklines be used? It usually depends
on the type of reporting that is selected. For example, if used in a digital dashboard that
already has many charts and dials, additional sparklines might clutter up the overall appear-
ance. However, if used to show trends where it replaces or complements lots of numbers,
it might be used as a very effective visualization. The nice thing about sparklines is they are
generally small and just show simple trends rather than all the details regarding the horizon-
tal and vertical axes that you would expect on a normal graph.
Exhibit 8-6 provides an example of the use of sparklines in a horizontal trend analysis for
Microsoft. It shows the relative value of each line item and the overall trend.


EXHIBIT 8-6 Visualizing Financial Data with Heat Maps and Sparklines (Microsoft Excel)

Relative Size of Accounts Using Heat Maps


Another way to visualize financial data is to use heat maps (conditional formatting in Excel)
and charts. A heat map shows the relative size of values by applying a color scale to the data.
In Exhibit 8-6, the vertical composition of the accounts changes over the five-year period.
Color helps highlight dramatic shifts in each year, such as the drop in income in 2018.

Visualizing Hierarchy
A balance sheet, on the other hand, has an inherent hierarchy of accounts that is a good
candidate for a sunburst diagram. As shown in Exhibit 8-7, the center of the ring shows the
main sections of the balance sheet and their proportional size. As you move out, you see the
subgroups and individual accounts that make up the balance sheet.
EXHIBIT 8-7 Sunburst Diagram Showing Composition of a Balance Sheet
(The center ring shows the main balance sheet sections: Assets, Liabilities, and Equity; the outer rings break these into subgroups and individual accounts such as Cash and Equivalents, Accounts Receivable, Property, Goodwill, Common Stock, and Retained Earnings.)


For some additional examples of visualizations that show financial data, including tree
diagrams, geographic maps, chord diagrams, and heat maps for word frequency in man-
agement discussion and analysis, explore the following website: rankandfiled.com, then
Explore Filers and click through to drill down to available visualizations.

PROGRESS CHECK
3. How might sparklines be used to enhance the DuPont analysis? Would you show
the sparklines for each component of the DuPont ROE disaggregation, or would
you propose it be shown only for the total?

TEXT MINING AND SENTIMENT ANALYSIS

LO 8-3
Describe the value of text mining and sentiment analysis of financial reporting.

Some data analysis is used to determine the sentiment included in text. For example, Uber might use text mining and sentiment analysis to read all of the words used in social media associated with its driving or the quality of its smartphone app and its services. The company can analyze the words for sentiment to see how the social media participants feel about its services and new innovations, as well as perform similar analysis on its competitors (such as Lyft or traditional cab services).
Similar analysis might be done to learn more about financial reports, SEC submissions, analyst reports, and other related documents based on the words that are used. They might provide a gauge of the overall tone of the financial reports. This tone might help us understand management expectations of past or future performance that might complement the numbers and figures in the reports.
To provide an illustration of the use and predictive ability of text mining and sentiment
analysis, Loughran and McDonald2 use text mining and sentiment analysis to predict the
stock market reaction to the issuance of a 10-K form by examining the proportion of negative
words used in a 10-K report. Exhibit 8-8 comes from their research suggesting that the stock
market reaction is related to the proportion of negative words (or inversely, the proportion
of positive words). They call this method overlap. Thus, using this method to define the tone
of the article, they indeed find a direct association, or relationship, between the proportion of
negative words and the stock market reaction to the disclosure of 10-K reports.
They measure proportion first by developing a dictionary of 2,337 negative words in the
financial context and then counting how many of those words are used as compared to the
total words used (called Fin-Neg in Exhibit 8-8). One of their arguments is that a financial
dictionary is better than a dictionary created from standard English usage. For that reason,
they differentiate their financial dictionary (Fin-Neg) from the negative words used in nor-
mal English usage (as shown in Exhibit 8-8 as H4N-Inf). Whereas cost, expense, or liability
might be viewed as negative in normal English, they are not considered to be negative words
in the financial dictionary. The most frequent negative words in the financial dictionary
include words like loss, claims, impairment, adverse, restructuring, and litigation.
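A simplified sketch of this word-proportion measure follows; it counts a tiny, illustrative subset of the financial negative-word list in a made-up passage. The real Fin-Neg dictionary contains 2,337 words, so the six-word set and sample text here are only for demonstration.

import re

# Tiny illustrative subset of a financial negative-word dictionary
fin_neg = {"loss", "claims", "impairment", "adverse", "restructuring", "litigation"}

# Made-up sample text standing in for 10-K prose
sample_10k_text = """The company recorded an impairment loss during the year
and faces ongoing litigation related to restructuring activities."""

words = re.findall(r"[a-z]+", sample_10k_text.lower())
negative_count = sum(1 for w in words if w in fin_neg)
proportion_negative = negative_count / len(words)
print(negative_count, len(words), round(proportion_negative, 3))  # 4 17 0.235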
Exhibit 8-9 provides a graph featuring the frequency of total words (word count), positive
words (positive WC), and negative words (negative WC) for Microsoft’s 10-K from 2014 to
2020. The exhibit also displays the PN ratio, a ratio of positive-to-negative words over time.
Exhibit 8-10 displays another type of visualization for the relative frequency of positive and negative words for Microsoft from 2014 to 2020. The PN ratio shows the ratio of positive to negative words. On average, for every 10 negative words, Microsoft states almost 5 positive words.

2 Tim Loughran and Bill McDonald, “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks,” Journal of Finance 66, no. 1 (2011), pp. 35–65.


EXHIBIT 8-8 Stock Market Reaction (Excess Return) of Companies Sorted by Proportion of Negative Words
The lines represent the words from a financial dictionary (Fin-Neg) and a standard English dictionary (H4N-Inf).
Source: Tim Loughran and Bill McDonald, “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks,” Journal of Finance 66, no. 1 (2011), pp. 35–65.

EXHIBIT 8-9 Frequency of Words Included in Microsoft’s Annual Report (Microsoft Excel)

Lab Connection
Lab 8-4 has you analyze the sentiment of Forms 10-K for various companies
and create a visualization. Specifically, you will be looking for the differences
between the positive and negative words over time.


EXHIBIT 8-10 Visual Representation of Positive and Negative Words for Microsoft by Year with a Line Chart Showing the Change in Positive-to-Negative Ratio for the Same Period
The bar chart represents the number of positive and negative words, and the line represents the positive-to-negative ratio from a financial dictionary (Fin-Neg). (Microsoft Excel)

PROGRESS CHECK
4. Which do you predict would have more positive sentiment in a 10-K, the foot-
notes to the financial statements or the MD&A (management discussion and
analysis) of the financial statements?
5. Why would you guess the results between the proportion of negative words and
the stock market reaction to the 10-K issuance differ between the Fin-Neg and
the H4N-Inf dictionary?

XBRL AND FINANCIAL DATA QUALITY

LO 8-4
Describe how XBRL tags financial reporting data and how it facilitates financial statement analysis.

XBRL is a global standard for tagging and reporting financial information in a computer-readable format. XBRL stands for eXtensible Business Reporting Language and is a type of XML (extensible markup language) used for organizing and defining financial elements. In the United States and other jurisdictions, companies are required to tag each piece of financial data that appears in their financial statements so that it is machine readable. Once these instance documents are submitted and validated by the regulatory body, they are immediately available for public consumption by different types of financial statement users, including financial analysts, investors, or lenders. These users can then leverage data models to quickly analyze large amounts of data from the entire population of listed companies with minimal effort.
As of June 2011, the Securities and Exchange Commission requires all public company
filers, including smaller reporting companies and foreign private issuers, to file an XBRL
instance document, which contains the same information found in the traditional financial
statements but in computer-readable format. In addition to tagging financial values, such as
account balances and lease amounts, companies must tag every date, fact, figure, percent-
age, and paragraph of text in management discussion and analysis and footnotes.
The preparer of an XBRL instance document must begin by identifying a correct tax-
onomy that defines and describes each key standardized data element (like cash or accounts
payable), shown in Exhibit 8-11. The XBRL taxonomy also defines the relationships between
each element—for example, buildings and improvements are a component of property, plant,
and equipment, which is a component of noncurrent assets, which is a component of assets,
which is in the statement of financial position (balance sheet), shown in Exhibit 8-12.
The current U.S. GAAP Financial Reporting Taxonomy can be explored interactively at
xbrlview.fasb.org. It defines more than 19,000 elements with descriptions and links to the FASB
codification. For example, the XBRL tag for cash is labeled “Cash” and is defined as follows:


EXHIBIT 8-11 Creating an XBRL Instance Document
(Flowchart: 1. Taxonomy (e.g., us-gaap-2020); 2. Extension Schema (e.g., abc-company), with extension calculations, presentation, and labels; 3. Financial Statements; 4. XBRL Instance Document (Interactive Financial Statements), with the SEC Central Index Key and reporting periods; 5. Document Validation, leading to the EDGAR Filing (SEC).)

EXHIBIT 8-12 Organization of Accounts within the XBRL Taxonomy
(Hierarchy: under U.S. GAAP, statements include the Statement of Income, Statement of Financial Position, Statement of Cash Flows, Statement of Comprehensive Income, and Disclosures. Within the Statement of Financial Position, Assets, Current includes Cash, Cash Equivalents, and Short-Term Investments (Cash and Cash Equivalents, at Carrying Value; Short-Term Investments) and Receivables, Net, Current (Accounts Receivable; Allowance for Doubtful Accounts); Assets, Noncurrent includes Property, Plant and Equipment (Land and Land Improvements; Buildings and Improvements; Machinery and Equipment; Accumulated Depreciation), Intangible Assets, and Goodwill.)

Amount of currency on hand as well as demand deposits with banks or financial institutions. Includes other kinds of accounts that have the general characteristics of demand deposits. Excludes cash and cash equivalents within disposal group and discontinued operation.3

The XBRL tag for the cash and cash equivalents footnote disclosure is labeled “CashAndCashEquivalentsDisclosureTextBlock” and is defined as follows:
3 https://xbrl.us/xbrl-taxonomy/2017-us-gaap/.


The entire disclosure for cash and cash equivalent footnotes, which may include the
types of deposits and money market instruments, applicable carrying amounts, restricted
amounts and compensating balance arrangements. Cash and equivalents include: (1)
currency on hand; (2) demand deposits with banks or financial institutions; (3) other
kinds of accounts that have the general characteristics of demand deposits; (4) short-
term, highly liquid investments that are both readily convertible to known amounts of
cash and so near their maturity that they present insignificant risk of changes in value
because of changes in interest rates. Generally, only investments maturing within
three months from the date of acquisition qualify.4

The use of tags allows data to be quickly transmitted and received, and the tags serve as
an input for financial analysts valuing a company, an auditor finding areas where an error
might occur, or regulators seeing if firms are in compliance with various regulations and
laws (like the SEC or IRS).
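To illustrate what machine readability buys the analyst, the sketch below pulls facts out of a tiny XBRL-style fragment with Python's standard XML parser. The fragment is a heavily simplified, hypothetical stand-in: real instance documents also declare namespaces, contexts, and units, and the dollar values here are placeholders.

import xml.etree.ElementTree as ET

# Heavily simplified, hypothetical XBRL-style fragment (not a real filing)
instance = """
<xbrl>
  <Cash contextRef="FY2020" unitRef="usd">38016000000</Cash>
  <AccountsPayableCurrent contextRef="FY2020" unitRef="usd">42296000000</AccountsPayableCurrent>
</xbrl>
"""

root = ET.fromstring(instance)
for fact in root:
    # Each tagged fact carries its element name, context, and value
    print(fact.tag, fact.get("contextRef"), int(fact.text))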
Preparers of the XBRL instance document compare the financial statement figures with
the tags in the taxonomy. When a tag does not exist, the preparer can extend the taxonomy
with their own custom tags. The taxonomy and extension schema are combined with the
financial data to generate the XBRL instance document, which is then validated for errors
and submitted to the regulatory authority.

XBRL Data Quality


While XBRL enables Data Analytics models to quickly process and present interesting patterns
in the data, the user must be careful not to trust all of the numbers at face value. Users may notice
that data values may be missing or incorrect. The XBRL-US Center for Data Quality laments that:

Investors and analysts have been reluctant to use the data because of concerns about
its accuracy, consistency, and reliability. Inconsistent or incorrect data tagging,
including the use of custom tags in lieu of standard tags and input mistakes, causes
translation errors, which make automated analysis of the data unduly difficult.5

Part of the problem is that most companies outsource the preparation of XBRL financial
statements to other companies, such as Fujitsu and R.R. Donnelley, and they don’t validate
the data themselves. Another problem is that ambiguity in the taxonomy leads SEC filing
companies to select incorrect tags or use extension tags where a standard tag exists. Because
these statements are not audited, there is little incentive to improve data quality unless
stricter validation measures are put in place.
Despite these limitations, there is still value in analyzing XBRL data, and some providers have additional solutions to make XBRL data comparable. Outside data vendors create standardized metrics to make company-reported XBRL data more comparable. For example, Calcbench, a data vendor that eases financial analysis for XBRL users, makes standardized metrics, noting:

4 https://xbrl.us/xbrl-taxonomy/2017-us-gaap/.
5 https://xbrl.us/data-quality/.


IBM labels revenue as “Total revenue” and uses the tag “Revenues”, whereas Apple labels their revenue as “Net sales” and uses the tag “SalesRevenueNet”. This is a relatively simple case, because both companies used tags from the FASB taxonomy. Users are typically not interested in the subtle differences of how companies tag or label information. In the previous example, most users would want Apple and IBM’s revenue, regardless of how it was tagged. To that end, we create standardized metrics.6

Data vendors such as XBRLAnalyst and Calcbench provide a trace function that allows you to trace a standardized metric back to the original source to see which XBRL tags are referenced or used to make up the standardized metric.7
Exhibit 8-13 shows what a report using standardized metrics looks like for Boeing’s
balance sheet. Note the standardized XBRL tags used for Boeing could also be used to
access the financial statements for other SEC registrants.
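A sketch of the standardization idea: map each company's preferred revenue tag to one common metric before comparing. The tag-to-company pairing mirrors the IBM/Apple example quoted above, while the dollar values are placeholders.

# Map company-specific XBRL tags to one standardized "Revenue" metric
tag_to_standard = {
    "Revenues": "Revenue",         # tag used by IBM in the example above
    "SalesRevenueNet": "Revenue",  # tag used by Apple in the example above
}

# Placeholder facts as (company, tag, value in millions) tuples
facts = [
    ("IBM", "Revenues", 73_620),
    ("AAPL", "SalesRevenueNet", 274_515),
]

standardized = {company: value
                for company, tag, value in facts
                if tag_to_standard.get(tag) == "Revenue"}
print(standardized)  # both revenues now sit under one comparable metric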

XBRL, XBRL-GL, and Real-Time Financial Reporting


The desire for machine-readable data doesn’t stop at the financial statements. Many
financial reporting systems within enterprise systems such as Oracle and SAP have a
general ledger that is consistent with XBRL, called XBRL-GL (XBRL–Global Ledger).
That means once the numbers are input into a financial system, they are already tagged
and able to be transmitted in real time to interested users in a continuous reporting
function.
Of course, there are a number of reasons this information is not transmitted in real time.
For example, the accounting information has not yet been audited, and it may contain
errors. Other information such as goodwill or long-term debt will likely not change on a
minute-by-minute basis, so there would be no use for it on a real-time basis. But as systems advance and continuous, real-time auditing becomes more prevalent, and as we better understand how and exactly what type of real-time information might be used, there may be a chance of providing real-time accounting information in the relatively short term through the use of XBRL-GL.

6 https://knowledge.calcbench.com/hc/en-us/articles/230017408-what-is-a-standardized-metric (accessed August 2017).
7 https://knowledge.calcbench.com/hc/en-us/articles/230017408-What-is-a-standardized-metric.


EXHIBIT 8-13 Balance Sheet from XBRL Data
Note the XBRL tag names in the far left column.
Source: https://www.calcbench.com/xbrl_to_excel


Examples of Financial Statement Analytics Using XBRL


We illustrate the DuPont ratios in Exhibit 8-14 by considering a calculation from some
standard XBRL data.

EXHIBIT 8-14 DuPont Ratios Using XBRL Data
Source: https://www.calcbench.com/xbrl_to_excel

You’ll note for the Quarter 2 analysis in 2009, for DuPont (Ticker Symbol = DD), if you take its profit margin of 29.4 percent, multiplied by operating leverage (asset turnover) of 20.1 percent, multiplied by financial leverage (the equity multiplier) of 471.7 percent, you get a return on equity of 27.8 percent.
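You can verify that multiplication directly; a one-line check using the figures just quoted:

# DuPont (DD), Q2 2009, per the discussion of Exhibit 8-14
roe = 0.294 * 0.201 * 4.717  # profit margin x asset turnover x equity multiplier
print(round(roe, 3))         # 0.279, i.e., about 27.8 percent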

PROGRESS CHECK
6. How does XBRL facilitate Data Analytics by analysts?
7. How might standardized XBRL metrics be useful in comparing the financial state-
ments of General Motors, Alphabet, and Alibaba?
8. Assuming XBRL-GL is able to disseminate real-time financial reports, which real-
time financial elements (account names) might be most useful to decision makers?
And which information might not be useful?
9. Using Exhibit 8-14 as the source of data and using the raw accounts, show the
components of profit margin, operating leverage, and financial leverage and
how they are combined to equal ROE for Q1 2014 for DuPont (Ticker = DD).

Summary
Data Analytics extends to the financial accounting and financial reporting space.
■ Financial statement analytics include descriptive analytics, such as financial ratios and
vertical analysis; diagnostic analytics, where we compare those to benchmarks from
prior periods or competitors; predictive analytics, including horizontal trend analysis;
and prescriptive analytics. (LO 8-1)
■ Sparklines and trendlines are efficient and effective tools to visualize firm performance,
and sunburst diagrams and heat maps help highlight values of interest. (LO 8-2)
■ Sentiment analysis could be used with financial statements, other financial reports, and
other financially related information to gauge positive and negative meaning from other-
wise text-heavy notes. (LO 8-3)
■ The XBRL taxonomy provides tags for more than 19,000 financial elements and allows
for the use of company-defined tags when the normal XBRL tags are not suitable. (LO 8-4)

■ By tagging financial elements in a computer-readable manner, XBRL facilitates the accu-
rate and timely transmission of financial reporting to all interested stakeholders. (LO 8-4)
■ XBRL and Data Analytics allow timely analysis of the financial statements and the com-
putation of financial ratios. We illustrated their use by showing the DuPont ratio frame-
work. (LO 8-4)

Key Words
common size financial statement (407) A presentation of the financial statements as a percentage of a base number, used to compare financial statements of different-size companies.
DuPont ratio (409) Developed by the DuPont Corporation to decompose performance (particularly
return on equity [ROE]) into its component parts.
financial statement analysis (406) Used by investors, analysts, auditors, and other interested
stakeholders to review and evaluate a company’s financial statements and financial performance.
heat map (414) A visualization that shows the relative size of values by applying a color scale to the data.
horizontal analysis (411) An analysis that shows the change of a value from one period to the next.
index (411) A metric that shows how much any given subsequent year has changed relative to the base year.
ratio analysis (407) A tool used to evaluate relationships among different financial statement items to
help understand a company’s financial and operating performance.
sparkline (413) A small visual trendline or bar chart that efficiently summarizes numbers or statistics
in a single spreadsheet cell.
standardized metrics (419) Metrics used by data vendors to allow easier comparison of company-
reported XBRL data.
sunburst diagram (414) A visualization that shows inherent hierarchy.
vertical analysis (407) An analysis that shows the proportional value of accounts to a primary
account, such as Revenue.
XBRL (eXtensible Business Reporting Language) (417) A global standard for exchanging
financial reporting information that uses XML.
XBRL-GL (420) Stands for XBRL–Global Ledger; relates to the ability of an enterprise system to tag financial elements within the firm’s financial reporting system.
XBRL taxonomy (417) Defines and describes each key data element (like cash or accounts payable).
The taxonomy also defines the relationships between each element (like inventory is a component of current
assets and current assets is a component of total assets).

ANSWERS TO PROGRESS CHECKS


1. Liquidity ratios (e.g., current ratio or quick ratio) would tell the bank whether the business
could meet short-term obligations using current assets. Solvency ratios (e.g., debt-to-equity
ratio) would indicate how leveraged the company was and the likelihood of paying long-
term debt. It may also determine the interest rate that the banks charge.
2. The horizontal analysis shows the trend over time. We could see if revenues are going up
and costs are going down as the result of good management or the opposite in the case
of inefficiencies or decline.
3. Answers may vary on how to visualize DuPont’s ROE disaggregation. It might depend on
the type of reporting that is selected. For example, is it solely a digital dashboard, or is it
a report with many facts and figures where more sparklines might clutter up the overall
appearance? The nice thing about sparklines is they are generally small and just show
simple trends rather than details about the horizontal and vertical axes.
4. The MD&A section of the 10-K has management reporting on what happened in the most
recent period and what they expect will happen in the coming year. They are usually
upbeat and generally optimistic about the future. The footnotes are generally backward
looking in time and would be much more fact-based, careful, and conservative. We would
expect the MD&A section to be much more optimistic than the footnotes.
5. Accounting has its own lingo. Words that might seem negative for the English language
are not necessarily negative for financial reports. For this reason, the results differ based
on whether the standard English usage dictionary (H4N-Inf) or the financial dictionary
(Fin-Neg) is used. The relationship between the excess stock market return and the finan-
cial dictionary is what we would expect.
6. By each company providing tags for each piece of its financial data as computer readable,
XBRL allows immediate access to all types of financial statement users, be they financial
analysts, investors, or lenders, for their own specific use.
7. Standardized metrics are useful for comparing companies because they allow similar accounts to have the same title regardless of the account names used by the various companies. They allow for ease of comparison across multiple companies.
8. When journal entries and transactions are made in an XBRL-GL system, there is the possibility of real-time financial reporting. In our opinion, income statement information (including sales, cost of goods sold, and SG&A expenditures) would be useful to financial users on a real-time basis. Any information that does not change frequently would not be as useful. Examples of such information include goodwill; long-term debt; and property, plant, and equipment.
9. Profit margin = (Revenues – Cost of revenue)/Revenues = ($10.145B – $6.000B)/
$10.145B = 40.9%
Operating leverage (or Asset turnover) = Sales/Assets = ($10.145B/$47.800B) = 21.2%
Equity multiplier (or Financial leverage) = Assets/Equity = $47.800B/$16.442B = 290.7%
ROE = Profit margin × Operating leverage (or Asset turnover) × Financial leverage =
0.409 × 0.212 × 2.907 = 0.252

Multiple Choice Questions

1. (LO 8-1) The DuPont analysis of return on equity (ROE) includes all of the following com-
ponent ratios except:
a. asset turnover.
b. inventory turnover.
c. financial leverage.
d. profit margin.
2. (LO 8-1) What type of ratios measure a firm’s operating efficiency?
a. DuPont ratios
b. Liquidity ratios
c. Activity ratios
d. Solvency ratios
3. (LO 8-1) Performance comparisons to a company’s own past or to its competition would
be considered _____ analytics.
a. prescriptive
b. descriptive
c. predictive
d. diagnostic

4. (LO 8-2) In which stage of the IMPACT model (introduced in Chapter 1) would the use of
sparklines fit?
a. Track outcomes
b. Communicate insights
c. Address and refine results
d. Perform test plan
5. (LO 8-3) What computerized technique would be used to perform sentiment analysis on
an annual accounting report?
a. Text mining
b. Sentiment mining
c. Textual analysis
d. Decision trees
6. (LO 8-2) Determining how sensitive a stock’s intrinsic value is to assumptions and estimates made would be an example of _____ analytics.
a. diagnostic
b. predictive
c. descriptive
d. prescriptive
7. (LO 8-4) XBRL stands for:
a. Extensible Business Reporting Language.
b. Extensive Business Reporting Language.
c. XML Business Reporting Language.
d. Excel Business Reporting Language.
8. (LO 8-4) Which term defines and describes each XBRL financial element?
a. Data dictionary
b. Descriptive statistics
c. XBRL-GL
d. Taxonomy
9. (LO 8-4) What is the term used to describe the process of assigning XBRL tags inter-
nally within a financial reporting/enterprise system?
a. XBRL tagging
b. XBRL taxonomy
c. XBRL-GL
d. XBRL dictionary
10. (LO 8-4) What is the name of the output from data vendors to help compare companies
using different XBRL tags for revenue?
a. XBRL taxonomy
b. Data assimilation
c. Consonant tagging
d. Standardized metrics

Discussion and Analysis


1. (LO 8-3) Which would you predict would have more positive sentiment in a 10-K, the
financial statements or the MD&A (management discussion and analysis) of the financial
statements? More positive sentiment in the footnotes or MD&A? Why?
2. (LO 8-2) Would you recommend the Securities and Exchange Commission require the
use of sparklines on the face of the financial statements? Why or why not?
3. (LO 8-1) Why do audit firms perform analytical procedures to identify risk? Which type
of ratios (liquidity, solvency, activity, and profitability ratios) would you use to evaluate
the company’s ability to continue as a going concern?
4. (LO 8-4) Go to https://xbrl.us/data-rule/dqc_0015-lepr/ and find the XBRL element
name for Interest Expense and Sales, General, and Administrative Expense.
5. (LO 8-4) Go to https://xbrl.us/data-rule/dqc_0015-lepr/ and find the XBRL element name
for Other Nonoperating Income and indicate whether XBRL says that should normally
be a debit or credit entry.
6. (LO 8-1) Go to finance.yahoo.com and type in the ticker symbol for Apple (AAPL) and click
on the statistics tab. Which of those variables would be useful in assessing profitability?
7. (LO 8-4) Can you think of any other settings, besides financial reports, where tagged
data might be useful for fast, accurate analysis generally completed by computers?
How could it be used in a hospital setting? Or at your university?
8. (LO 8-3) Can you think of how sentiment analysis might be used in a marketing setting?
How could it be used in a hospital setting? Or at your university? When would it be
especially good to measure the sentiment?

Problems
1. (LO 8-1) Match the description of the financial statement analysis question to the data
analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics

Financial Statement Analysis Question Data Analytics Type

1. What is the firm’s debt relative to its assets in the past year?
2. What is the intrinsic value of the stock?
3. How much would the company have earned if there had not been a business interruption (due to fire or natural disaster)?
4. What are the company’s cash flows from operating activities over the past three years?
5. Why did the collectability of receivables (amount our customers owe to the company) fall in the current quarter as compared to the prior quarters?
6. Should we buy, sell, or hold the stock?

2. (LO 8-1) Match the description of the financial statement analysis technique to the data
analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics

Financial Statement Analysis Technique Data Analytics Type

1. Use of time series analysis to project future operating cash flows
2. Past performance comparisons to competitor
3. Use of analyst forecasts to project future sales
4. Drill-down analytics to determine relations/patterns/linkages between income statement line items

5. Vertical and horizontal analysis
6. Summary statistics of past performance
7. Net present value analysis using forecasts of earnings and cost of capital

3. (LO 8-1) Match the following descriptions or equations to one of the following compo-
nents of the DuPont ratio:
• Return on stockholders’ equity
• Profit margin
• Asset turnover
• Equity multiplier
• Return on assets

Description or Equation DuPont Ratio Component

1. The profit returned on each dollar of product sales
2. The profit returned on each dollar of assets invested
3. Net income/Average stockholders’ equity
4. The profit returned on each dollar of equity
5. Net sales/Average total assets
6. An example of an activity ratio
7. A type of solvency ratio

4. (LO 8-2) Match the following descriptions to one of the following visualization types:
• Heat map
• Sparklines
• Sunburst diagram

Description Visualization Type

1. Illustrates trends in sales
2. Illustrates inherent hierarchy
3. Illustrates relative size
4. Illustrates increasing profitability
5. Illustrates the component parts of an income statement

5. (LO 8-3) Analysis: Can you think of situations where sentiment analysis might be help-
ful to analyze press releases or earnings announcements? What additional informa-
tion might it provide that is not directly in the overall announcement? Would it be
useful to have sentiment analysis automated to just get a basic sentiment measure
versus the base level of sentiment expected in a press announcement or earnings
announcement?
6. (LO 8-3) Analysis: We noted in the text that negative words in the financial diction-
ary include words such as loss, claims, impairment, adverse, restructuring, and litiga-
tion. What other negative words might you add to that list? What are your thoughts on
positive words that would be included in the financial dictionary, particularly those that
might be different than standard English dictionary usage?
7. (LO 8-1) You’re asked to figure out how the stock market responded to Amazon’s
announcement on June 16, 2017, that it would purchase Whole Foods—arguably a
transformational change for Amazon, Walmart, and the whole retail industry.

7A. Go to finance.yahoo.com, type in the ticker symbol for Amazon (AMZN), click on
historical data, and find the closing stock price on June 15, 2017. How much did
the stock price change to the closing stock price on June 16, 2017? What was the
percentage of increase or decrease?
7B. Do the same analysis for Walmart (WMT) over the same dates. How much did the
stock price change on June 16, 2017? What was the percentage of increase or
decrease? Which company was impacted the most and had the largest percentage
of change?

Company Stock Price 6/15/17 Stock Price 6/16/17 Change $ Change %
A) Amazon
B) Walmart

8. (LO 8-1) The preceding question asked you to figure out how the stock market
responded to Amazon’s announcement that it would purchase Whole Foods. The
question now is if the stock market for Amazon had higher trade volume on that day
than the average of the month before.
8A. Go to finance.yahoo.com, type in the ticker symbol for Amazon (AMZN), click on
historical data, and input the dates from May 15, 2017, to June 16, 2017. Down-
load the data, calculate the average volume for the month prior to June 16, and
compare it to the trading volume on June 16. What impact did the Whole Foods
announcement have on Amazon trading volume?
8B. Do the same analysis for Walmart (WMT) over the same dates. What impact did the
Whole Foods announcement have on Walmart trading volume?

Company Average Volume 5/15/17 to 6/15/17 Volume on 6/16/17 Volume Change %
A) Amazon
B) Walmart

9. (LO 8-3) Go to Loughran and McDonald’s sentiment word lists at https://sraf.nd.edu/textual-analysis/resources/ and download the Master Dictionary. These lists are what they’ve used to assess sentiment in financial statements and related financial reports. Select the appropriate category (Negative, Constraining, Both, or None) for each given word below, according to the authors’ word list.

Word Category
1. ADVERSE
2. AGAINST
3. CLAIMS
4. IMPAIRMENT
5. LIMIT
6. LITIGATION
7. LOSS
8. PERMITTED
9. PREVENT
10. REQUIRED
11. BOUND
12. UNAVAILABLE
13. CONFINE

10. (LO 8-3) Go to Loughran and McDonald’s sentiment word lists at https://sraf.nd.edu/
textual-analysis/resources/ and download the Master Dictionary. These lists are what
they’ve used to assess sentiment in financial statements and related financial reports.
Select the appropriate category (Litigious, Positive, Both, or None) for each given word
below, according to the authors’ word list.

Word Category
1. BENEFIT
2. BREACH
3. CLAIM
4. CONTRACT
5. EFFECT
6. ENABLE
7. SETTLEMENT
8. SHALL
9. STRONG
10. SUCCESSFUL
11. ANTICORRUPTION
12. DEFENDABLE
13. BOOSTED

LABS

Lab 8-1 Create a Horizontal and Vertical Analysis Using XBRL Data—S&P100
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: This lab will pull in XBRL data from S&P100 companies listed with the
Securities and Exchange Commission. You have the option to analyze a pair of companies
of your choice based on your own interest level. This lab will have you compare other com-
panies as well.
Data: Lab 8-1 SP100 Facts and Script.zip - 601KB Zip / 607KB Excel / 18KB Text

Lab 8-1 Example Output


By the end of this lab, you will create a dynamic financial statement where values are pulled
from a set of financial facts based on a company ticker. This will be used to create some
simple horizontal and vertical analyses. While your results will include different data values,
your work should look similar to this:

Microsoft | Excel & Google | Sheets

Microsoft Excel

LAB 8-1M Example of Dynamic Analysis in Microsoft Excel and Google Sheets

Lab 8-1 Part 1 Connect to Your Data
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 8-1 [Your name] [Your email address].docx.
Financial statement analysis frequently involves identifying relationships between spe-
cific pieces of data. We may want to see how financial data have changed over time or how
the composition has changed.
To create a dynamic spreadsheet, you must first create a PivotTable (Excel) or connect
your sheet to a data source on the Internet (Google Sheets). Because Google Sheets is
hosted online, you will add the iXBRLAnalyst script to connect it to FinDynamics so you
can use formulas to query financial statement elements.
Because companies frequently use different tags to represent similar concepts (such as
the tags ProfitLoss or NetIncomeLoss to identify Net Income), it is important to make sure
you’re using the correct values. FinDynamics attempts to coordinate the diversity of tags by
using normalized tags that use formulas and relationships instead of direct tags. Normalized
tags must be contained within brackets []. Some examples are given in Lab Table 8-1.

LAB TABLE 8-1 Normalized Accounts Created by FinDynamics for XBRLAnalyst

Balance Sheet:
[Cash, Cash Equivalents and Short-Term Investments]
[Short-Term Investments]
[Accounts Receivable, Current]
[Inventory]
[Other Current Assets]
[Current Assets]
[Net of Property, Plant & Equipment]
[Long-Term Investments]
[Intangible Assets, Net]
[Goodwill]
[Other Noncurrent Assets]
[Noncurrent Assets]
[Assets]
[Accounts Payable and Accrued Liabilities, Current]
[Short-Term Borrowing]
[Long-Term Debt, Current]
[Other Current Liabilities]
[Current Liabilities]
[Other Noncurrent Liabilities]
[Noncurrent Liabilities]
[Liabilities]
[Preferred Stock]
[Common Stock]
[Additional Paid-in Capital]
[Retained Earnings (Accumulated Deficit)]
[Equity Attributable to Parent]
[Equity Attributable to Noncontrolling Interest]
[Stockholders’ Equity]
[Liabilities & Equity]

Income Statement:
[Revenue]
[Cost of Revenue]
[Gross Profit]
[Selling, General & Administrative Expense]
[Research & Development Expense]
[Depreciation (&Amortization), IS]
[Non-Interest Expense]
[Other Operating Expenses]
[Operating Expenses]
[Operating Income]
[Other Operating Income]
[Non-Operating Income (Expense)]
[Interest Expense]
[Costs and Expenses]
[Earnings Before Taxes]
[Income Taxes]
[Income from Continuing Operations]
[Income from Discontinued Operations, Net of Taxes]
[Extraordinary Items, Gain (Loss)]
[Net Income]
[Net Income Attributable to Parent]
[Net Income Attributable to Noncontrolling Interest]
[Preferred Stock Dividends and Other Adjustments]
[Comprehensive Income (Loss)]
[Other Comprehensive Income (Loss)]
[Comprehensive Income (Loss) Attributable to Parent]
[Comprehensive Income (Loss) Attributable to Noncontrolling Interest]

Statement of Cash Flows:
[Cash From Operations (CFO)]
[Changes in Working Capital]
[Changes in Accounts Receivables]
[Changes in Liabilities]
[Changes in Inventories]
[Adjustments of Non-Cash Items, CF]
[Provision For Doubtful Accounts]
[Depreciation (&Amortization), CF]
[Stock-Based Compensation]
[Pension and Other Retirement Benefits]
[Interest Paid]
[Other CFO]
[Cash from Investing (CFI)]
[Capital Expenditures]
[Payments to Acquire Investments]
[Proceeds from Investments]
[Other CFI]
[Cash From Financing (CFF)]
[Payment of Dividends]
[Proceeds from Sale of Equity]
[Repurchase of Equity]
[Net Borrowing]
[Other CFF]
[Effect of Exchange Rate Changes]
[Total Cash, Change]
[Net Cash, Continuing Operations]
[Net CFO, Continuing Operations]
[Net CFI, Continuing Operations]
[Net CFF, Continuing Operations]
[Net Cash, DO]
[Net CFO, DO]
[Net CFI, DO]
[Net CFF, DO]

Microsoft | Excel

1. Open Lab 8-1 SP100 Facts.xlsx in Excel.


2. Transform the data into a PivotTable:
a. Select the table data in the XBRLFacts sheet (press Ctrl + A on Windows
or Cmd + A on Mac).
b. In the ribbon, go to Insert> PivotTable.
c. Choose New Worksheet and click OK.
d. Rename the new sheet PivotFacts.
e. In the PivotTable Fields pane, drag the fields to the following locations:
1. Columns: Ticker, Year
2. Rows: Accounts
3. Values: Value Note: Verify that this shows the Sum of Value. If not,
click the field drop-down arrow and choose Value Field Settings to
change it to Sum.
3. Take a screenshot of your pivot table (label it 8-1MA). Note: Values with
a 0 represent either missing values or elements that are not found in a com-
pany’s XBRL statement.
4. Add a new worksheet (rename it Part1) and test your connection to the Pivot-
Table by typing in the following formula anywhere on your sheet:
= GETPIVOTDATA(“Value”,PivotFacts!$A$3,”Ticker”,”AAPL”,”Year”,”2020”,”Account”,”[Net Income]”)/1000000
If your PivotTable is correct, it should return the value 57411 for Apple Inc.’s
2020 net income in millions of dollars.
The basic formula for looking up data from a PivotTable in Excel is:
= GETPIVOTDATA(data_field, pivot_table, [field1, item1], [field2, item2], ...)
where:
data_field = “Value” (e.g., 57411 or the value you want to return).
pivot_table = PivotFacts!$A$3 (i.e., any cell reference within the table you are
looking up values from).
[field1, item1] = “Ticker”,”AAPL” (i.e., this looks up the ticker “AAPL” in the
“Ticker” field to find the matching “Value” in the PivotTable).
[field2, item2] = “Year”,”2020” (i.e., this looks up the year “2020” in the “Year”
field to find the matching “Value” in the PivotTable).
[field3, item3] = “Account”,”[Net Income]” (i.e., this looks up the account
name “[Net Income]” in the “Account” field to find the matching “Value” in the
PivotTable).

Note: For each of the items listed in the formula (e.g., AAPL, 2020, or [Net
income]), you can use cell references containing those values (e.g., B$1, $B$2,
and $A7) as inputs for your formula instead of hardcoding those values. These
references are used to look up or match the values in the PivotTable.
5. Using the formulas given above, answer the lab questions and continue to
Part 2.


Google | Sheets

1. Log in to Google Sheets (sheets.google.com) and create a new blank sheet called Lab 8-1 SP100 Analysis.
2. Add the iXBRLAnalyst add-in to your new Google Sheet:
a. Click Extensions > Apps Script from the menu.
b. In a new browser window or tab, go to findynamics.com/gsheets/
ixbrlanalyst.gs (alternatively, open the ixbrlanalyst.txt file included in the
lab files).
c. Copy and paste the entire script from the FinDynamics page into the
Script Editor window, replacing any existing text.
d. Click Save and name the project XBRL.
e. Close the Script Editor window and return to your Google Sheet.
f. Reload/refresh the page. If you see a new iXBRLAnalyst menu appear,
you are now connected to the XBRL data. Note: While there are some
menu items for Connect and Clear Cache, you do not need to do any-
thing with them to access S&P100 companies’ data.
3. Test your connection by typing in the following formula anywhere on your
sheet:
= XBRLFact(“AAPL”,”NetIncomeLoss”,”2020”)/1000000
If your connection is good, it should return the value 57411 for Apple Inc.’s
2020 net income amount in millions of dollars.
4. Take a screenshot of your Google Sheet showing the calculated amount
(label it 8-1GA).
Note: Once you’ve added the iXBRLAnalyst script to a Google Sheet, you can simply
open that sheet, then go to File > Make a copy . . . , and the script will automatically
be copied to the new sheet.
The basic formulas available with the iXBRLAnalyst script are:
= XBRLFact(company, tag, year, period, member, scale)
= FinValue(company, tag, year, period, member, scale)
where:
company = ticker symbol (e.g., “AAPL” for Apple Inc.)
tag = XBRL tag or normalized tag (e.g., “NetIncomeLoss” or “[Net Income]”)
year = reporting year (e.g., “2020”)
period = fiscal period will default to year if not indicated (e.g., “Q1” for 1st Quarter
or “Y” for year)
member = a business segment (e.g., “InfrastructureMember”) though this attribute
is rarely used.
scale = rounding (e.g., “k,” “thousands,” or “1000” for thousands) Note: If there is
an error with rounding, simply divide the formula by the scale instead (e.g., = XBRL
Fact(c,t,y,p)/scale).
If you’re looking for specific XBRL tags, you can explore the current XBRL taxonomy
at xbrlview.fasb.org.
5. Using the formulas given above, answer the lab questions and continue to Part 2.



Lab 8-1 Part 1 Objective Questions (LO 8-1)
OQ1. What was the Net Income value for AAPL in 2020? (Round to the nearest mil-
lion dollars.)
OQ2. Change the values in your test formula. What is the Net Income for NKE in
2020? (Round to the nearest million dollars.)
OQ3. Change the values in your test formula. What is the value of [Current Assets]
or AssetsCurrent for PG in 2020? (Round to the nearest million dollars.)

Lab 8-1 Part 1 Analysis Questions (LO 8-1)


AQ1. iXBRLAnalyst in Google Sheets uses “normalized” tags for common financial
statement elements. What is the purpose of “normalized” XBRL tags?
AQ2. Select either Apple Inc. (AAPL) or Nike (NKE), and identify three questions
you might want to know about that company’s financial performance over the
past three years. For example, “What is the trend of operating costs?”
AQ3. Form a hypothesis for each of your questions. For example, “I expect Nike’s
operating costs have gone up.”

Lab 8-1 Part 2 Look Up Data and Perform Analysis


We will begin by creating a common size income statement for one company over a three-
year period. You will use a combination of spreadsheet formulas and live XBRL data to
generate a spreadsheet that is adaptable and dynamic. In other words, you will create a
template that can be used to answer several financial statement analysis questions about
different companies over time.

Microsoft | Excel

1. In your Excel document, add a new sheet and rename it Part2. Then enter
the values, as shown:
A B
1 Company AAPL
2 Year 2020
3 Period Y
4 Scale 1000000

2. Next, set up your financial statement using the following normalized tags and
periods. Note: Because we already identified the most current year in B2,
we’ll use a formula to find the three most recent years.
A B C D
6 = $B2 = B6-1 = C6-1
7 [Revenue]
8 [Cost of Revenue]
9 [Gross Profit]
10 [Selling, General & Administrative Expense]
11 [Research & Development Expense]
12 [Other Operating Expenses]
13 [Operating Expenses]
14 [Operating Income]
15 [Earnings before Taxes]
16 [Income Taxes]
17 [Net Income]

3. Now enter the = GETPIVOT() formula to pull in the correct values, using
relative or absolute references (e.g., $A7, $B$1, etc.) as necessary. For example,
the formula in B7 should be = GETPIVOTDATA(“Value”,PivotFacts!$A$3,
“Ticker”, $B$1,“Year”, B$6, “Account”, $A7)/$B$4.
4. If you’ve used relative references correctly, you can either drag the formula
down and across columns B, C, and D, or copy and paste the cell containing
the formula (don’t select the formula itself) into the rest of the table.
5. Use the formatting tools to clean up your spreadsheet.
6. Take a screenshot of your table (label it 8-1MB).
7. Next, you can begin editing your dynamic data and expanding your analysis,
identifying trends and ratios.
8. In your worksheet, add a sparkline to show the change in income statement
accounts over the three-year period:
A. In cell E7, go to Insert > Sparklines > Line in the ribbon.
B. Data Range: B7:D7
C. Location Range: E7
D. Click OK to add the line.
E. Next, copy the sparkline down the column. Note: The line is trending toward
the left as accounting reports typically show the most recent results on
the left.
9. Now perform a vertical analysis in the columns to the right showing each
value as a percentage of revenue:
a. Copy cells B6:D6 into F6:H6.
b. In F7, type = B7/B$7.
c. Drag the formula to fill in F7:H17.
d. Format the numbers as a percentage with one decimal place.
e. Add a sparkline in Column I.
10. Take a screenshot of your income statement (label it 8-1MC).
11. Now that you have a common size income statement, replace the company
ticker in cell B1 with any S&P100 company's ticker (e.g., DD, XOM, or PG)
and press Enter. The data on the spreadsheet will update.
12. Using the formulas given above, answer the questions for this part. When you
are finished answering the lab questions, you may close Excel. Save your file
as Lab 8-1 SP100 Analysis.xlsx.

Google | Sheets

1. In your Google Sheet, begin by entering the values, as shown:

A B
1 Company AAPL
2 Year 2020
3 Period Y
4 Scale 1000000

2. Then set up your financial statement using the following normalized tags and
periods. Note: Because we already identified the most current year in B2,
we’ll use a formula to find the three most recent years.

A B C D
6 = $B2 = B6-1 = C6-1
7 [Revenue]
8 [Cost of Revenue]
9 [Gross Profit]
10 [Selling, General & Administrative Expense]
11 [Research & Development Expense]
12 [Other Operating Expenses]
13 [Operating Expenses]
14 [Operating Income]
15 [Earnings before Taxes]
16 [Income Taxes]
17 [Net Income]

3. Now enter the = XBRLFact() formula to pull in the correct values, using rel-
ative or absolute references (e.g., $A7, $B$1, etc.) as necessary. For example,
the formula in B7 should be = XBRLFact($B$1,$A7,B$6,$B$3)/$B$4.
4. If you’ve used relative references correctly, you can either drag the formula
down and across columns B, C, and D, or copy and paste the cell (not the
formula itself) into the rest of the table.
5. Use the formatting tools to clean up your spreadsheet.
6. Take a screenshot of your table (label it 8-1GB).
7. Next, you can begin editing your dynamic data and expanding your analysis,
identifying trends and ratios.
8. In your Google Sheet, use a sparkline to show the change in income state-
ment accounts:
a. In cell E7, type: = SPARKLINE(B7:D7). Next, copy the sparkline down
the column. Note: The line is trending toward the left.
9. Now perform a vertical analysis in the columns to the right showing each
value as a percentage of revenue:
a. Copy cells B6:D6 into F6:H6.
b. In F7, type = B7/B$7.

c. Drag the formula to fill in F7:H17.
d. Format the numbers as a percentage.
e. Add a sparkline in Column I.
10. Take a screenshot of your income statement (label it 8-1GC).
11. Now that you have a common size income statement, replace the company
ticker in cell B1 with your selected company’s ticker and press Enter. The
data on the spreadsheet will update.
12. Using the formulas given above, answer the questions for this part. When you
are finished answering the lab questions, you may close Google Sheets. Your
work will be saved automatically.
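Both tool tracks implement the same vertical-analysis arithmetic. For readers who want to see it outside a spreadsheet, here is a minimal Python/pandas sketch of the common size calculation using a hand-entered slice of Apple's reported figures; the layout and column names are illustrative and do not match the lab's PivotFacts file exactly.

import pandas as pd

# Illustrative long-format facts table (values in $ millions, from Apple's
# FY2019-FY2020 income statements). The lab data follow the same idea but
# have a different structure.
facts = pd.DataFrame({
    "Ticker":  ["AAPL"] * 6,
    "Year":    [2020, 2020, 2020, 2019, 2019, 2019],
    "Account": ["Revenue", "Cost of Revenue", "Net Income"] * 2,
    "Value":   [274_515, 169_559, 57_411, 260_174, 161_782, 55_256],
})

# Pivot to an income-statement layout: accounts as rows, years as columns.
stmt = facts.pivot(index="Account", columns="Year", values="Value")

# Vertical analysis: each line item as a percentage of that year's revenue,
# the same arithmetic as the = B7/B$7 formula in the steps above.
common_size = stmt.div(stmt.loc["Revenue"]) * 100
print(common_size.round(1))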

Lab 8-1 Part 2 Objective Questions (LO 8-1, 8-2)


OQ1. What direction is cost of revenue trending for GE?
OQ2. In which of the three years analyzed is operating expense highest as a percent-
age of revenue for PG?
OQ3. If analysts expect MMM to attain a gross profit margin above 47 percent, does
the company exceed the expectations in all three years analyzed?

Lab 8-1 Part 2 Analysis Questions (LO 8-1, 8-2)


AQ1. Enter the ticker for Apple Inc. (AAPL) or Nike (NKE). Look at the trends and
composition of the income statement, then answer your three questions from
Part 1 AQ2.
AQ2. How did the actual results compare with your hypothesis in Part 1 AQ3?
AQ3. Replace the company ticker with a competitor of your company (e.g., MSFT vs.
AAPL). How do their trends compare with your initial company?
AQ4. How could you expand this table you built in Part 2 to include multiple com-
petitors’ data on the same sheet for quick analysis?

Lab 8-1 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 8-2 Create Dynamic Common Size Financial Statements—S&P100
Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable,
can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: This lab will pull in XBRL data from S&P100 companies listed with the
Securities and Exchange Commission. This lab will have you attempt to identify companies
based on their financial ratios.
Data: Lab 8-2 SP100 Facts Pivot.zip - 933KB Zip / 950KB Excel

Lab 8-2 Example Output
By the end of this lab, you will create a dashboard that will let you explore common size
financial statement figures from annual reports for S&P100 companies. While your results
will include different data values, your work should look similar to this:

Microsoft | Excel & Google | Sheets

LAB 8-2M Example of Dynamic Common Size Financial Statements in Microsoft Excel and Google Sheets

Lab 8-2 Identify the Mystery Companies


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 8-2 [Your name] [Your email address].docx.
XBRLAnalyst and PivotTables allow you to easily create common size financial statements.
Using the skills learned in Lab 8-1, now extend the analysis to identify some companies
based on their financial performance. The S&P100 companies listed in Lab Exhibit 8-2A
operate in a variety of industries. Their FY2020 revenue and assets appear below:

LAB EXHIBIT 8-2A Background Information on Selected S&P100 Companies
(FY2020 Revenue and Assets in millions)

Accenture PLC (ACN) is a multinational company based in Ireland that provides consulting and processing services. Revenue: $44,327; Assets: $37,079
Amazon (AMZN) is an American multinational technology company that operates as an online retailer and data services provider in North America and internationally. Revenue: $386,064; Assets: $321,195
Boeing (BA) engages in the design, development, manufacture, sale, and support of commercial jetliners, military aircraft, satellites, missile defense, human space flight, and launch systems and services worldwide. Revenue: $58,158; Assets: $152,136
Cisco (CSCO) designs, manufactures, and sells Internet protocol (IP)–based networking and other products related to the communications and information technology industries worldwide. Revenue: $49,301; Assets: $94,853
Coca-Cola (KO) is a beverage company engaging in the manufacture, marketing, and sale of nonalcoholic beverages worldwide. Revenue: $33,014; Assets: $87,296
Mondelez (MDLZ) produces consumer food products, such as Oreo cookies. Revenue: $26,581; Assets: $67,810

3M (MMM) is a multinational conglomerate that specializes in consumer goods and health care. Revenue: $32,184; Assets: $47,344
Merck (MRK) provides various health solutions through its prescription medicines, vaccines, biologic therapies, animal health, and consumer care products worldwide. Revenue: $47,994; Assets: $91,588
Visa (V) is a financial services firm providing a payment processing network. Revenue: $21,846; Assets: $80,919
Walmart (WMT) operates retail stores in various formats worldwide. The company operates in three segments: Walmart U.S., Walmart International, and Sam's Club. Revenue: $523,964; Assets: $236,495

In Lab Exhibit 8-2B, you’ll find the common size ratios for each Lab Exhibit 8-2A company’s
income statement (as a percentage of revenue) and balance sheet (as a percentage of assets).
LAB EXHIBIT 8-2B
Mystery Ratios for Common Size Income Statement and Balance Sheet
A B C D E F G H I J
As a Percentage of Sales
Revenue 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
Cost of Goods Sold 40.7 51.6 17.3 60.4 109.8 68.5 75.3 60.7 35.7 32.3
Gross Profit 59.3 48.4 82.7 39.6 9.8 31.5 24.7 39.3 64.3 67.7
Selling, General, and Administrative Expenses 29.5 21.5 9.5 7.4 8.3 16.8 20.8 22.9 22.5 21.8
Research and Development 0.0 5.8 0.0 0.0 4.3 0.0 0.0 0.0 12.9 28.2
Other Operating Expenses 2.6 50.4 22.6 86.6 0.3 68.5 0.0 1.1 1.0 1.2
Total Operating Expenses 32.1 77.7 35.5 94.1 12.2 85.3 20.8 24.8 36.6 51.3
Operating Income/Loss 27.3 22.3 64.5 5.9 22.0 14.7 3.9 14.5 27.6 16.5
Interest Expense 4.4 0.0 2.4 0.4 3.7 0.1 0.4 0.0 1.2 0.0
Income before Tax 29.5 20.9 63.1 6.3 24.9 15.3 3.8 12.7 28.3 18.3
Income Tax Expense 6.0 4.1 13.4 0.7 4.4 3.6 0.9 4.6 5.6 3.6
Net Income 23.5 16.7 49.7 5.5 20.4 12.3 2.9 13.4 22.7 14.8

As a Percentage of Assets
Current Assets 22.0 31.6 34.2 41.3 80.0 47.9 26.1 14.7 45.9 30.3
Receivables 0.6 9.9 2.0 7.6 1.3 0.0 2.7 3.4 5.8 8.6
Inventory 3.7 9.0 0.0 7.4 53.7 0.0 18.8 3.9 1.4 6.9
Other Current Assets 9.9 3.0 12.0 13.2 24.9 25.2 0.7 2.1 26.4 6.0
Total Current Assets 22.0 31.6 34.2 41.3 80.0 47.9 26.1 14.7 45.9 30.3
Long-Term Investments 22.1 0.0 0.3 0.0 0.7 0.9 0.0 8.9 6.0 0.9
Property, Plant, and Equipment 12.3 19.9 3.4 31.1 7.8 4.2 44.5 13.3 2.6 19.6
Goodwill 20.1 29.2 19.7 4.7 5.3 20.8 13.1 32.3 35.6 22.1
Intangible Assets 11.9 12.3 34.4 0.0 1.9 0.0 0.0 27.3 1.7 15.9
Total Long-Term Assets 78.0 68.4 65.8 58.7 20.0 52.1 73.9 85.3 54.1 69.7
Total Assets 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
Liabilities 75.6 72.7 55.3 70.9 111.9 52.8 65.5 59.2 60.0 72.3
Current Liabilities 16.7 16.8 17.9 39.3 57.4 34.2 32.9 22.4 26.7 29.8
Accounts Payable 12.8 7.6 3.5 36.3 23.1 20.4 29.4 13.5 6.5 8.6
Total Non-Current Liabilities 58.9 55.9 37.3 15.2 54.5 18.7 27.1 36.9 33.3 42.4
Total Liabilities 75.6 72.7 55.3 70.9 111.9 52.8 65.5 59.2 60.0 72.3
Total Liabilities and Stockholders’ Equity 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0


Microsoft | Excel

1. Open Lab 8-2 SP100 Facts Pivot.xlsx in Excel.


2. Create a new sheet and rename it Mystery.
3. Use the = GETPIVOTDATA() formula as well as the normalized account
names found in Lab 8-1 Part 1 to recreate the table in Exhibit 8-2B with
company names from Exhibit 8-2A:
= GETPIVOTDATA("Value", PivotFacts!$A$3, "Ticker", $B$1, "Year", B$6, "Account", $A7)/$B$4
Hint: Use relative references (e.g., B6/B$6) to save yourself some time. Fix
the denominator row to match the Revenue and Assets values.
4. Take a screenshot (label it 8-2MA).
5. Answer the lab questions, then you may exit Excel. Save your file as Lab 8-2
Mystery Ratios.xlsx.

Google | Sheets

1. Create a new Google Sheet called Lab 8-2 Mystery Ratios and connect it to
the iXBRLAnalyst add-in (or open the spreadsheet you created in Lab 8-1):
a. Click Extensions > Apps Script from the menu.
b. In a new browser window or tab, go to findynamics.com/gsheets/
ixbrlanalyst.gs (alternatively, open the ixbrlanalyst.txt file included in
the lab files).
c. Copy and paste the entire script from the FinDynamics page into the
Script Editor window, replacing any existing text.
d. Click Save and name the project XBRL, then click OK.
e. Close the Script Editor window and return to your Google Sheet.
f. Reload/refresh the page. If you see a new iXBRLAnalyst menu appear,
you are now connected to the XBRL data.
2. Use the = XBRLFact() formula as well as the normalized accounts in Lab
8-1 to recreate the ratios above:
= XBRLFact($B$1,$A7,B$6,$B$3)/$B$4
Hint: Remember that the formula structure is = XBRLFact(company, tag, year,
period, member, scale). Use relative references (e.g., B6/B$6) to save yourself
some time. Fix the denominator row to match the Revenue and Assets values.
3. Take a screenshot (label it 8-2GA).
4. Answer the lab questions, then you may exit Google Sheets. Save your file as
Lab 8-2 Mystery Ratios.

Lab 8-2 Objective Questions (LO 8-1)


Match the company names in Lab Exhibit 8-2A with their corresponding ratios in each
column of Lab Exhibit 8-2B.
OQ1. Which company’s ratios match Column A?
OQ2. Which company’s ratios match Column B?



OQ3. Which company’s ratios match Column C?
OQ4. Which company’s ratios match Column D?
OQ5. Which company’s ratios match Column E?
OQ6. Which company’s ratios match Column F?
OQ7. Which company’s ratios match Column G?
OQ8. Which company’s ratios match Column H?
OQ9. Which company’s ratios match Column I?
OQ10. Which company’s ratios match Column J?

Lab 8-2 Analysis Questions (LO 8-1)


AQ1. Look at the different companies’ ratios. Which ones stand out to you as being
atypical?
AQ2. Look at the values that return a 0. Is it likely that the 0 represents the com-
pany’s actual standing? What else might explain the 0 value?

Lab 8-2 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 8-3 Analyze Financial Statement Ratios—S&P100

Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Financial analysts, investors, lenders, auditors, and many others perform
ratio analysis to help review and evaluate a company’s financial statements and financial
performance. This analysis allows the stakeholder to gain an understanding of the financial
health of the company and gives insights to allow more insightful and, hopefully, more effec-
tive decision making.
In this lab, you will access XBRL data to complete data analysis and generate finan-
cial ratios to compare the financial performance of several companies. Financial ratios can
more easily be calculated using spreadsheets and XBRL. For this lab, a template is provided
that contains the basic ratios. You will (1) select an industry to analyze, (2) create a copy
of a spreadsheet template, (3) input ticker symbols from three U.S. public companies, and
(4) calculate financial ratios and make observations about the state of the companies using
these financial ratios.
Data: Lab 8-3 SP100 Ratios.zip

Lab 8-3 Example Output


By the end of this lab, you will look up ratios in a prebuilt dashboard that will let you
explore financial ratios in annual reports for S&P100 companies. While your results will
include different data values, your work should look similar to this:

Microsoft | Excel & Google | Sheets

LAB 8-3M Example of Financial Ratio Analysis in Microsoft Excel and Google Sheets

Lab 8-3 Load and Evaluate Financial Ratios


Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 8-3 [Your name] [Your email address].docx.
To master the data and prepare for analysis, pick which industry and which companies
to analyze.
Lab Exhibit 8-3A contains a list of 15 S&P100 companies in five different industries.
Each of these companies has attributes and strategies that are similar to and different from
its competitors. Refer to Lab Exhibit 8-3A for your industry's ticker symbols.

LAB EXHIBIT 8-3A Input Ticker Symbols

Industry: Company 1 / Company 2 / Company 3
Retail: Walmart (WMT) / Target (TGT) / Costco (COST)
Technology: Microsoft (MSFT) / Apple Inc (AAPL) / Facebook (FB)
Pharmaceutical: Johnson & Johnson (JNJ) / Merck (MRK) / Bristol-Myers (BMY)
Finance: Citigroup (C) / Wells-Fargo (WFC) / JPMorgan Chase (JPM)
Energy: ExxonMobil (XOM) / Chevron (CVX) / ConocoPhillips (COP)

Microsoft | Excel

1. Open Lab 8-3 Financial Ratios with XBRL.xlsx in Excel.


2. Pick an industry and refer to Lab Exhibit 8-3A for your companies’ ticker
symbols. Enter the following into the sheet labelled Facts:
a. Main Company Ticker: Company 1’s ticker
b. Most Recent Year: 2020

c. Period: FY for a fiscal year
d. Round to: 1000000 will round to millions of dollars.
e. Comparable 1 Ticker: Company 2’s ticker
f. Comparable 2 Ticker: Company 3’s ticker
3. Take a screenshot (label it 8-3MA) of your figure with the financial state-
ments of your chosen companies.
4. Review the Facts sheet (or tab) to determine whether there are any values
missing for the companies you are analyzing. Click through the sheets at the
bottom to review the various ratios. To aid in this analysis, the template also
includes sparklines that provide a mini-graph to help you quickly visualize
any significant values or trends.
5. Take a screenshot (label it 8-3MB) of the DuPont ratios worksheet.
6. Answer the lab questions and then close Excel. Save your work as Lab 8-3
Financial Ratios with XBRL Analysis.xlsx.

Google | Sheets

1. Create a copy of a spreadsheet template in the following way:


a. Open a web browser and go to drive.google.com.
b. If you haven’t done so already, sign in to your Google account.
c. Go to http://tinyurl.com/xbrlratios.
d. Click File > Make a copy . . .
e. Rename your spreadsheet Lab 8-3 Financial Ratios with XBRL Analysis
and click OK to save a copy to your Drive. A new tab will open with your
copy of the spreadsheet. You may now edit the values and formulas.
2. Pick an industry and refer to Lab Exhibit 8-3A for your companies' ticker symbols.
Enter the following into the sheet: Note: In a moment, the value on the spread-
sheet will change to Loading . . . and then show your company’s financial figures.
a. Main Company Ticker: Company 1’s ticker
b. Most Recent Year: 2020
c. Period: FY for a fiscal year
d. Round to: 1000000 will round to millions of dollars.
e. Comparable 1 Ticker: Company 2’s ticker
f. Comparable 2 Ticker: Company 3’s ticker
3. Take a screenshot (label it 8-3GA) of your figure with the financial state-
ments of your chosen companies.
4. Review the Facts sheet (or tab) to determine whether there are any values
missing for the companies you are analyzing. Click through the sheets at the
bottom to review the various ratios. To aid in this analysis, the template also
includes sparklines that provide a mini-graph to help you quickly visualize
any significant values or trends.
5. Take a screenshot (label it 8-3GB) of the DuPont ratios worksheet.
6. Answer the lab questions and then close Google Sheets. Your edits are saved
automatically.
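The DuPont worksheet you capture in the screenshots rests on a standard identity: return on equity equals profit margin times asset turnover times the equity multiplier. As a reference, here is a minimal Python sketch of that decomposition; the input figures are invented placeholders, not values from the lab template.

def dupont(net_income, revenue, avg_assets, avg_equity):
    """Return the three DuPont components and the resulting ROE."""
    profit_margin = net_income / revenue           # profitability
    asset_turnover = revenue / avg_assets          # activity/efficiency
    equity_multiplier = avg_assets / avg_equity    # financing (leverage)
    roe = profit_margin * asset_turnover * equity_multiplier
    return profit_margin, asset_turnover, equity_multiplier, roe

# Invented example: $5,000 of net income on $50,000 of revenue.
pm, at, em, roe = dupont(5_000, 50_000, 40_000, 16_000)
print(f"Margin {pm:.1%} x Turnover {at:.2f} x Leverage {em:.2f} = ROE {roe:.1%}")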

Lab 8-3 Objective Questions (LO 8-1)
OQ1. Which industry did you analyze?
OQ2. For your Company 1, which ratio has seen the biggest change from 2018 to
2020? Use sparklines or calculate the percentage change.
OQ3. Which of the three companies is most liquid in 2020, according to the quick
ratio?
OQ4. How well has your Company 1 managed short-term liabilities over the last
three years?

Lab 8-3 Analysis Questions (LO 8-1)


AQ1. How does XBRL fulfill the need for real-time, accurate financial data?
AQ2. Why is it useful to compare multiple companies at once?
AQ3. Analyze liquidity, profitability, financing (leverage), and activity for your com-
pany. Where is it strong?
AQ4. What impact (if any) does missing data have on the ratios?
AQ5. If one company has a significantly higher debt-to-equity ratio than the other two,
what might be driving this? How might the DuPont ratios help explain this?

Lab 8-3 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 8-4 Analyze Financial Sentiment—S&P100


Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable,
can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: As an analyst, you are now responsible for understanding what role
textual data plays in the performance of a company. A colleague of yours has scraped the fil-
ings off of the SEC’s website and calculated the frequency distribution of positive and nega-
tive words among other dimensions using the financial sentiment dictionary classified by
Loughran and McDonald. Your goal is to create a dashboard that will allow you to explore
the trend of sentiment over time.
Data: Lab 8-4 SP100 Sentiment.zip - 111KB Zip / 114KB Excel

Lab 8-4 Example Output


By the end of this lab, you will create a dashboard that will let you explore financial senti-
ment in annual reports for S&P100 companies. While your results will include different
data values, your work should look similar to this:

Microsoft | Power BI Desktop

LAB 8-4M Example of Sentiment Analysis Dashboard in Microsoft Power BI Desktop

Tableau | Desktop

LAB 8-4T Example of Sentiment Analysis Dashboard in Tableau Desktop

Lab 8-4 Part 1 Filters and Calculations
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 8-4 [Your name] [Your email address].docx.
The dashboard you will prepare in this lab requires a little preparation so you can define
some calculations and key performance indicator targets and benchmarks for your evalua-
tion. Once you have these in place, you can create your visualizations. There are a number
of sentiment measures based on the Loughran-McDonald financial statement dictionary
(you will use the first three in this analysis). Each of these represents the count or number
of times a word from each dictionary appears in the financial statements:
• Lm.Dictionary.Count: Words that appear in the LM Dictionary, excluding filler words
and articles.
• Lm.Negative.Count: Words with negative financial sentiment, such as loss or impair-
ment. A high rate of negative words is associated with conservative reporting or nega-
tive news.
• Lm.Positive.Count: Words with positive financial sentiment, such as gain or asset. A
high rate of positive words is associated with optimistic reports.
• Lm.Litigious.Count: The total number of words with legal connotation, such as settle-
ment or jurisdiction. A high rate of litigious words indicates legal entanglement.
• Lm.Weak.Modal.Count: Words that indicate vacillation, such as might, possibly, or
could. A high rate of weak modal words is associated with volatility.
• Lm.Moderate.Modal.Count: Words such as probable or should. A high rate of moderate
modal words may reflect intention.
• Lm.Strong.Modal.Count: Words such as always or never. A high rate of strong modal
words is associated with certainty.
• Lm.Uncertainty.Count: Words such as doubt or unclear. A high rate of uncertain words
may reflect nebulous market or company conditions.
As an alternative, the Hv.Negative.Count value uses the Harvard General Inquirer off-the-shelf dictionary, which is not tailored to financial text and is therefore useful as a point of comparison.
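Before building the dashboard measures, it may help to see the positive-to-negative ratio (PNR) arithmetic in miniature. The Python sketch below uses made-up word counts; the 0.7 benchmark matches the KPI you will define in the steps that follow, and the dashboard measures divide summed counts so the ratio aggregates cleanly by year or company.

# Illustrative PNR calculation from Loughran-McDonald word counts.
# The filing counts are invented for demonstration.
filings = [
    {"company": "Alpha Co", "lm_positive": 420, "lm_negative": 510},
    {"company": "Beta Inc", "lm_positive": 390, "lm_negative": 680},
]

KPI_MOST_POSITIVE = 0.7  # PNR benchmark used in this lab

for f in filings:
    pnr = f["lm_positive"] / f["lm_negative"]
    flag = "most positive" if pnr > KPI_MOST_POSITIVE else "below benchmark"
    print(f"{f['company']}: PNR = {pnr:.2f} ({flag})")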

Microsoft | Power BI Desktop

1. Open Power BI Desktop and connect to your data:


a. Click Home > Get Data > Excel.
b. Browse to find the Lab 8-4 SP100 Sentiment.xlsx file and click Open.
c. Check SP100 Sentiment and click Load.
2. Rename Page 1 to SP100 Sentiment.
3. Create columns and measures for the following items that you will need for
your evaluation in your graphs and tables. The new Negative column makes
the count of negative words a negative value for visualization purposes. Click
Modeling > New Column, then enter the formula below as a new column:
a. Negative = -'SP100 Sentiment'[lm.negative.count]
4. The Pos Neg Ratio (PNR) calculates the number of positive words to
negative words. In Power BI, this measure uses the SUM() function in the
numerator and denominator to calculate the average PNR across different

aggregates, such as year or company. Click Modeling > New Measure, then
enter the formula below as a new measure:
a. Pos Neg Ratio = SUM('SP100 Sentiment'[lm.positive.count])/
SUM('SP100 Sentiment'[lm.negative.count])
5. To enable KPI targets to use as benchmarks for comparison, add the following
new measure. Companies with the most positive to negative words have a PNR
above 0.7. Click Modeling > New Measure, then enter the formula below as a
new measure. You can always edit this value later to change the benchmarks:
a. KPI Most Positive = 0.7
6. To enable the use of color on your graphs to show KPIs that exceed the
target, create an additional measure based on IF...THEN... logic. Power
BI will apply conditional formatting with that color. Click Modeling > New
Measure, then enter the formula below as a new measure:
a. Most Positive = IF([Pos Neg Ratio]<[KPI Most Positive],"#f28e2b","#4e79a7")
7. Scroll your field list to show your new calculated values and take a screen-
shot (label it 8-4MA). Note: Your report should still be blank at this point.
8. Save your file as Lab 8-4 SP100 Sentiment Analysis.pbix. Answer the
questions for this part and then continue to the next part.

Tableau | Desktop

1. Open Tableau Desktop and connect to your data:


a. Click Connect to Data > Microsoft Excel.
b. Browse to find the Lab 8-4 SP100 Sentiment.xlsx file and click Open.
2. Rename Sheet 1 to Proportion of Positive and Negative Words.
3. Create calculated fields for the following items that you will need for your
evaluation in your graphs and tables. Negative makes the count of negative
words a negative value for visualization purposes. The Pos Neg Ratio (PNR)
calculates the number of positive words to negative words. By using SUM()
in the numerator and denominator, any aggregate value will show the average
PNR value. Click Analysis > Create Calculated Field, then enter the name
from the left side of the = and the formula from the right side. Note: Do not
include the equal sign.
a. Negative = -[Lm.Negative.Count]
b. Pos Neg Ratio = SUM([Lm.Positive.Count])/SUM([Lm.Negative.Count])
4. To enable KPI targets to use as benchmarks for comparison, add the follow-
ing measure. Click the down arrow at the top of the data tab and choose
Create Parameter. Enter the name and the current value below and click
OK. Companies with the most positive to negative words have a PNR above
0.7. You can always edit these measures later to change the benchmarks:
a. Name: KPI Most Positive
b. Data type: Float

c. Current value: 0.7
d. Value when workbook opens: Current value
e. Display format: Automatic
f. Allowable values: All
g. Click OK.
5. To enable the use of color on your graphs to show positive and negative com-
panies in orange and blue, respectively, create the following calculated field
then click OK.
a. Most Positive = [Pos Neg Ratio]<[KPI Most Positive]
6. Scroll your field list to show your new calculated values and take a screen-
shot (label it 8-4TA). Note: Your report should still be blank at this point.
7. Save your file as Lab 8-4 SP100 Sentiment Analysis.twb. Answer the
questions for this part and then continue to the next part.

Lab 8-4 Part 1 Objective Questions (LO 8-2, 8-3)


OQ1. What is the benchmark value for PNR used to identify companies with the
most positive word counts?
OQ2. What does the PNR represent?

Lab 8-4 Part 1 Analysis Questions (LO 8-2, 8-3)


AQ1. Given the benchmark for PNR, do you expect financial statements to include
more positive or negative words overall? Why?
AQ2. Look at the companies with the highest word counts. Do they tend to have a
higher or lower PNR? Would you expect companies with high word counts
in their financial statements to include more positive words or more negative
words?

Lab 8-4 Part 2 Sentiment Analysis Dashboard


Now you can begin putting together your dashboard to enable analysts to evaluate senti-
ment for listed companies to make a meaningful comparison. This means you should use a
filter or slicer to enable quick selection of a given exchange, period, or company.

Microsoft | Power BI Desktop

1. Open your Lab 8-4 SP100 Sentiment Analysis.pbix file created in Part 1 and
go to the SP100 Sentiment tab.
2. Add a new Slicer to your page and resize it so it fits the top-right corner of
the page.
a. Field: Exchange
b. Filter: NYSE


3. Add a new Slicer to your page and resize it so it fits just below your first slicer.
a. Field: date.filed > Date Hierarchy > Year
b. Filter: 2016 to 2019
4. Add a new Slicer to your page and resize it so it fits the bottom-right corner
of the page.
a. Field: company.name > Right-click and choose Rename for this visual and
rename SP100 Company.
b. Filter: all
5. Click on the blank part of the page and add a new Line and Stacked Col-
umn Chart to the page to show the proportion of positive to negative words.
Resize it so that it fills the top-left quarter of the remaining space on the page.
a. Drag the following fields to their respective field boxes:
1. X-axis: date.filed
2. Column Y-axis: lm.positive.count > Rename to Positive
3. Column Y-axis: Negative
4. Line Y-axis: Pos Neg Ratio > Rename to Positive-to-Negative Ratio
b. Click the Format visual (paintbrush) icon to add a title to your chart:
1. General > Title > Title text: Proportion of Positive and Negative Words
6. Click on the blank part of the page and add a new Table. Resize it so that it
fills the top-right quarter of the remaining space on the page.
a. Columns:
1. date.filed > Date Hierarchy > Year
2. word.count > Rename to Word Count
3. lm.positive.count > Rename to Positive WC
4. lm.negative.count > Rename to Negative WC
5. Pos Neg Ratio > Rename to PN Ratio
b. Click the Format visual (paintbrush) icon to add backgrounds to the data
values:
1. Go to Visual > Cell elements > Apply settings to and click Conditional
formatting (fx) below data bars. Use the Series field drop-downs to en-
able data bars for Word Count, Positive WC, Negative WC, and PN Ratio.
c. Take a screenshot (label it 8-4MB) of your first two visuals.
7. Click on the blank part of the page and add a new Stacked Column Chart.
Resize it so that it fills the bottom-left third of the remaining space on the page.
a. X-axis: company.name
b. Y-axis: Pos Neg Ratio
c. Click the Format visual (paintbrush) icon:
1. Visual > Y-axis > Title: PN Ratio
2. Visual > X-axis > Title: Company
3. Visual > Columns > Colors > Conditional formatting (fx button),
enter the following, and click OK:
a. Format style: Field value
b. Based on field: Most Positive
4. Title > Title text: Most Positive Companies


8. Click on the blank part of the page and add a new Stacked Column Chart. Resize
it so that it fills the bottom-middle third of the remaining space on the page.
a. X-axis: company.name
b. Y-axis: word.count > Average
c. Click the Format visual (paintbrush) icon:
1. Visual > Y-axis > Title: Average Word Count
2. Visual > X-axis > Title: Company
3. Visual > Columns > Colors > Conditional formatting (fx button),
enter the following, and click OK:
a. Format style: Field value
b. What field should we base this on?: Most Positive
4. General > Title > Text: Companies with the Most to Say
9. Click on the blank part of the page and add a new Scatter Chart. Resize it so
that it fills the bottom-right third of the remaining space on the page.
a. Drag the following fields to their respective boxes:
1. Values: company.name
2. X-axis: word.count > Average
3. Y-axis: Pos Neg Ratio
b. Click the Analytics magnifying glass icon to add a trend line to your scatter
chart:
1. Trend line > On > Combine Series > Off
c. Click the Format visual (paintbrush) icon to clean up your chart and add
color to show whether the KPI benchmark has been met or not:
1. Visual > X-axis > Title: Average Word Count
2. Visual > Y-axis > Title: PN Ratio
3. General > Title > Text: PNR by Avg Word Count
d. Click the more options (three dots) in the top-right corner of the scatter
chart and choose Automatically find clusters. Set the number of clusters
to 6 and click OK.
e. Take a screenshot (label it 8-4MC) of your completed dashboard.
10. When you are finished answering the lab questions, you may close Power BI
Desktop and save your file.

Tableau | Desktop

1. Open your Lab 8-4 SP100 Sentiment Analysis.twb file from Part 1. Then add
the following:
a. Drag the following fields to their respective shelves:
1. Columns: Date.Filed > Year
2. Rows: Measure Values
3. Measure Values: Remove all values except:



a. Lm.Positive.Count > SUM
b. Negative > SUM
4. Marks: Change Measure Names to Color and select Bar from the
drop-down menu.
5. Rows: Pos Neg Ratio
6. Marks: Change Pos Neg Ratio to Line in the drop-down menu.
7. Right-click the Pos Neg Ratio vertical axis and choose Edit Axis, then
uncheck Include zero and close the window.
8. Right-click the Pos Neg Ratio vertical axis again and choose Dual Axis.
9. Filters: Drag each of the following to the Filter pane:
a. Exchange > NYSE > OK
i. Right-click the filter and choose Show Filter.
ii. Right-click the filter and choose Apply to Worksheets > All Using
This Data Source.
b. Date Filed > Years > Next > Check only 2016, 2017, 2018, 2019 > OK
i. Right-click the filter and choose Show Filter.
ii. Right-click the filter and choose Apply to Worksheets > All
Using This Data Source.
c. Company Name > All > OK
i. Right-click the filter and choose Show Filter.
ii. Right-click the filter and choose Apply to Worksheets >
All Using This Data Source.
b. Take a screenshot (label it 8-4TB) of your worksheet.
2. Create a new worksheet called Word Count and add the following:
a. Drag the following fields to their respective boxes:
1. Columns: Measure Names
2. Rows: Date Filed > Year
3. Marks > Text: Measure Values
4. Measure Values > Remove all values except the following. Hint: You
can click the drop-down menu for Measure Values and choose Edit
Filter to uncheck the other values as well.
a. Word.Count > Sum
b. Lm Negative Count > Sum
c. Lm Positive Count > Sum
d. Pos Neg Ratio
i. Right-click Pos Neg Ratio and choose Format.
ii. Change the number format to Percentage and close the format
pane.
3. Create a new worksheet called Most Positive Companies and add the following:
a. Columns: Company Name
b. Rows: Pos Neg Ratio
c. Marks > Color: Most Positive
d. Sort by Pos Neg Ratio descending.

4. Create a new worksheet called Companies with the Most to Say and add the
following:
a. Columns: Company Name
b. Rows: Word Count > Average
c. Marks > Color: Most Positive
d. Sort by Avg. Word Count descending.
5. Create a new worksheet called PNR by Avg Word Count and add the following:
a. Columns: Word Count > Average
b. Rows: Pos Neg Ratio
c. Marks > Detail: Company Name
d. Analytics > Cluster: Number of Clusters: 6
e. Analytics > Trend Line > Linear
6. Finally, create a new dashboard tab called SP100 Sentiment and add your
charts from this part of the lab. Note: There will be two visuals on the top row
and three visuals on the bottom row.
a. In the Dashboard tab, change the size from Desktop Browser > Fixed
Size to Automatic.
b. Drag the Proportion of Positive and Negative Words sheet to the dash-
board. This will occupy the top-left corner of the dashboard.
c. Drag the Word Count sheet to the right side of the dashboard.
d. Drag the Most Positive Companies sheet to the bottom-left corner of the
dashboard.
e. Drag the Companies with the Most to Say sheet to the bottom-middle of
the dashboard.
f. Drag the PNR by Avg Word Count sheet to the bottom-right corner of the
dashboard.
g. In the top-right corner of each sheet on your new dashboard, click Use as
Filter (funnel icon) to connect the visualizations so you can drill down
into the data.
7. Take a screenshot (label it 8-4TC) of your completed dashboard.
8. When you are finished answering the lab questions, you may close Tableau
Desktop and save your file.

Lab 8-4 Part 2 Objective Questions (LO 8-2, 8-3)


OQ1. How many NYSE companies have a positive-negative ratio above 0.7 from
2016 to 2019?
OQ2. In what year does the positive-negative ratio reach its highest point?
OQ3. What is the value of the positive-negative ratio at its highest point?
OQ4. In the top 10 companies with the most to say, which companies have a
positive-negative ratio higher than 0.7?
OQ5. Click the filters to compare S&P100 companies listed on the NYSE to those listed
on NASDAQ. Which exchange has the most positive companies in the data?

OQ6. According to the visual showing NYSE companies’ total word count, does
the majority of the companies with a high total word count appear to have a
positive-negative ratio higher or lower than the benchmark?

Lab 8-4 Part 2 Analysis Questions (LO 8-2, 8-3)


AQ1. Click through several of the companies to observe their word counts and
positive-negative ratios. What patterns do you notice?
AQ2. If you look at the scatter plot for all NYSE companies, do word count and
positive-negative ratio appear to be correlated? In other words, if a company has
more to say, does that mean it is more likely to say positive or negative things?
AQ3. What do you notice about the count of positive and negative words compared
to the total word count? How might this comparison impact your conclusions?
AQ4. How else might you want to evaluate or visualize financial sentiment?

Lab 8-4 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Chapter 9
Tax Analytics

A Look at This Chapter


We highlight the use of Data Analytics for the tax function. First, we consider how the IMPACT model is used to
evaluate tax questions. We next consider how tax data sources differ depending on the tax user, whether that be a tax
department, an accounting firm, or a regulatory body. Next, we investigate how visualizations are useful components
of tax analytics. Finally, we consider how data analysis might be used to assist in tax planning including what-if analy-
sis for new legislation, the possibility of a merger with another company, a shift in product mix, or a plan to set up
operations in a new low-tax jurisdiction (and/or transfer pricing).

A Look Back
In Chapter 8, we focused on how to access and analyze financial statement data. We highlighted the use of XBRL
to quickly and efficiently gain computer access to financial statement data. Next, we explained how ratios are used
to analyze financial performance. We also discussed the use of sparklines to help users visualize trends in the data.
Finally, we discussed the use of text mining to analyze the sentiment in financial reporting data.

A Look Forward
In Chapter 10, we bring together all of the accounting Data Analytics concepts with a set of exercises that walk all the way
through the IMPACT model. The chapter serves as a great way to combine all of the elements learned in the course.

Knowing the tax liability for a move to a new jurisdiction is important for corporations and individuals alike. For example, a tax accountant might have advised LeBron James not to sign with the Los Angeles Lakers in the summer of 2018 because the move was expected to cost him an extra $21 million in state income taxes, since California has higher taxes than Ohio.
Tax data analytics for this type of “what-if scenario analysis” could help frame
LeBron’s decision from a tax planning perspective. Such what-if scenario analysis
has wide application when contemplating new legislation, a merger possibility,
a shift in product mix, or a plan to set up operations in a new low-tax (or high-
tax) jurisdiction. Tesla recently applied tax planning concepts when considering
the tax incentives available for locating its Cybertruck factory. Tesla ultimately
decided to build its factory in Texas.

Sources: https://www.forbes.com/sites/seanpackard/2018/07/02/lebrons-move-could-cost-him-21-million-in-extra-state-taxes/#6517d3156280 (accessed August 2, 2018); https://techcrunch.com/2020/07/14/tesla-lands-another-tax-break-to-locate-cybertruck-factory-in-texas (accessed April 20, 2021).

OBJECTIVES
After reading this chapter, you should be able to:

LO 9-1 Understand the different types of problems addressed and analytics performed in tax analytics.
LO 9-2 Describe the tax data sources available at companies, accounting firms, and the IRS.
LO 9-3 Understand the use of visualizations as a tool in tax analytics.
LO 9-4 Understand the use of tax data for tax planning, and perform what-if scenario analysis.


TAX ANALYTICS

LO 9-1 Understand the different types of problems addressed and analytics performed in tax analytics.

With more and more data available, just like other areas in accounting, there is an increased focus on tax analytics. New regulations are requiring greater detail, and tax regulators are getting more adept at the use of analytics. In addition to the regulator side, tax filers now have more data to support their tax calculations and perform tax planning.
We now explain some of the tax questions, sources of available tax data, potential analytics applied, and means of communicating and tracking the results of the analysis using the IMPACT model.

Identify the Questions


What are the tax questions that are potentially addressed using tax analytics? Here’s an
example of a few of the many possibilities:
• What is the amount of tax paid each year by entity (nationwide, corporate, and indi-
vidual) or tax category (income, sales, property, excise, etc.)?
• What is the difference between GAAP-based and taxable income (book-tax differences)?
• What is the amount of sales tax paid compared to expectations?
• What is the amount of R&D tax credit we expect to qualify for in the future?
• If certain tax legislation passes, what level of exposure (additional tax) might the
company face?
• What will be the amount of taxes we owe if we pursue a merger or acquisition?

Master the Data


Companies, accounting firms, and the IRS have access to many varieties of tax data. Com-
panies’ tax data typically originate in their financial reporting system. Accounting firms
maintain certain client data. The IRS and other regulatory agency data come from the data
they maintain and generate about each taxpayer from tax returns and other sources.

Perform Test Plan


Exhibit 9-1 provides examples of the potential tax questions addressable using Data Analytics
as well as Data Analytics techniques used by analytics type (descriptive, diagnostic, predictive,
or prescriptive analytics).
EXHIBIT 9-1 Tax Questions and Data Analytics Techniques by Analytics Type

Descriptive—summarize activity or master data based on certain attributes to address questions of this type: What happened? What is happening?
Potential tax questions addressed: How much in federal income taxes did we pay last year? What has been the level of taxable income and effective tax rates over the past 5 years? What is the level of tax carryforwards we have going forward? What is the level of job satisfaction of the tax personnel (tracked as a KPI)?
Data Analytics techniques used: Summary statistics (sums, totals, averages, medians, bar charts, histograms, etc.); ratio analysis; tax KPIs.

Diagnostic—detect correlations and patterns of interest and compare them to a benchmark to address questions of this type: Why did it happen? What are the reasons for past results? Can we explain why it happened?
Potential tax questions addressed: Why did taxable income go up when (financial) net income fell? Why do we pay a higher effective tax rate than the industry average?
Data Analytics techniques used: Performance comparisons to past, competitor, industry, stock market, and overall economy; drill-down analytics to determine relations/patterns/linkages between variables—regression and correlation analysis.

Predictive—identify common attributes or patterns that may be used to forecast similar activity to address the following questions: Will it happen in the future? What is the probability something will happen? Is it forecastable?
Potential tax questions addressed: What is the expected taxable income over the next 5 years? How much tax personnel turnover will we expect to have in the next few years?
Data Analytics techniques used: Sales, earnings, and cash-flow forecasting using time series, analyst forecasts, competitor and industry performance, and macroeconomic forecasts.

Prescriptive—recommend action based on previously observed actions to address questions of this type: What should we do based on what we expect will happen? How do we optimize our performance based on potential constraints?
Potential tax questions addressed: Based on expected future income and expected future tax rates, how can we minimize future taxes? How can we take maximum advantage of R&D tax credits over the next few years?
Data Analytics techniques used: Tax planning based on what-if analysis and what-if scenario analysis; sensitivity analysis.

Descriptive Analytics
Descriptive tax analytics provide insight into the current processes, policies, and calcula-
tions related to determining tax liability. These analytics involve summarizing transactions
by jurisdiction or category to more accurately calculate tax liability. They also track how
well the tax function is performing using tax key performance indicators (KPIs). We pro-
vide an example of these KPIs later in the chapter.

Diagnostic Analytics
Diagnostic tax analytics might help identify items of interest, such as high tax areas or
excluded transactions. For example, creating a trend analysis for sales and use tax paid
in different locations would help identify seasonal patterns or abnormal transaction vol-
ume that warrant further investigation. Diagnostic analytics also look for differences from
expectations to help determine if adequate taxes are being paid. One way for tax regula-
tors to assess if companies are paying sufficient tax is to look at the differences between
the amount of income reported for financial reporting purposes (like Form 10-Q or 10-K
submitted to the SEC) and the amount reported to the IRS (or other tax authorities) for
income tax purposes. Increasingly, tax software and analytics (such as Hyperion or Corp-
tax) are used to help with the reconciliation to find both permanent and temporary differ-
ences between the two methods of computing income and also to provide needed support
for IRS Schedule M-3 (Form 1120).
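To make the reconciliation concrete, here is a hypothetical sketch of the arithmetic such software automates; every line item and amount is invented for illustration, and a real Schedule M-3 reconciliation is far more granular.

# Hypothetical book-tax reconciliation. Sign convention: positive amounts
# increase taxable income relative to book income; negative amounts decrease it.
book_income = 1_000_000  # pretax income per the GAAP financial statements

permanent_differences = {
    "Municipal bond interest (nontaxable)": -50_000,
    "Nondeductible fines and penalties": 20_000,
}
temporary_differences = {
    "Excess tax over book depreciation": -80_000,
    "Warranty accrual not yet deductible": 30_000,
}

taxable_income = (book_income
                  + sum(permanent_differences.values())
                  + sum(temporary_differences.values()))

print(f"Book income:         {book_income:>12,}")
print(f"Taxable income:      {taxable_income:>12,}")
print(f"Book-tax difference: {book_income - taxable_income:>12,}")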

Predictive Analytics
Predictive tax analytics use historical data and new information to identify future tax liabili-
ties and may be used to forecast future performance, such as the expected amount of taxes
to be paid over the next 5 years. On the basic level, this includes regression and what-if
analyses and requires a specific dependent variable (or target, such as the value of a tax
credit or deferred tax asset). The addition of ancillary data, including growth rates, trends,
and other identified patterns, adds to the usefulness of these analyses. Additionally, tax ana-
lytics rely on tax calculation logic and tax determination, such as proportional deductions,
to determine the potential tax liability.
Another example of predictive analytics in tax would be determining the amount of R&D
tax credit the company may qualify to take over the next 5 years. The R&D tax credit is a tax


credit under Internal Revenue Code section 41 for companies that incur research and devel-
opment (R&D) costs. To receive this credit, firms must document an appropriate level of
detail before receiving R&D tax credit. For example, companies have to link an employee’s
time directly to a research activity or to a specific project to qualify for the tax credit. Let’s
suppose that a firm spent money on qualifying R&D expenditures but simply did not keep
the sufficient detail needed as supporting evidence to receive the credit. Analytics could be
used to not only predict the amount of R&D tax credit it may qualify for, but also find the
needed detail (timesheets, calendars, project timelines, document meetings between various
employees, time needed for management review, etc.) to qualify for the R&D tax credit.
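As one illustration of the predictive step, the sketch below fits a simple linear trend to hypothetical qualified research expenses (QREs) and projects the base for next year's credit. The spending figures and the flat 10 percent effective rate are assumptions for illustration only; the actual Section 41 computation is considerably more involved.

import numpy as np

years = np.array([2017, 2018, 2019, 2020, 2021])
qre = np.array([2.1, 2.4, 2.6, 3.0, 3.3])  # qualified spend, $ millions

# Fit a linear trend and extrapolate one year ahead.
slope, intercept = np.polyfit(years, qre, 1)
forecast_2022 = slope * 2022 + intercept

EFFECTIVE_CREDIT_RATE = 0.10  # simplifying assumption, not the statutory rule
print(f"Forecast 2022 QREs: ${forecast_2022:.2f} million")
print(f"Estimated credit:   ${forecast_2022 * EFFECTIVE_CREDIT_RATE:.2f} million")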

Prescriptive Analytics
Building upon predictive analytics, prescriptive analytics use the forecasts of future perfor-
mance to recommend actions that should be taken based on opportunities and risks facing
the company. Such prescriptive analytics may be helpful in tax planning to minimize the
taxes paid by a company. If a trade war with China occurred, it would be important to
assess the impact of this event on the company's tax liability. Or if there is the possibility of
new tax legislation, prescriptive analytics might be used to justify whether lobbying efforts
or payments to industry coalitions are worth the cost. Tax planning may also be employed
to help a company determine the structure of a transaction, such as a merger or acquisition.
To do so, prescriptive analytics techniques such as what-if scenarios, goal-seek analysis, and
scenario analysis would all be useful in tax planning to lower potential tax liability.
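A minimal what-if scenario sketch might look like the following. The income figures and state rates are invented, and the calculation deliberately ignores complications such as apportionment and the federal deductibility of state taxes; the point is simply that each scenario's assumptions flow through to a comparable bottom-line estimate.

# What-if scenario analysis: estimated total tax under each scenario.
FEDERAL_RATE = 0.21  # current U.S. corporate rate

scenarios = {
    "Stay in current state":      {"income": 10_000_000, "state_rate": 0.065},
    "Relocate to no-tax state":   {"income": 10_000_000, "state_rate": 0.0},
    "Relocate, margins compress": {"income": 9_200_000, "state_rate": 0.0},
}

for name, s in scenarios.items():
    tax = s["income"] * (FEDERAL_RATE + s["state_rate"])
    print(f"{name}: estimated tax ${tax:,.0f}")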

Address and Refine Results


The models selected by the IRS, taxpayers, or auditors will generate various results.
A sample selection may give auditors a list of high-risk transactions to evaluate their tax
treatment and compliance.

Communicate Insights and Track Outcomes


Dashboards are often useful to track tax KPIs to consistently communicate the most important
metrics for the tax function. We provide an example of these KPIs later in the chapter.

PROGRESS CHECK
1. Which types of analytics are useful in tax planning—descriptive, diagnostic,
predictive, or prescriptive—and why?
2. How can tax analytics support and potentially increase the amount of R&D tax
credit taken by a company?

MASTERING THE DATA THROUGH TAX DATA MANAGEMENT

LO 9-2 Describe the tax data sources available at companies, accounting firms, and the IRS.

Different tax entities have different data needs and different sources. We consider the tax data sources of the tax department, accounting firms, and tax regulatory bodies including the IRS.

Tax Data in the Tax Department
The tax department within a company typically uses data from the financial reporting system (or enterprise system). However, the financial reporting system is primarily designed


and used for financial accounting purposes, where transactions that have an economic
impact are recorded as an input for the financial statements and other financial reporting
purposes. In addition, these financial reporting systems along with other data have also
been used for management accounting purposes to allow management to calculate the cost
of a product or to optimize a product mix that would maximize profits for the firm. There is
generally not a completely separate information system solely collecting tax data needed for
tax compliance and tax planning.
With little integration between the financial reporting system and the needs of the tax
function, tax departments would manually collect and extract data from their financial
reporting system and generalized data warehouse. After gathering data from these general-
ized data warehouses, tax departments would use Excel spreadsheets to capture and store
the detail needed to support tax calculations. Such lack of integration hampered efforts of
tax accountants to have the needed information to comply with tax law, to minimize current
taxes, and to allow for tax planning for future transactions.
With recent advances in technology, there are increasing opportunities for tax
departments to have greater control of their data, which allows them to work more
effectively and efficiently. Specifically, instead of using a generalized data warehouse,
enterprise systems increasingly use specific data marts for their tax function. Data
marts are defined as being a subset of the data warehouse oriented toward a specific
need. Such a tax data mart is used to extract past and real-time data from the financial
reporting system that are most applicable to the tax function. Tax departments are able
to specify which data might affect their tax calculations for their tax data mart and have
a continuous feed of those data. This tax data mart allows tax departments to more
completely “own” the data because no other group has the rights to modify them. They
can add to that tax data mart other relevant information that might come from other
sources.
They are also able to keep the tax data mart as a centralized repository so that different
users of the tax function can have access to the data. Exhibit 9-2 provides a good illustration
of how data are accumulated and subsequently dedicated for the tax function. Consistent
with the IMPACT model, tax data warehouses and tax data marts help tax departments to
“master the data” to address tax questions and issues inside the company.

EXHIBIT 9-2 Tax Data in a Data Warehouse


Tax Data at Accounting Firms


Accounting firms also maintain tax data for their clients. How might they use these data
other than for compliance purposes? Accounting firms can also keep track of their clients
using another type of data mart; for example, a tax data mart kept at an accounting firm that
might have marketing implications. Let’s suppose an accounting firm has a tax data mart
that keeps track of clients and their unrealized capital gains. The Tax Cuts and Jobs Act of
2017 offers a major change to investors, allowing them to invest in opportunity zones (in
low-income communities) to defer or completely eliminate taxes on realized capital gains.
While only a fraction of the estimated total unrealized capital gains market of $6.1 trillion
actually qualifies for opportunity zones,1 there seems to be an almost endless set of inves-
tors that could reap tax savings via an opportunity zone. If a tax data mart allows account-
ing firms to know which investors have unrealized capital gains, they can effectively market
tax assistance, education about opportunity zones, or market investments in opportunity
funds to them directly.

Data Analytics at Work

In an Era of Automation and Computerization, What Role Do the Accountants Play?

With so much data available, there is a need for accountants to “bring order” to the
data and add value by presenting the data “in a way that people trust it.”
Indeed, tax accountants use the available data to:
• Comply with tax law.
• Get every possible deduction to minimize tax cost.
• Perform tax planning to minimize taxes in the future.
They do this by acquiring, managing, storing, and analyzing the needed data to
perform these tasks.
Source: https://nasba.org/blog/2021/03/24/why-are-cpas-necessary-in-todays-world/ (accessed
April 20, 2021).

1. https://www.forbes.com/sites/jenniferpryce/2018/08/14/theres-a-6-trillion-opportunity-in-opportunity-zones-heres-what-we-need-to-do-to-make-good-on-it/#527391d46ffc (accessed August 15, 2018).


Tax Data at the IRS


Tax data are also maintained and analyzed at the IRS (and other regulatory agencies). For
example, the IRS has a huge trove of data about each taxpayer. There are three main sources
of information, including the following:
1. Not only does the IRS have data of the reportable financial transactions that occur dur-
ing the year (including W-2s, Form 1099s, Schedule K-1s), but it also has a repository
of tax returns from prior years stored in a data warehouse.
2. The IRS mines and monitors personal data from social media (such as Facebook,
Twitter, Instagram, etc.)2 about taxpayers. For example, posts about a new car, new
house, or fancy vacation could help the IRS capture the taxpayer dodging or misreport-
ing income. Divorce lawyers certainly use the same tactics to learn the lifestyle and
related income of a divorcing spouse!
3. The IRS has personal financial data about each taxpayer, including Social Security
numbers, bank accounts, and property holdings. While most of these data are gathered
from prior returns and transactions (see number 1), the IRS can also access your credit
report during an audit or criminal investigation to determine whether spending and credit
look proportional to income, or when it is trying to collect an assessment.
Each of these sources of information can help the IRS to establish a profile (using the
profiling test approach discussed in Chapter 3). The IRS has an algorithm called Discrimi-
nant Function that pulls historical data for average amount and type of deductions related
to income level and predicts the likelihood of underreported income. When the amount
self-reported by the taxpayer is significantly less than the amount estimated, additional
investigation and a potential tax audit might be warranted if the potential tax revenue is
greater than the expected cost of the investigation.
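
To make the profiling idea concrete, here is a minimal sketch in SQL (the language used in this chapter's labs) of how a regulator might flag returns whose deductions look out of line with taxpayers at similar income levels. The ReturnData table, the $25,000-wide income brackets, and the 1.5x threshold are all illustrative assumptions, not the IRS's actual Discriminant Function model.

    -- Sketch only: deduction profiling against income-bracket averages.
    WITH Bracket AS (
        SELECT TaxpayerID, TaxYear, TotalDeductions,
               FLOOR(AGI / 25000.0) AS IncomeBracket   -- hypothetical $25,000 brackets
        FROM ReturnData                                -- hypothetical table of filed returns
    ),
    BracketAvg AS (
        SELECT IncomeBracket, AVG(1.0 * TotalDeductions) AS AvgDeductions
        FROM Bracket
        GROUP BY IncomeBracket
    )
    SELECT b.TaxpayerID, b.TaxYear, b.TotalDeductions, a.AvgDeductions
    FROM Bracket b
    INNER JOIN BracketAvg a ON a.IncomeBracket = b.IncomeBracket
    WHERE b.TotalDeductions > 1.5 * a.AvgDeductions;   -- flag outliers for further review

A real scoring model would weight many return attributes at once, but the core comparison, self-reported amounts against historical norms for the income level, is the same.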

PROGRESS CHECK
3. Why do tax departments need to extract data for tax calculation from a financial
reporting system?
4. How is a tax data mart specifically able to target the needs of the tax department?

TAX DATA ANALYTICS VISUALIZATIONS

LO 9-3
Understand the use of visualizations as a tool in tax analytics.

Tax Data Analytics Visualizations and Tax Compliance
Increasingly, tax regulators are using Data Analytics to evaluate tax compliance by those with potential tax liability. Tax regulators use Data Analytics to see if companies are close to actually paying what would be expected based on tax rates and expected income or sales.
To date, companies have not engaged in the same level of Data Analytics. If for no other
reason, companies might engage in Data Analytics to avoid a tax audit. In some sense, this
allows companies to “see what the regulator is seeing.” The European Union, for example,
is way ahead of the United States on use of Data Analytics in a tax audit, both by the regula-
tor and the company hoping to not be audited.

2 https://washington.cbslocal.com/2014/04/16/report-irs-data-mining-facebook-twitter-instagram-and-other-social-media-sites/ (accessed August 2018).


Evaluating Sales Tax Liability


Evaluating sales tax liability can quickly be complicated by customer sales returns where
sales taxes are returned to customers. That complexity is compounded by the differing tax
rates in each city, county, and state jurisdiction. In preparation for an audit, sales tax regu-
lators could ask for gross sales or net sales (gross sales less returns) by store and compute
the total taxes owed to see if it is close to the amount of sales taxes actually paid by the
company. Companies can run the same type of analysis to see where they stand to avoid an
audit or at least be prepared in the eventuality that a tax audit does occur.
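
As a sketch of what such an analysis could look like in SQL, the query below compares the sales tax a company would expect to owe (net sales times the combined jurisdiction rate) with the tax actually remitted, store by store. StoreSales, JurisdictionRate, and SalesTaxRemitted are hypothetical tables assumed for illustration.

    -- Sketch only: expected versus remitted sales tax by store.
    SELECT s.StoreID,
           (s.GrossSales - s.SalesReturns) * r.CombinedRate AS ExpectedTax, -- net sales x rate
           p.TaxRemitted,
           (s.GrossSales - s.SalesReturns) * r.CombinedRate
               - p.TaxRemitted                              AS Gap          -- positive = potential shortfall
    FROM StoreSales s
    INNER JOIN JurisdictionRate r ON r.StoreID = s.StoreID  -- combined city/county/state rate
    INNER JOIN SalesTaxRemitted p ON p.StoreID = s.StoreID
    ORDER BY Gap DESC;                                      -- largest potential shortfalls first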
With the recent Supreme Court South Dakota v. Wayfair decision and the tidal wave of
states passing legislation to copy the decision, collection of sales tax on every online pur-
chase (based on where the customer lives) is a serious compliance issue. Companies need to
put data collection processes in place to collect, summarize, and process this information so
that they can have functional compliance with the new laws in states where they sell online.
A dashboard (similar to that introduced in Chapter 4) is a type of visualization that
might be helpful for compliance with state sales tax. The comprehensive labs at the end of
this chapter provide an example of how Data Analytics might be used with respect to state
sales tax data. Companies remit sales tax regularly based on the amount of sales taxes they collect.

Evaluating Income Tax Liability


Tax data analytics allow tax departments to view multiple years, periods, jurisdictions (state
or federal or international, etc.), and differing scenarios of data typically through use of a
dashboard. Dashboards allow tax departments to evaluate those jurisdictions where current
state income tax liabilities have departed most from the liability of prior years. This allows
tax departments to evaluate further why current-year jurisdictional taxable income and tax liability have changed from the past and to address any issues or irregularities that arise.
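
A minimal sketch of the query such a dashboard might sit on, assuming a hypothetical TaxLiability(Jurisdiction, TaxYear, Liability) table in the tax data mart:

    -- Sketch only: year-over-year change in income tax liability by jurisdiction.
    SELECT cy.Jurisdiction,
           cy.Liability                AS CurrentYearLiability,
           py.Liability                AS PriorYearLiability,
           cy.Liability - py.Liability AS Change
    FROM TaxLiability cy
    INNER JOIN TaxLiability py
        ON py.Jurisdiction = cy.Jurisdiction
       AND py.TaxYear = cy.TaxYear - 1                   -- self-join to the prior year
    WHERE cy.TaxYear = 2021                              -- illustrative current year
    ORDER BY ABS(cy.Liability - py.Liability) DESC;      -- biggest departures on top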

TAX DATA ANALYTICS VISUALIZATIONS ALLOW A WAY TO MONITOR AND TRACK KPIS
As noted in this text, a key output of rich data from tax analytics is the ability to create
visualizations. As noted in Chapter 7, tracking KPIs using visualizations is a good way to
easily see how well the company is performing. Such KPIs might be used to monitor
different aspects of the tax function.
In the article “Defining Success: What KPIs Are Driving the Tax Function Today,”3
PwC points out four general categories of tax-focused KPIs, including tax cost, tax
risk, tax efficiency and effectiveness, and tax sustainability. We list some KPIs that
might be used to measure performance in each of these areas:
Tax cost: The actual amount of tax paid. Example KPIs include:
• Effective tax rate (ETR).
• Cash taxes paid.
• Effect of loss carry-forwards.
• Expiration of tax credits.
• Tax adjustments in response to new tax legislation.
• Deferred taxes.

3 “Defining Success: What KPIs Are Driving the Tax Function Today,” PwC, September 2017, https://www.pwc.com/gx/en/tax/publications/assets/pwc_tax_function_of_the_future_tax_function_KPI_sept17.pdf (accessed August 14, 2018).


Tax risk: With increased regulator and stakeholder scrutiny, firms bear the financial
and reputational risk of misreporting or tax provision adjustments. Example KPIs
include:
• Frequency and magnitude of tax audit adjustments.
• Frequency of concerns pertaining to the organization’s tax position.
• Levels of late filing or error penalties and fines.
• Number of resubmitted tax returns due to errors.
Tax efficiency and effectiveness: This includes the efficiency and effectiveness of technology,
processes, and people in carrying out the tax function. Example KPIs include:
• Levels of technology/tax training.
• Amount of time spent on compliance versus strategic activities.
• Level of job satisfaction of the tax personnel.
• Employee turnover of the tax personnel.
• Improved operational efficiency.
Tax sustainability: Refers to the ability to sustain similar tax performance over time.
Example KPIs include:
• Number of company tax audits closed and significance of assessment over time.
• The effective tax rate (ETR) over time.
Tax permanent differences: Additionally, tax managers should track permanent differ-
ences between book and tax revenue and expenses to ensure compliance and dispute
overpayments of taxes. These include:
• Penalties and fines (excluded from taxable income).
• Meals and entertainment (100 percent books, 50 percent tax).
• Interest on municipal bonds (nontaxed income).
• Life insurance proceeds (nontaxed income).
• Dividends received deduction (taxed based on percentage of ownership).
• Excess depreciation.
These tax-focused KPIs appear on dashboards or cockpits, consistent with the “C”
(communicate insights) and the “T” (track outcomes) of the IMPACT model. Cock-
pits are similar to dashboards but are much narrower in scope and focus. This narrower focus allows the tax function to highlight potential high-impact or single
areas of concern like reconciliation. We also note that the tax sustainability KPIs,
in particular, measure performance over time and are consistent with the “T” (track
outcomes) of the IMPACT model.
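
As an illustration, the ETR and cash-taxes-paid KPIs could be fed to a dashboard with a query like the sketch below, assuming a hypothetical TaxProvision(FiscalYear, TotalTaxExpense, PretaxBookIncome, CashTaxesPaid) table in the tax data mart. ETR here is total income tax expense divided by pretax book income.

    -- Sketch only: tax cost KPIs by year from a hypothetical provision table.
    SELECT FiscalYear,
           1.0 * TotalTaxExpense
               / NULLIF(PretaxBookIncome, 0) AS EffectiveTaxRate,  -- tax cost KPI
           CashTaxesPaid                                           -- tax cost KPI
    FROM TaxProvision
    ORDER BY FiscalYear;  -- plotting ETR by year also supports the tax sustainability KPI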

PROGRESS CHECK
5. Why is ETR (effective tax rate) a good example of a tax cost KPI? Why is ETR over
time considered to be a good tax sustainability KPI?
6. Why would a company want to track the levels of late filing or error penalties as
a tax risk KPI?


TAX DATA ANALYTICS FOR TAX PLANNING

LO 9-4
Understand the use of tax data for tax planning, and perform what-if scenario analysis.

Tax planning is the analysis of potential tax liability and formulation of a plan to reduce the amount of taxes paid. It involves forecasting corporate activity and calculating the anticipated tax liabilities or benefits from operations in various jurisdictions. Tax analytics help organizations operate in a tax-efficient manner as much as possible by identifying opportunities to minimize the amount of current and future taxes paid as well as to recover tax over-
payment. Tax accountants can utilize the abundance of detailed transaction and metadata
(e.g., descriptions of data, such as categories) to filter and analyze the data, identify oppor-
tunities for tax savings, and plan. Tax savings and recovery are especially important because
they represent value-adding functions of tax accountants since every tax dollar saved goes
directly to the bottom line (e.g., net income after tax).
Changes in tax legislation, changes in ownership, expansion into new territories, and
transfer pricing for intercompany sales affect future tax liability. Beyond calculating tax
rates across multiple jurisdictions, tax planning involves identifying transactions and invest-
ments that are subject to deductions, credits, and other exclusions from income. Tax plan-
ning may involve the following questions:
• What will be the impact of a new tax rate on our tax liability?
• Are we minimizing our tax burden by tracking all eligible deductible expenses and trans-
actions that qualify for tax credits?
• What would be the impact of relocating our headquarters to a different city, state, or country?
• What is the tax exposure for owners in the case of a potential merger or significant
change in ownership?
• Do our transfer pricing contracts on certain products put us at higher risk of a tax audit
because they have abnormal margins?
• What monthly trends can we identify to help us avoid surprises?
• How are we addressing tax complexities resulting from online sales due to new sales tax
legislation?
• How would tax law changes affect our pension or profit-sharing plans and top employee
compensation packages (including stock options)?
• How would the use of independent contractors affect our payroll tax liabilities?
The answers to these questions come from analysis of current transaction data and a
collection of parameters that represent potential assumption changes. A combination of
descriptive and predictive analytics with visualizations provides guidance for decision mak-
ers in each of these cases.

What-If Scenarios
What-if scenario analysis tests the impact of various input data on an expected output. In
tax, this means the manipulation of inputs—such as multiple tax rates, a series of transac-
tions, and varying profit margins—to estimate the future outputs, including estimated book
income, cash taxes paid, and effective tax rates. These analyses attempt to optimize the
inputs to reach a desired goal, such as minimizing the effective tax rate or generating a port-
folio of possible outputs given the inputs. In these cases, we need to estimate the possible
inputs and outputs as well as determine the expected probabilities of those items.
For example, assume the Pennsylvania General Assembly is debating a reduction in
the statutory corporate income tax rate from 10 percent to either 8 percent or 7 percent
with a positive (+5 percent), neutral, or negative (−5 percent) change in corporate income.
A company with expected earnings before tax of $1,000,000 might see potential tax savings
shown in Exhibit 9-3.


EXHIBIT 9-3 Estimated Change in Tax Burden under Different Income Tax Proposals

Change in Taxable Income / Change in Tax Rate        10%        8%         7%
Positive change (+5%)                              5,000   (16,000)   (26,500)
Neutral change (+0%)                                   0   (20,000)   (30,000)
Negative change (−5%)                             (5,000)  (24,000)   (33,500)

Based on average earnings before tax of $1,000,000. Negative values represent tax savings.

By itself, this analysis may indicate the path to minimizing tax would be the lower tax
rate with negative growth. An estimate of the joint probabilities of each of the nine scenar-
ios determines the expected value of each, or the most likely impact of a change (as shown
in Exhibit 9-4) and the dollar impact of the expected change in value (in Exhibit 9-5). For
example, there is a 0.05 probability (as shown in Exhibit 9-4) that there will be +5 percent
change in taxable income but no change in tax rate. This would result in a $250 increase
in taxes (as shown in Exhibit 9-5). In this case, the total expected value of the proposed
decrease in taxes is $15,575, which is the sum of the individual expected values as shown
in Exhibit 9-5.

EXHIBIT 9-4 Joint Probabilities of Changes in Tax Rate and Change in Income

Change in Taxable Income / Change in Tax Rate        10%     8%     7%    Sum
Positive change (+5%)                               0.05   0.10   0.10   0.25
Neutral change (+0%)                                0.20   0.20   0.10   0.50
Negative change (−5%)                               0.10   0.10   0.05   0.25
Sum                                                 0.35   0.40   0.25   1.00

EXHIBIT 9-5 Expected Value of Each of the Scenarios

Change in Taxable Income / Change in Tax Rate        10%       8%        7%
Positive change (+5%)                                250   (1,600)   (2,650)
Neutral change (+0%)                                   0   (4,000)   (3,000)
Negative change (−5%)                               (500)  (2,400)   (1,675)

The usefulness of the what-if analysis is that decision makers can see the possible impact
of changes in tax rates across multiple scenarios. This model relies heavily on assumptions
that drive each scenario, such as the initial earnings before tax, the expected change in
earnings, and the possible tax rates. Data Analytics helps refine initial assumptions (i.e.,
details guiding the scenarios) to strengthen decision makers’ confidence in the what-if mod-
els. The more analyzed data that are available to inform the assumptions of the model, the
more accurate the estimates and expected values can be. Here, data analysis of before-tax
income and other external factors can help determine more accurate probability estimates.
Likewise, an analysis of the legislative proceedings may help determine the likelihood of a
change.
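
The expected-value arithmetic in Exhibits 9-3 through 9-5 can also be reproduced directly. The following sketch (SQL Server syntax) hard-codes the nine scenarios with the tax changes from Exhibit 9-3 and the joint probabilities from Exhibit 9-4; the final row returns the total of −$15,575, the expected decrease in taxes.

    -- Sketch only: expected value of each what-if scenario.
    WITH Scenario (IncomeChange, TaxRate, TaxChange, JointProb) AS (
        SELECT * FROM (VALUES
            ('+5%', '10%',   5000.00, 0.05),
            ('+5%',  '8%', -16000.00, 0.10),
            ('+5%',  '7%', -26500.00, 0.10),
            ('+0%', '10%',      0.00, 0.20),
            ('+0%',  '8%', -20000.00, 0.20),
            ('+0%',  '7%', -30000.00, 0.10),
            ('-5%', '10%',  -5000.00, 0.10),
            ('-5%',  '8%', -24000.00, 0.10),
            ('-5%',  '7%', -33500.00, 0.05)
        ) AS v (IncomeChange, TaxRate, TaxChange, JointProb)
    )
    SELECT IncomeChange, TaxRate,
           TaxChange * JointProb AS ExpectedValue        -- matches Exhibit 9-5
    FROM Scenario
    UNION ALL
    SELECT 'Total', '', SUM(TaxChange * JointProb)       -- -15,575: expected decrease in taxes
    FROM Scenario;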

What-If Scenarios for Potential Legislation, Deductions, and Credits
Changes in the tax code complicate tax estimates and payments. Potential changes in legis-
lation are generally complex, involving identification of qualifying transactions, calculating
partial transaction amounts, analyzing groups of transactions, and determining the impact
of the change from current policy. Changes involve updating rules and decision aids, as
well as capturing previously ignored metadata (such as categories).
Just like scenario analysis involving changes to corporate tax rates, we examine another
scenario analysis with the use of R&D tax credits. For example, the United States allows


companies to take a research credit of up to 20 percent of qualified research expenditures


(QREs) used to develop new products exceeding a calculated base amount, limited by a
ceiling. The use of tax analytics to determine adjustments to the research credits requires a
correct determination of expenses related to qualifying research. If any part of the research
credit were to change—such as the percentage, base amount, or ceiling—companies would
need to anticipate the change and enact policies and reporting to calculate the new values.
Data Analytics helps refine the model by more accurately calculating current levels of
activity and estimating trends for most likely changes in the future. To determine the level
of research activity within a firm, the system designers would need to appropriately code
transactions and individuals that qualify for the research credit. Some of these inputs and
variables require compiled data that include:
• Qualified research activities.
• Wages, bonuses, and stock options for employees engaged in, supporting, or supervising
qualified research.
• Supplies used to conduct qualified research.
• Contract research expense paid for qualified research by a third party.
• Average gross receipts over a four-year period.
• Limits on research credit.
• Carry-forward credit balance.
If the metadata and tagging for qualified research activities are inaccurate or missing,
additional data ETL would be required.
Scenarios involving changes to the research credit would most likely include the follow-
ing variables:
• Fixed-base percentage.
• Ceiling for fixed-base percentage.
• Floor of current QREs.
• Credit percentage.
• Current and future levels of qualified research activity.
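
As a highly simplified sketch of the credit mechanics described above (a tentative credit of 20 percent of QREs above the base amount), the query below uses a hypothetical QREData(TaxYear, QRE, BaseAmount) table. It is not the full statutory computation, which also involves the fixed-base percentage, the gross-receipts average, and the ceiling and carry-forward rules listed above.

    -- Sketch only: tentative research credit on QREs above the base amount.
    SELECT TaxYear,
           QRE,
           BaseAmount,
           0.20 * (CASE WHEN QRE > BaseAmount
                        THEN QRE - BaseAmount
                        ELSE 0 END) AS TentativeCredit   -- credit only on QREs above the base
    FROM QREData;                                        -- hypothetical table of compiled QRE data

Rerunning the query with different credit percentages, base amounts, or ceilings is the what-if portion of the analysis.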
One current change to research and development expenses that provides an interesting
opportunity for analysis is the recent change in U.S. tax code. While companies currently
expense research and experimental (R&E) expenditures in the year they are incurred, as
of December 31, 2021, the IRS will require companies to capitalize R&E expenditures
and amortize them over 5 years. This will result in an increase in taxable income and cor-
responding tax liability. What-if scenario analysis of current research activity and expected
activity through 2022 can provide insight into the amount of tax that is likely to be collected
after the change goes into effect.
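
A simplified what-if sketch of that change: assuming straight-line amortization of 20 percent per year and ignoring mid-year conventions, the year-one deduction falls from 100 percent of R&E spending to 20 percent, so 80 percent of the spending becomes additional taxable income in year one. REExpenditure is a hypothetical table of projected R&E spending.

    -- Sketch only: year-one deduction, expensing versus simplified amortization.
    SELECT SpendYear,
           Amount         AS DeductionIfExpensed,        -- current treatment
           Amount * 0.20  AS Year1DeductionIfAmortized,  -- simplified 5-year straight line
           Amount * 0.80  AS AddedTaxableIncomeYear1     -- what-if impact in year one
    FROM REExpenditure;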

PROGRESS CHECK
7. What are some data a tax manager would need in order to perform a what-if
analysis of the potential effects of a stock buyback?
8. How does having more metadata help a tax accountant minimize taxes?

Summary
Recent advances in Data Analytics extend to tax functions, allowing them to work more
effectively, efficiently, and with greater control over the data.
■ Tax analytics allow tax departments, accounting firms, and regulators to address tax
opportunities and challenges. (LO 9-1)
■ Tax accountants use descriptive analytics to summarize performance, diagnostic analyt-
ics to compare with a benchmark and control costs, predictive analytics to predict tax
outcomes, and prescriptive analytics for tax planning purposes. (LO 9-1)
■ New regulations are requiring greater detail, and tax regulators are getting more adept in the
use of analytics, better able to address tax compliance and opportunities. In addition to the
regulator side, tax filers now have more data to support their tax calculations. (LO 9-2)
■ While the tax department has traditionally used data from the financial reporting system,
there are increasing opportunities to control and expand upon available tax data to help
address the most important tax questions. (LO 9-2)
■ Tax visualizations (dashboards, cockpits) can be helpful in monitoring how the tax function
is doing in meeting its KPIs (key performance indicators).  (LO 9-3)
■ Prescriptive analytics can be especially powerful in doing tax planning and formulating
what-if scenarios. (LO 9-4)

Key Words
Tax Cuts and Jobs Act of 2017 (460) Tax legislation offering a major change to the existing tax code.
data mart (459) A subset of the data warehouse focused on a specific function or department to assist
and support its needed data requirements.
data warehouse (459) A repository of data accumulated from internal and external data sources,
including financial data, to help management decision making.
tax data mart (459) A subset of a company-owned data warehouse focused on the specific needs of
the tax department.
tax planning (464) Predictive analysis of potential tax liability and the formulation of a plan to reduce
the amount of taxes paid.
what-if scenario analysis (464) Evaluation of the impact of different tax scenarios/alternatives on
various outcome measures including the amount of taxable income or tax paid.

ANSWERS TO PROGRESS CHECKS


1. Tax planning should be considered an integral part of prescriptive analytics where the com-
pany (or accounting firm) helps determine what decisions should be made to either mini-
mize tax liability in the future or reduce the uncertainty associated with future tax payments.
2. Analytics could be used to find the needed detail (timesheets, calendars, project time-
lines, document meetings between various employees, time needed for management
review, etc.) to qualify for the R&D tax credit.
3. A tax data mart is a repository of data from the financial reporting and other systems, assembled to support the tax department's needs.
4. Tax departments are able to specify which data might affect their tax calculations for their
tax data mart and have a continuous feed of those data. This data mart is essentially one
where the tax department can, in some sense, “own” the data because no other group
has the rights to modify them.
5. The ETR (effective tax rate) is generally used as a measure of the tax cost used by the tax
department to understand how well they are keeping the tax cost at a minimum. The lower

the effective tax rate, the more effective the tax department is at finding ways to structure
transactions to minimize taxes and find applicable tax deductions and tax credits (like the
R&D tax credit or other tax loopholes). Monitoring the level of the ETR over time helps us
know if the tax department is persistent and consistent in reducing the taxes paid, or if this
rate is highly variable. Generally, most tax professionals would consider the more stable
the ETR over time, the better. Tracking ETR over time as part of the tax sustainability KPIs
allows management and the tax department to figure out if the ETR is persistent or if the
rate bounces around each year in an unsustainable way.
6. The higher the levels of late filings or error penalties, the more vulnerable the
company is to penalties, tax audits, and missed tax saving opportunities.
7. Data may include the possible price of the stock, the potential capital gains incurred by
the stockholders, and the number of shares.
8. The more metadata, the better the tax accountants can accurately calculate the amounts
of taxable and nontaxable items. For example, they can more clearly identify expenses
that qualify for the research and development credit or track meal and entertainment
expenses that may trigger tax presence in other locations.

Multiple Choice Questions

1. (LO 9-1) In which stage of the IMPACT model (introduced in Chapter 1) would the use of
tax cockpits fit?
a. Track outcomes
b. Master the data
c. Address and refine results
d. Perform test plan
2. (LO 9-2) Tax departments interested in maintaining their own data are likely to have
their own:
a. tax reporting system.
b. tax data mart.
c. tax dashboard.
d. tax analytics.
3. (LO 9-3) According to the textbook, an example of a tax efficiency and effectiveness KPI
would be:
a. number of audits closed.
b. ETR (effective tax rate) over time.
c. number of resubmitted tax returns due to errors.
d. amount of time spent on compliance versus strategic activities.
4. (LO 9-3) According to the textbook, an example of a tax sustainability KPI would be:
a. frequency of concerns pertaining to the organization’s tax position.
b. level of job satisfaction of the tax personnel.
c. levels of technology/tax training.
d. number of audits closed and significance of assessment over time.
5. (LO 9-3) According to the textbook, an example of a tax cost KPI would be:
a. employee turnover of the tax personnel.
b. levels of technology/tax training.
c. ETR (effective tax rate).
d. levels of late filing or error penalties.

6. (LO 9-4) The task of tax accountants and tax departments to minimize the amount of
taxes paid in the future is called:
a. tax planning.
b. tax compliance.
c. tax minimization.
d. tax sustainability.
7. (LO 9-3) According to the textbook, an example of a tax risk KPI would be:
a. employee turnover of the tax personnel.
b. levels of technology/tax training.
c. ETR (effective tax rate).
d. levels of late filing or error penalties.
8. (LO 9-3) What allows tax departments to view multiple years, periods, jurisdictions
(state or federal or international, etc.), and differing scenarios of data, typically through
use of a dashboard?
a. Tax data visualizations
b. Tax data warehouses
c. Tax compliance data
d. Tax planning
9. (LO 9-4) Predictive analysis of potential tax liability and the formulation of a plan to
reduce the amount of taxes paid is defined as:
a. tax data analytics.
b. tax data warehouses.
c. tax compliance data.
d. tax planning.
10. (LO 9-3) The evaluation of the impact of different tax scenarios/alternatives on various
outcome measures including the amount of taxable income or tax paid is called:
a. tax visualizations.
b. what-if scenario analysis.
c. tax compliance.
d. data warehousing.

Discussion and Analysis


1. (LO 9-1) Explain how the IRS might use social media data to profile taxpayers who
might be underpaying taxes. What additional information would the IRS need to con-
sider in addition to social media data to build a full taxpayer profile?
2. (LO 9-1, 9-4) Why would a company be interested in documenting the book-tax differ-
ences to identify potential items of interest to the IRS?
3. (LO 9-2) Explain why the needs of the tax accountant are different than the needs of the
financial accountants. Why does this lead to a tax data warehouse or tax data mart?
4. (LO 9-1) Why would tracking a client’s unrealized capital gains be important to busi-
nesses trying to capitalize on the tax opportunities inherent in opportunity zones (a new
investment opportunity available as a result of the Tax Cuts and Jobs Act of 2017)? How
would accounting firms access these data regarding their clients?
5. (LO 9-3) Why would employee turnover of the tax personnel be a good KPI to track a
company’s overall tax efficiency and effectiveness? What does low employee turnover
(as compared to high turnover) allow a tax department to do?
6. (LO 9-3) Explain why tax sustainability would be of interest to the tax department. What
does it allow them to do if they are able to gain tax sustainability?

7. (LO 9-1) Descriptive analytics help calculate tax liability more accurately. Give some
examples of tax-related descriptive analytics.
8. (LO 9-1) Predictive analytics help identify future tax liabilities. What data would a tax
accountant need in order to perform a predictive analysis?
9. (LO 9-4) Explain how probability helps refine a what-if analysis.
10. (LO 9-3) How do visualizations of tax compliance assist a company in its efforts to
reduce tax risk and minimize the costs of tax preparation and compliance? In your opin-
ion, what would be needed to consistently make visualizations a key part of the tax
department evaluation of tax risk and tax cost minimization?

Problems
1. (LO 9-1) Match the description of the tax analysis question to the data analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics

Tax Analysis Question Data Analytics Type


1. What is the amount of property tax paid each year (for the last 5 years)?
2. In order to minimize taxes, should we pursue a merger, joint venture,
or acquisition?
3. What is the amount of R&D tax credit we expect to qualify for in
the future?
4. What is the amount of cash paid for taxes over the past 3 years?
5. What is the amount of income tax paid compared to the amount
the IRS expected the company to pay (based on profiling)?
6. How can we maximize the tax credits we qualify for in the future?

2. (LO 9-1) Match the description of the tax analytics technique to the data analytics type:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics

Tax Analytics Technique Data Analytics Type


1. Use of time series analysis to project future property taxes.
2. Past performance comparisons of effective tax rates to industry ef-
fective tax rates.
3. Use of analyst forecasts to project future sales.
4. Profiling to establish expectations of tax payment/liability by regulators.
5. Sums, averages, medians, bar charts.
6. Tracking performance using KPIs.
7. What-if scenario analysis.

3. (LO 9-3) Match the following tax KPIs to one of these areas:
• Tax cost
• Tax risk

Tax KPI Tax Area
1. Effective tax rate (ETR).
2. Levels of late filing or error penalties and fines.
3. Cash taxes paid.
4. Expiration of tax credits.
5. Frequency of concerns pertaining to the organization’s tax position.
6. Number of resubmitted tax returns due to error.

4. (LO 9-3) Match the following tax KPIs to one of these areas:
• Tax sustainability
• Tax efficiency/effectiveness

Tax KPI Tax Area


1. Number of company tax audits closed and significance of assessment over time.
2. Levels of technology/tax training.
3. Employee turnover of the tax personnel.
4. The effective tax rate (ETR) over time.
5. Level of job satisfaction of the tax personnel.
6. Amount of time spent on compliance versus strategic activities.

5. (LO 9-3) Analysis: In your opinion, which of the four general categories of tax KPIs men-
tioned in the text would be most important to the CEO? Support your opinion.
6. (LO 9-4) Analysis: Assume that a company has the option of staying in a tax jurisdic-
tion with an effective tax rate of 20 percent or moving to a different location where the
effective tax rate is 11 percent. What other drivers besides the tax rate may affect the
decision to stay or move?
7. (LO 9-4) Analysis: If a company knows that the IRS will change a tax calculation in the
future, such as the capitalization of research and experimental expense in 2021, what
actions might management take today to reduce their tax liability when the new policy
goes into effect?
8. (LO 9-4) Analysis: How does tax planning differ from tax compliance? Why might the
company leadership be more excited about the value-creating efforts of tax planning
versus that of tax compliance?
9. (LO 9-2) Analysis: How does Data Analytics facilitate what-if scenario analysis? How
does the presence of a tax data mart help provide the needed data to support such
analysis?
10. (LO 9-2) Match the tax analytics definitions to their terms: data mart, data warehouse,
tax planning, tax data mart, what-if scenario analysis.

Tax Analytics Definition Tax Analytics Term


1. A subset of the data warehouse focused on a specific function or de-
partment to assist and support its needed data requirements.
2. A repository of data accumulated from internal and external sources,
including financial data, to help management decision making.
3. Predictive analysis of potential tax liability and the formulation of a plan
to reduce the amount of taxes paid.
4. A subset of a company-owned data warehouse focused on the specific
needs of the tax department.
5. Evaluation of the impact of different tax scenarios/alternatives on various
outcome measures including the amount of taxable income or tax paid.

LABS

Lab 9-1 Descriptive Analytics: State Sales Tax Rates


Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable,
can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Since taxes vary by state, this lab teaches how to gather the state sales
tax data, and how to analyze and visualize it.
Lab 9-1 data includes state sales tax rates from 2015. We are working with 2015 data
because the next labs build upon this lab with the Dillard’s historical transactions that took
place in 2015.
Data: Lab 9-1 State Sales Tax Rates.zip - 8KB Zip / 11 KB Excel

Lab 9-1 Example Output


By the end of this lab, you will create a histogram and a filled map demonstrating the distri-
bution of sales tax rates across states in the United States. While we blurred the pertinent
values in these screenshots, your work should look similar to this:

Microsoft | Excel


LAB 9-1M Example of Visual Distributions of Sales Tax Rates in Microsoft Excel

Tableau | Desktop


Lab 9-1T Example of Visual Distributions of Sales Tax Rates in Tableau Desktop

Lab 9-1 Create a Histogram and a Filled Map to Assess the Distribution of State Sales Tax Rates
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 9-1 [Your name] [Your email address].docx.

Microsoft | Excel

1. Open a new workbook in Excel and connect to your data:


a. From the Data ribbon, click Get Data > From File > From Workbook.
b. Navigate to your Lab 9-1 State Sales Tax Rates.xlsx file > Table 1 and click Load.
c. Create two visualizations to assess sales tax rate distribution across the
United States:
1. A histogram: Insert tab on the ribbon > Recommended Charts > All Charts > Histogram, click OK.
a. Adjust the bin size by selecting Design tab on the ribbon > Add
Chart Element > Axes > More Axis Options > Adjust Bin width to
.00791, and press Enter on your keyboard.
2. A filled map: Click out of the histogram and into the data so your
active cell is in the table, then click Insert tab on the ribbon >
Recommended Charts > Filled Map, click OK.
a. If prompted to send data to Bing, click I accept.
3. Rearrange your visualizations so you can see both.
4. Rename the sheet Lab 9-1 Output.
2. Take a screenshot that includes both visualizations (label it 9-1 MA).
3. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 9-1 State Sales Tax Visualization.xlsx.

Tableau | Desktop

1. Create a new workbook in Tableau Desktop and load your data:


a. Click Connect > To a File > Microsoft Excel.
b. Navigate to your Lab 9-1 State Sales Tax Rates.xlsx file and click Open.
2. Create a histogram to show the distribution of state sales tax rates:
a. Go to Sheet1 and rename the sheet Histogram.
b. Rows: Taxrate
c. Change the chart type to Histogram in the Show Me tab.
3. Create a filled map:
a. Create a new Sheet and name it Filled Map.
b. Drag State to Detail in the Marks shelf.
c. Drag Taxrate to Color in the Marks shelf.
4. Create a new dashboard and arrange the Histogram and Filled Map so you
can easily see both.
5. Take a screenshot (label it 9-1 TA).
6. When you are finished answering the lab questions, you may close Tableau.
Save your file as Lab 9-1 State Sales Tax Visualization.twb.

Lab 9-1 Objective Questions (LO 9-1, 9-3)


OQ1. Is the distribution skewed positive or negative (remember that skew indicates
where the outliers are, not the majority of the data)?
OQ2. Looking at the histogram, how many items are in the bin third from the right
(the bin with the most observations)?
OQ3. How many states do not have a sales tax?

Lab 9-1 Analysis Questions (LO 9-3)


AQ1. Considering the Filled Map, which geographic trends do you notice about sales
tax rates, if any?
AQ2. Which states with 0 percent sales tax are next to states with really high sales tax rates?
AQ3. Considering the histogram, are you surprised by its shape where there are
observations on both extremes, but none in the middle? Why would some states
have zero sales tax, but most states have higher sales taxes?

Lab 9-1 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 9-2 Comprehensive Case: Calculate Estimated State Sales Tax Owed—Dillard’s
Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable,
can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: In this lab, you will calculate the estimated state sales tax owed for each
state in which Dillard’s had brick-and-mortar stores in 2015. You will combine a table contain-
ing 2015 state sales tax rates with Dillard’s sales transactions data. If you completed Lab 9-1,
you can begin this lab from the same Lab 9-1 Excel or Tableau workbook that you saved in
Lab 9-1.
Data: Dillard’s Sales Data and Lab 9-2 State Sales Tax Rates.zip - 8KB Zip / 11KB Excel - Dillard’s sales data are available only on the University of Arkansas Remote Desktop (waltonlab.uark.edu). See your instructor for login credentials.
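
Conceptually, this lab’s merge-and-multiply steps can be expressed as a single query. The sketch below assumes the 2015 rates have been loaded into the database as a hypothetical StateTaxRate(state, taxrate) table; in the lab itself the rates live in the Excel workbook, and the join happens in Power Query or Tableau.

    -- Sketch only: estimated state sales tax owed, computed in one query.
    SELECT S.STATE,
           SUM(T.TRAN_AMT)             AS AMOUNT,
           SUM(T.TRAN_AMT) * R.taxrate AS EstimatedStateSalesTaxOwed
    FROM TRANSACT T
    INNER JOIN STORE S ON S.STORE = T.STORE
    INNER JOIN StateTaxRate R ON R.state = S.STATE   -- hypothetical rate table
    WHERE YEAR(T.Tran_Date) = 2015 AND S.STATE <> 'U' AND T.STORE <> 698
    GROUP BY S.STATE, R.taxrate;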

Lab 9-2 Example Output


By the end of this lab, you will create similar visualizations to what you see below. While
we blurred the pertinent values in these screenshots, your work should look similar to this:

Microsoft | Excel


LAB 9-2M Example of a PivotTable Indicating Estimated Sales Tax Owed for
Dillard’s in 2015 in Microsoft Excel

Tableau | Desktop


LAB 9-2T Example of a Chart Indicating Estimated Sales Tax Owed for Dillard’s in 2015
in Tableau Desktop

Lab 9-2 Calculate Estimated State Sales Tax Owed in 2015 by Dillard’s
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 9-2 [Your name] [Your email address].docx.

Microsoft | Excel

1. Open Excel:
a. If you are continuing this lab from the work you completed in Lab
9-1, open your completed Lab 9-1 workbook (Lab 9-1 State Sales Tax
Visualization.xlsx).
b. If you do not have the completed Lab 9-1 workbook, open a new work-
book in Excel and connect to your data:
1. From the Data ribbon, click Get Data > From File > From Workbook.
2. Navigate to your Lab 9-2 State Sales Tax Rates.xlsx file > Table 1
and click Load.

2. Import Dillard’s sales data:
a. From the Data tab in the ribbon, click Get Data > From Database > From
SQL Server Database.
1. Server: essql1.walton.uark.edu
2. Database: WCOB_Dillards
3. Expand Advanced Options and input the following query:
SELECT STATE, ZIP_CODE, SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
INNER JOIN STORE
ON STORE.STORE = TRANSACT.STORE
WHERE YEAR(Tran_Date) = 2015 AND State <> 'U' AND
TRANSACT.STORE <> 698
GROUP BY STATE, ZIP_CODE
b. Click Edit or Transform Data.
c. In the Power Query Editor, merge the Dillard’s transaction data with the
State Sales Tax table:
1. From the Home tab in the ribbon, click Merge Queries:
a. Select the STATE column in Query 1.
b. Select Table1 in the drop-down and select the state column.
c. Place a check mark next to Ignore Privacy Levels. . . in the pop-up
box that appears and click Save.
d. Keep the Join Kind as Left Outer and click OK.
d. Still in the Power Query Editor, create a new column to calculate the
amount of Sales Tax Owed for each state and zip code in 2015:
1. Click the arrows on the Table 1 column to expand the table.
a. Select Expand, unselect state, and click OK.
2. From the Add Column tab in the ribbon, select Custom Column:
a. New column name: Estimated State Sales Tax Owed
b. Custom column formula: = [amount]*[Table1.taxrate]
c. Click OK.
3. From the Home tab in the ribbon, click Close & Load.
4. Rename the sheet Lab 9-2 Output.
e. Insert a PivotTable (Insert tab > PivotTable):
1. Rows: State
2. Values: Estimated State Sales Tax Owed (ensure that this appropriately defaults to SUM)
3. Take a screenshot (label it 9-2MA).
4. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 9-2 Dillard’s Estimated State Sales Tax Owed.xlsx.

Tableau | Desktop

1. Open Tableau:
a. If you are continuing this lab from the work you completed in Lab 9-1,
open your completed Lab 9-1 workbook (Lab 9-1 State Sales Tax Visual-
ization.twb).
b. If you do not have the completed Lab 9-1 workbook, open a new Tableau
workbook and connect to the Lab 9-2 dataset:
1. Click Connect > To a File > Microsoft Excel.
2. Navigate to your Lab 9-2 State Sales Tax Rates.xlsx file and click Open.
2. Import Dillard’s sales data:
a. From the Data Source tab (You will already be in the Data Source tab
if you created a new workbook. If you opened your workbook from Lab
9-1, just navigate back to the Data Source tab in the bottom left corner.),
click Add (next to Connections) > To a Server > Microsoft SQL Server.
1. Server: essql1.walton.uark.edu
2. Database: WCOB_Dillards
3. Click Sign In.
4. Double-click New Custom SQL from the Table section and input the
following query:
SELECT STATE, ZIP_CODE, SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
INNER JOIN STORE
ON STORE.STORE = TRANSACT.STORE
WHERE YEAR(Tran_Date) = 2015 AND State <> 'U' AND
TRANSACT.STORE <> 698
GROUP BY STATE, ZIP_CODE
b. Click OK.
c. The data from Sheet 1 and the Custom SQL Query should automatically
join based on the matching fields of State. Close the Edit Relationship
window.
d. Navigate to a new sheet.
e. Create a new calculated field to calculate the sales tax owed:
1. Click Analysis > Create Calculated Field. . .
a. Name: Estimated State Sales Tax Owed
b. Formula: [AMOUNT]*[Taxrate]
c. Click OK.
2. Create a text table to show the sales tax Dillard’s owes for each state in
2015:
a. Rows: STATE (Custom SQL Query)
b. Text (in the Marks shelf): Estimated State Sales Tax Owed (ensure
that this appropriately defaults to SUM)

3. Take a screenshot (label it 9-2TA).
4. Rename the sheet Lab 9-2 Output.
5. When you are finished answering the lab questions, you may close Tableau.
Save your file as Lab 9-2 Dillard’s Estimated State Sales Tax Owed.twb.

Lab 9-2 Objective Questions (LO 9-3, 9-4)


OQ1. What is the estimated state sales tax owed for Alabama (AL) (round to the
dollar)?
OQ2. Which state that Dillard’s operates in has 0 estimated state sales tax owed?
OQ3. Which state has the highest amount of estimated state sales tax owed?

Lab 9-2 Analysis Questions (LO 9-3, 9-4)


AQ1. What insights can you draw from assessing the variation in sales tax owed
across states that Dillard’s has physical stores in?
AQ2. Do you think it is accurate that Dillard’s doesn’t owe any sales taxes in Montana
at all? Why or why not?
AQ3. In this lab, you combined actual transaction amounts from Dillard’s stores in
2015 with the state sales tax rate in 2015. What additional data would you want
to add to your analysis to make these numbers more meaningful?

Lab 9-2 Submit Your Screenshot Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot lab document to Connect or to the location indicated by your instructor.

Lab 9-3 Comprehensive Case: Calculate Total Sales Tax Paid—Dillard’s
Lab Note: The tools presented in this lab periodically change. Updated instructions, if appli-
cable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: In this lab, you will calculate the actual amount of sales tax paid by
Dillard’s across each state and zip code in 2015. Unlike Lab 9-2, these data do not need to
be combined with state tax rate data; instead you will create a more complex SQL query to
gather data from the TRANSACT, STORE, SKU, and DEPARTMENT tables.
Part 2 of this lab builds on what you worked on in Lab 9-2. If you did not complete Lab
9-2, a brief description of the work is provided in Part 2, and an additional dataset is avail-
able for you to use.
Data: Dillard’s Sales Data and Lab 9-3 Estimated State Sales Tax Rates.zip - 8KB Zip /
11KB Excel - Dillard’s sales data are available only on the University of Arkansas Remote
Desktop (waltonlab.uark.edu). See your instructor for login credentials.
Data for students who have not completed Lab 9-2 AND for use on the Tableau track:
Lab 9-3 Estimated State Sales Tax Rates.zip.

Lab 9-3 Example Output
By the end of this lab, you will create a table to compare the estimated state sales tax owed
to the actual sales tax owed, along with sparklines to visualize the differences. While we
blurred the pertinent values in these screenshots, your work should look similar to this:

Microsoft | Excel


LAB 9-3M Example of a PivotTable and Sparklines in Microsoft Excel

Tableau | Desktop


LAB 9-3T Example of a Text Table and Accompanying Line Charts in Tableau Desktop

Lab 9-3 Part 1 Calculate Sales Tax Paid in 2015 by Dillard’s
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and responses and save it as Lab 9-3 [Your name] [Your email
address].docx.
In this part of the lab, you will calculate the actual sales tax paid in 2015 by Dillard’s in
each different state that it operates physical store locations in.

Microsoft | Excel

1. From Microsoft Excel, click the Data tab on the ribbon.


2. Click Get Data > From Database > From SQL Server Database.
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Expand Advanced Options and input the following query:
SELECT STATE, ZIP_CODE, DEPT_DESC, SUM(TRAN_AMT) AS AMOUNT
FROM ((TRANSACT
INNER JOIN STORE
ON STORE.STORE = TRANSACT.STORE)
INNER JOIN SKU
ON TRANSACT.SKU = SKU.SKU)
INNER JOIN DEPARTMENT
ON SKU.DEPT = DEPARTMENT.DEPT
WHERE DEPT_DESC LIKE '%TAX%' AND TRANSACT.STORE <> 698
AND YEAR(TRAN_DATE) = 2015
GROUP BY STATE, ZIP_CODE, DEPT_DESC
d. Click OK, then Edit or Transform. In the Power Query Editor, you will
make a variety of changes to the data to filter the DEPT_DESC attribute
to include only Sales Tax and Sales Tax Adjustments, to pivot the DEPT_
DESC column on the amount paid (AMOUNT attribute), to replace
null values in the SALES TAX ADJUSTMENT field with 0s, and to create
a new field to calculate the total sales tax Dillard’s paid in each zip code.
1. Filter the DEPT_DESC attribute: Click the dropdown arrow next to
DEPT_DESC and unselect RESTUARANT TAX so that only SALES
TAX and SALES TAX ADJUSTMENT are selected. Click OK.
2. Pivot the DEPT_DESC attribute on AMOUNT: Select the DEPT_
DESC field and from the Transform tab in the ribbon, select Pivot
Column. Adjust the Values Column to AMOUNT. Click OK.
3. Replace null values in the SALES TAX ADJUSTMENT field: Right-
click the header for SALES TAX ADJUSTMENT and select Replace
Values. . .
a. Value to Find: null
b. Replace With: 0

i. This step is important because in the next step, where you will create a
calculated field, Power Query will not interpret null values as numbers,
so the result of any sum with a null value will also be null. Replacing
null with 0 will result in a proper sum of the tax amount paid.
4. Create a calculated field: From the Add Column tab in the ribbon,
click Custom Column. Input the following and click OK.
a. New Column Name: TAX_PAID
b. Custom Column formula: =[SALES TAX] + [SALES TAX
ADJUSTMENT]
3. To make it easier to navigate to this query as the lab continues, we will rename
it from Query1 to something more meaningful. Expand the Queries menu on
the right and right-click Query1. Select Rename and rename the query 9-3.
4. From the Home tab on the ribbon, click Close & Load to load the trans-
formed data into Excel.
5. Create a PivotTable (Insert tab on the Ribbon > PivotTable and click OK).
a. Rows: STATE
b. Values: TAX_PAID
6. Take a screenshot (label it 9-3MA).
7. Rename the sheet as PivotTable.
8. Save your file as Lab 9-3 Dillard’s Sales Tax Paid.xlsx.
9. Answer the questions and continue to Part 2.

Tableau | Desktop

1. Open Tableau Desktop and click Connect to Data > To a Server > Microsoft
SQL Server.
2. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is, click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
e. SELECT STATE, ZIP_CODE, DEPT_DESC, SUM(TRAN_AMT) AS
AMOUNT
FROM ((TRANSACT
INNER JOIN STORE
ON STORE.STORE = TRANSACT.STORE)
INNER JOIN SKU
ON TRANSACT.SKU = SKU.SKU)
INNER JOIN DEPARTMENT
ON SKU.DEPT = DEPARTMENT.DEPT

WHERE DEPT_DESC LIKE '%TAX%' AND TRANSACT.STORE <> 698
AND YEAR(TRAN_DATE) = 2015
GROUP BY STATE, ZIP_CODE, DEPT_DESC
3. Click OK.
4. Navigate to Sheet 1.
a. Drag DEPT_DESC to the Filters shelf and select only SALES TAX and
SALES TAX ADJUSTMENT, then click OK.
b. Add the following elements to the Columns and Row shelves:
i. Columns: DEPT_DESC
ii. Rows: STATE
iii. Text (in the Marks shelf): AMOUNT (ensure that this appropriately
defaults to SUM)
iv. Add in the Grand Total Sales Tax Paid for each state (the sum of
SALES TAX and SALES TAX ADJUSTMENT) by clicking the
Analysis tab > Totals > Show Row Grand Totals.
5. Rename Sheet 1 to Lab 9-3 Part 1.
6. Save your Tableau workbook as Lab 9-3 Dillard’s Sales Tax Paid.twb.
7. Take a screenshot (label it 9-3TA).
8. Answer the questions, then continue to Part 2.

Lab 9-3 Part 1 Objective Questions (LO 9-3, 9-4)


OQ1. How much sales tax did Dillard’s pay in New Mexico (NM)?
OQ2. How much sales tax did Dillard’s pay in South Carolina (SC)?
OQ3. Which state did Dillard’s pay the most sales tax in?

Lab 9-3 Part 1 Analysis Questions (LO 9-3, 9-4)


AQ1. From your results, you can see that Dillard’s paid a non-zero amount in sales tax
in Montana, but if you completed either of the previous labs, you would have
learned that Montana does not have a state sales tax. What sales tax do you
think Dillard’s is paying in Montana?
AQ2. In Part 2 of this lab, you will compare the actual sales tax paid in 2015 with the
estimated sales tax paid that you calculated in Lab 9-2. That estimate came from
applying the state sales tax rate in each state that Dillard’s operates physical
stores to the total transaction amount for each of those states. Do you anticipate
the actual sales tax paid to be more or less than the estimated amount? Why?

Lab 9-3 Part 2 Compare Actual Sales Tax Paid in 2015 with the Estimated Sales Tax Paid
In this part of the lab, you will merge the query from Part 1 with the data from Lab 9-2, and
then compare the estimated amount of taxes paid from Lab 9-2 with the actual amount of
taxes paid that you calculated in Part 1 of this lab.
If you did not complete the Microsoft track of Lab 9-2, a dataset has been provided for
you to use. In Lab 9-2, you used state sales tax rates and calculated the estimated amount
of sales tax owed by Dillard’s in each state where it owned and operated brick-and-mortar
stores in 2015.

Microsoft | Excel

1. From the same Excel workbook you created in Part 1 (Lab 9-3 Dillard’s Sales
Tax Paid.xlsx), click the Data tab on the ribbon.
2. Click Get Data > From File > From Workbook:
a. If you are continuing this lab from the work you completed in Lab 9-2,
open your completed Lab 9-2 workbook (Lab 9-2 Dillard’s Estimated
State Sales Tax Owed.xlsx).
b. If you did not complete Lab 9-2, open Lab 9-3 Estimated State Sales Tax
Rates.xlsx.
3. In the Navigator window, select 9-2, then select Edit or Transform. Within the
Power Query Editor, you will make a variety of changes to the data to adjust
the data type for ZIP_CODE from numerical to text, switch the view to the
original query (9-3), merge the Lab 9-2 query with the 9-3 query, and expand the
newly merged query to show only the necessary fields.
a. Adjust data type for ZIP_CODE: Click the 123 icon next to the ZIP_
CODE header and select text. If prompted to Change Column Type,
select Replace current.
b. Switch the view to Query 9-3: Expand the menu on the right labeled
Queries, and select 9-3.
c. Merge the two Queries: From the Home tab in the ribbon, select Merge
Queries.
1. 9-3: select the ZIP_CODE field.
2. Dropdown: select 9-2, then select the ZIP_CODE field.
3. If you are using the worksheet that you created in Lab 9-2:
a. In the Privacy levels window, select the box next to Ignore Privacy
Levels checks. . . and click Save.
4. Leave the Join Kind as a Left Outer join, and click OK.
d. Expand the newly merged query: Click the Expand button on the new
9-2 column (looks like two arrows), and select only AMOUNT and ESTI-
MATED STATE SALES TAX OWED, and click OK.
4. From the Home tab on the ribbon, click Close & Load.
5. Return to the tab of your workbook that you named PivotTable. Right-click
any of the data in the PivotTable and select Refresh. You should see the new
fields that you added in the Power Query Editor in your PivotTable field list
now.
6. Add 9-2.Estimated State Sales Tax Owed to the Values.
7. To visualize whether the estimated amount was more or less than the actual
amount paid, add in a Sparkline for each state.
a. Place your cursor in the cell to the right of the first row (this is probably
cell D4). From the Insert tab in the ribbon, select Sparklines > Line.
1. Data Range: B4:C4
2. Location Range: $D$4

3. Click OK.
4. Drag the Sparkline all the way down the PivotTable dataset to view
sparklines for each state.
8. Take a screenshot (label it 9-3MB) of the PivotTable and the sparklines.  
9. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 9-3 Dillard’s Sales Tax Paid.xlsx.

Tableau | Desktop

1. From the same Tableau workbook from Part 1 (Lab 9-3 Dillard’s Sales Tax Paid.twb),
click the Data Source tab > Add (next to Connections) > Microsoft Excel and
browse to Lab 9-3 Estimated State Sales Tax Rates.xlsx, and click OK.
2. Double-click the sheet labeled 9-2 to relate it to the Part 1 data, and edit the
relationship so the related fields are ZIP_CODE.
3. Navigate to the Lab 9-3 Part 1 sheet to allow the data to update, then right-
click the sheet name and select Duplicate.
4. Rename the duplicated sheet Lab 9-3 Part 2 and adjust the data to show the
comparison of the estimated state sales tax owed and the actual sales tax paid:
a. Right-click the column headers for SALES TAX and SALES TAX
ADJUSTMENT and select Hide.
b. Double-click ESTIMATED STATE SALES TAX OWED to add it to the
Measure Values.
c. Drag the Measure Names pill from the Rows shelf to the Columns shelf.
5. Create a visualization of the comparison by state:
a. Duplicate your Lab 9-3 Part 2 sheet and name it Lab 9-3 Part 2 - viz.
b. Drag the Measure Values pill from the Marks shelf to the Rows shelf.
c. Adjust the Marks by clicking the Dropdown (labeled (Automatic)) and
select Line.
d. To make it easier to compare each state’s values, adjust the axis. Right-
click any of the axes in the visualizations and select Edit Axis. . .
e. Change the Range to Independent axis ranges for each row or column and
close the window.
6. Create a dashboard to view the raw numbers and the sparklines on the same
sheet.
7. Take a screenshot (label it 9-3TB) of the All tables sheet.  
8. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 9-3 Dillard’s Sales Tax Paid.twb.

Lab 9-3 Part 2 Objective Questions (LO 9-3, 9-4)


OQ1. Were the majority of the amounts paid to each state more than the estimated
amount or less (keep in mind that the order of the columns is swapped in
Excel and Tableau, so be careful how you interpret the sparklines)?

OQ2. In how many states did Dillard’s owe less in taxes than estimated based on the
state sales tax rate?

Lab 9-3 Part 2 Analysis Questions (LO 9-3, 9-4)


AQ1. Consider Montana (MT). The estimated state sales tax owed was $0.00, but the
actual amount paid was $1,043.55. Why did Dillard’s owe taxes in Montana if
Montana did not have a sales tax in 2015?
AQ2. Why does the actual amount of sales tax owed vary from the estimated amount for each
state? Why is it important to take into account more data points than just the
state sales tax rate when calculating the amount of tax owed?

Lab 9-3 Submit Your Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
lab document to Connect or to the location indicated by your instructor.

Lab 9-4 Comprehensive Case: Estimate Sales Tax Owed by Zip Code—Dillard’s and Avalara
Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable,
can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: In this lab, you will use sales tax rate data from June 2020 (provided
by Avalara) to estimate the amount of sales tax Dillard’s might have owed in 2015 using
not only the state sales tax rates, but also local tax rates (city, county, and special where it
applies). While the tax rates you will use are from 2020 and not 2015, the estimate should
still be closer to what Dillard’s actually paid than the estimate you made in Lab 9-2 because
your calculations will include sales tax rates beyond just the state rates. These tax rates are
provided for free and updated monthly by Avalara, although the freely provided rates do not
come without risk—there are cities and counties that contain multiple zip codes, and some
zip codes cross multiple city and county lines. In this lab, you will investigate how well you
can rely on freely provided resources for tax rates versus paying for tax rates based on physi-
cal address (a much more granular and precise data point).
Data: Dillard’s Sales Data and Lab 9-4 Avalara Tax Rates.zip - 236KB Zip / 1.84MB
CSV - Dillard’s sales data are available only on the University of Arkansas Remote Desktop
(waltonlab.uark.edu). See your instructor for login credentials.

Lab 9-4 Example Output


By the end of this lab, you will create visualizations similar to what you see below. While
we blurred the pertinent values in these screenshots, your work should look similar to this:

Microsoft | Excel

LAB 9-4M Example of a PivotTable with Conditional Formatting and Sparklines in
Microsoft Excel

Tableau | Desktop


LAB 9-4T Example of Tabular Data and Sparklines in Tableau Desktop

Lab 9-4 Part 1 Combine and Explore the Avalara Tax Rate Data
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and responses and save it as Lab 9-4 [Your name] [Your email
address].docx.

Microsoft | Excel

1. From Microsoft Excel, click the Data tab on the ribbon.


2. Click Get Data > From File > From Folder. Browse to the location where you saved
the Lab 9-4 Avalara Tax Rates folder.
3. Click OK, then Edit or Transform. In the Power Query Editor, you will make
a variety of changes to the data to prepare and combine your data. First you
will remove the Avalara Disclaimer from the files to combine, then you will
combine the files based on the first state file in the folder.
a. Remove the Avalara Disclaimer: From the Home tab in the ribbon, select
Remove Top Rows.
1. Number of rows: 1 and click OK.
b. Combine the files: From the Home tab in the ribbon, select Combine
Files. Leave all of the defaults and click OK.
4. From the Home tab on the ribbon, click Close & Load to load the transformed
data into Excel.
5. Before we combine these data with the Dillard’s data, we can explore the fields
in this dataset. Take a moment to scroll through the data to familiarize yourself
with the data and the different fields available to work with.
6. Create a PivotTable (Insert tab on the Ribbon > PivotTable and click OK).
a. Rows: State
b. Values: State Rate (change the aggregate measure to AVERAGE)
c. After glancing through the State Tax Rates, add EstimatedCombinedRate
to the Values and adjust the aggregate measure to AVERAGE.
7. Further explore the data by adding conditional formatting and/or sorting to
assess which states have the highest state rates and combined rates, and also
to compare state and combined rates across states.
8. Take a screenshot (label it 9-4MA).
9. Rename the sheet to PivotTable.
10. Save your file as Lab 9-4 Avalara Combined Data.xlsx.
11. Answer the questions and continue to Part 2.
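If you would like to sanity-check the PivotTable averages from steps 6 and 7, the same
aggregation can be expressed in SQL. This is a sketch only: the combined Avalara files live
in your workbook rather than in a database, so the table name AvalaraTaxRates (and the
unspaced column names) are illustrative assumptions, not objects on the Walton server.

    SELECT State,
           AVG(StateRate) AS AvgStateRate,
           AVG(EstimatedCombinedRate) AS AvgCombinedRate
    FROM AvalaraTaxRates   -- hypothetical table holding the combined state files
    GROUP BY State
    ORDER BY AvgCombinedRate DESC;

Sorting by the combined rate in descending order mirrors the conditional formatting and
sorting you applied in step 7.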

Tableau | Desktop

1. Open Tableau Desktop and click To a File > To a Text file > Browse to the
location you saved Lab 9-4 Avalara Tax Rates folder and connect to the first
file in the folder.
2. Right-click the rectangle in the Connection pane for the TAXRATES_
ZIP5. . . file and select Convert to Union. . .
a. Select Wildcard (automatic) and click OK. This action will combine all
of the files in the Tax Data folder into one.

3. Navigate to Sheet 1.
a. Rows: State
b. Marks (Text): State Rate (change the aggregate measure to AVERAGE)
c. After glancing through the State Tax Rates, add Estimated Combined
Rate to the Values by double-clicking on the field name, then adjust the
aggregate measure to AVERAGE.
4. Adjust the visualization using the Show Me tab or by sorting to assess which
states have the highest state rates and combined rates, and also to compare
state and combined rates across states.
5. Rename Sheet 1 to Lab 9-4 Part 1.
6. Save your Tableau workbook as Lab 9-4 Avalara Combined Data.twb.
7. Take a screenshot (label it 9-4TA).
8. Answer the questions, then continue to Part 2.

Lab 9-4 Part 1 Objective Questions (LO 9-3, 9-4)


OQ1. Which state that Dillard’s has stores in has the highest estimated combined tax rate?
OQ2. What is the average state tax rate for Alabama (AL)?
OQ3. What is the combined tax rate for Alabama (AL)?

Lab 9-4 Part 1 Analysis Questions (LO 9-3, 9-4)


AQ1. Using either the raw data in the table (Sheet2) or drilling down in the PivotTable,
determine which counties and/or cities most contribute to the disparity between
Alabama’s state sales tax rate and the average combined tax rate.

Lab 9-4 Part 2 Merge the Avalara Tax Data with the
Dillard’s Data
In this part of the lab, you will merge the query from Part 1 with the data from Lab 9-3, and
then compare the estimated amount of taxes paid from Part 1 of this lab with the actual
amount of taxes paid that you calculated in Lab 9-3.
If you did not complete the Microsoft track of Lab 9-3, a dataset has been provided for
you to use. In Lab 9-3, you used Dillard’s transaction data, in particular the sum of SALES
TAX and SALES TAX ADJUSTMENT amounts accounted for in the database for each
state in which Dillard’s operated stores in 2015.

Microsoft | Excel

1. From the same Excel workbook you created in Part 1 (Lab 9-4 Avalara Com-
bined Data.xlsx), click the Data tab on the ribbon.
2. Click Get Data > From File > From Workbook:
a. If you are continuing this lab from the work you completed in Lab 9-3, open
your completed Lab 9-3 workbook (Lab 9-3 Dillard’s Sales Tax Paid.xlsx).

b. If you did not complete Lab 9-3, open Lab 9-4 Dillard’s Sales Tax Paid
Abbreviated.xlsx.
3. In the Navigator window, select 9-3, then select Edit or Transform. Within
the Power Query Editor, you will make a variety of changes to the data
to switch the view to the original Query (Lab 9-4 Avalara ZipCode Tax
Data), merge the Lab 9-3 query with the current query (using a RIGHT
join instead of a LEFT join), expand the newly merged query to show
only the necessary fields, and create a calculated field to estimate the tax
owed based on the Avalara Estimated Combined Tax Rate and the actual
amount of sales Dillard’s had in 2015 in each zip code in which it has a physical
location.
a. Switch the view to the query “Lab 9-4 Avalara ZipCode Tax Data:” Expand
the menu on the right labeled Queries, and select Lab 9-4 Avalara Tax Rates.
b. Merge the two Queries: From the Home tab in the ribbon, select Merge
Queries.
1. Lab 9-4 Avalara ZipCode: select the ZipCode field.
2. Dropdown: select 9-3, then select the ZIP_CODE field.
3. If you are using the worksheet that you created in Lab 9-3:
a. In the Privacy levels window, select the box next to Ignore Privacy
Levels checks. . . and click Save.
4. Adjust the Join Kind to: Right Outer (all from second, matching from
first), and click OK.
c. Expand the newly merged query: Click the Expand button on the
new 9-3 column (looks like two arrows), and select only TAX_PAID,
AMOUNT, and STATE SALES TAX OWED.
d. Create a calculated field: From the Add Column tab in the ribbon, click
Custom Column. Input the following and click OK.
1. New Column Name: Estimated_Tax_Owed_by_Zip
2. Custom Column formula: =[EstimatedCombinedRate] * [Amount]
3. Click OK.
4. From the Home tab on the ribbon, click Close & Load.
5. Return to the tab of your workbook that you named PivotTable. Right-click any
of the data in the PivotTable and select Refresh. You should see the new fields
that you added in the Power Query Editor in your PivotTable field list now.
6. Add the following three fields to the values (and ensure that the default aggre-
gate is SUM):
• Estimated State Sales Tax Owed
• Tax Paid
• Estimated Tax Owed by Zip
7. Create a Sparkline to view how the values differ across the estimate made
from the Lab 9-2 data (an estimate of only state sales tax owed), the 9-3 data (the
amount Dillard’s actually paid), and the new estimate (an estimate of sales
tax owed taking into account city, county, and state tax rates).
a. Place your cursor in the cell to the right of the first row (this is probably
cell G4). From the Insert tab in the ribbon, select Sparklines > Line.

1. Data Range: D4:F4
2. Location Range: $G$4
3. Click OK.
4. Drag the Sparkline all the way down the PivotTable dataset to view
sparklines for each state.
8. Take a screenshot (label it 9-4MB) of the PivotTable and the sparklines.  
9. When you are finished answering the lab questions, you may close Excel.
Save your file as Lab 9-4 Avalara Dillards Combined.xlsx.
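Conceptually, the merge and custom column you built in step 3 amount to a right outer
join plus one computed field. As a hedged sketch only (the work happens in Power Query,
not SQL Server, and both table names below are assumptions for illustration):

    SELECT d.STATE,
           d.TAX_PAID,
           d.AMOUNT,
           d.[STATE SALES TAX OWED],
           a.EstimatedCombinedRate * d.AMOUNT AS Estimated_Tax_Owed_by_Zip
    FROM AvalaraTaxRates a                -- hypothetical: the Part 1 rate query
    RIGHT JOIN DillardsTaxPaid d          -- hypothetical: the Lab 9-3 results
        ON a.ZipCode = d.ZIP_CODE;

The right outer join mirrors the lab’s “Right Outer (all from second, matching from
first)” setting: every Dillard’s row is kept even when no zip-level rate matches it.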

Tableau | Desktop

1. From the same Tableau workbook from Part 1 (Lab 9-4 Avalara Combined
Data.twb), click the Data Source tab > Add (next to Connections) > Microsoft
Excel and browse to Lab 9-4 Dillard’s Sales Tax Paid Abbreviated.xlsx, and
click OK.
2. Double-click the sheet labeled 9-3 to relate it to the Part 1 data, and edit the
relationship so the related fields are ZIP_CODE.
a. If the relationship does not form because of a Type Mismatch, this
means that the data type for Zip Code is string (text) in one of the data-
sets and number in another. Simply select one of the Zip Code attributes
and adjust its data type so it matches the other Zip Code data type.
3. Navigate to a new sheet and rename it Lab 9-4 Part 2.
a. Rows: State (either State field will work)
b. Marks (Text): State Sales Tax Owed
c. Double-click the TAX_PAID and Estimated Tax Owed by Zip field names
to add them to the Measure Values.
4. Create a visualization of the comparison by state:
a. Duplicate Lab 9-4 Part 2 and name it Lab 9-4 Part 2 - viz.
b. Drag the Measure Values pill from the Marks shelf to the Rows shelf.
c. Adjust the Marks by clicking the Dropdown (labeled (Automatic)) and
select Line.
d. To make it easier to compare each state’s values, adjust the axis. Right-
click any of the axes in the visualizations and select Edit Axis. . .
e. Change the Range to Independent axis ranges for each row or column and
close the window.
5. Create a dashboard to view the raw numbers and the sparklines on the same
sheet.
6. Take a screenshot (label it 9-4TB) of the All tables sheet.  
7. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 9-4 Avalara Dillards Combined.twb.

Lab 9-4 Part 2 Objective Questions (LO 9-3, 9-4)
OQ1. Did Dillard’s pay more or less than the estimated combined amount in Missis-
sippi (MS)?
OQ2. Did Dillard’s pay more than the estimated combined amount in any states?

Lab 9-4 Part 2 Analysis Questions (LO 9-3, 9-4)


AQ1. Based on your analysis comparing what Dillard’s should pay under the
estimated combined rate versus the actual amount Dillard’s paid, draw two
different insights regarding how Dillard’s should calculate the amount of sales
tax it owes the cities, counties, and states that it operates in.
AQ2. Recognizing that tax software that provides tax rates based on precise addresses
is not inexpensive, and that there is a cost benefit for some companies to rely on
these free rates updated monthly by Avalara, what factors would you take into
consideration for a company making a decision right now on whether to invest
in paid tax software or to use the freely provided rate tables?

Lab 9-4 Submit Your Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
lab document to Connect or to the location indicated by your instructor.

Lab 9-5 Comprehensive Case: Online Sales Taxes Analysis—Dillard’s and Avalara
Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable,
can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: In this lab, you will gather transaction data for only the online sales that
Dillard’s processed in 2015, as well as the location of each customer, to determine what the
estimated sales tax charges would be. You will also use the Risk Level attribute from
the Avalara dataset to help determine whether you would recommend that Dillard’s rely on
the free resources, or that it pay for software that determines sales
tax owed based on more precise measures.
Avalara ZipCode Tax Data: For Lab 9-5, the files for every state, D.C., and Puerto Rico
have been included because of the large number of localities that Dillard’s ships online
sales to.
Data: Dillard’s Sales Data and Lab 9-5 Avalara Tax Rates.zip - 289KB Zip / 2.33MB
CSV - Dillard’s sales data are available only on the University of Arkansas Remote Desktop
(waltonlab.uark.edu). See your instructor for login credentials.

Lab 9-5 Example Output


By the end of this lab, you will create visualizations similar to what you see below. While
we blurred the pertinent values in these screenshots, your work should look similar to this:

Microsoft | Excel

Microsoft Excel

LAB 9-5M Example of a PivotTable with Conditional Formatting in Microsoft Excel

Tableau | Desktop

Tableau Software, Inc. All rights reserved.

LAB 9-5T Example of a Text Table with Color Formatting in Tableau Desktop

Lab 9-5 Combine the Avalara Tax Rate Data and Merge
with Dillard’s Data
As was mentioned in the chapter, the South Dakota v. Wayfair Supreme Court decision in
2018 resulted in new complexities for online sales by allowing states to require businesses
that participate in online sales in their state to collect and remit sales tax. The Dillard’s data

that we have in this database only go through 2016, and thus include only transactions that
occurred prior to the Wayfair decision, but we can use these past transactions to simulate
the decisions Dillard’s (and other companies) needed to make in a post-Wayfair world. If
you completed Lab 9-4, you learned a bit about the decision a company might make regard-
ing using free resources for determining sales tax rates versus more precise paid resources.
A few notes on Risk Level:
• Avalara created the Risk Level attribute to help companies understand how much risk
they are taking on by using zip codes to calculate rates instead of specific addresses
(remember that some zip codes cross city, county, and even state lines!).
• When a tax rate has a risk value of 1, this means the entire zip code is associated with a
single city and county. This indicates a high likelihood that the single combined tax rate
applies across the whole area. Any zip codes with a higher risk value, however, are asso-
ciated with multiple counties or cities—and the higher the number, the more boundary
lines the zip code crosses. The purpose is to gauge how much risk a company is taking
on: either exposing its business to significant risk at audit time or aggravating
customers by overcharging them on items by as much as 5 or 6 percent.
• The codes AA, AE, and AP are not associated with a particular state; rather, they
are military designations, so they are either international or cross state
lines (in particular, AA crosses state lines because it covers military addresses in the
Americas excluding Canada).
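To get a feel for how much of the rate table carries boundary risk before you merge
anything, you could profile the RiskLevel field. A minimal sketch, assuming the combined
Avalara files were loaded into a table we will call AvalaraTaxRates (an illustrative name only):

    SELECT RiskLevel,
           COUNT(*) AS ZipCodeCount
    FROM AvalaraTaxRates   -- hypothetical combined rate table
    GROUP BY RiskLevel
    ORDER BY RiskLevel;

Zip codes at risk level 1 sit entirely within one city and county; every level above 1
crosses at least one boundary line.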
Before you begin the lab, you should create a new blank Word document where you
will record your screenshots and responses and save it as Lab 9-5 [Your name] [Your email
address].docx.

Microsoft | Excel

1. From Microsoft Excel, click the Data tab on the ribbon.


2. Click Get Data > From File > From Folder. Browse to the location you saved
Lab 9-5 Avalara ZipCode Tax Data folder.
3. Click OK, then Edit or Transform. In the Power Query Editor, you will make
a variety of changes to the data to prepare and combine your data. First
you will remove the Avalara Disclaimer from the files to combine, then you
will combine the files based on the first state file in the folder and adjust
the data type for ZipCode (this will make it easier to merge these files with
the Dillard’s data). Finally, you will add the Dillard’s data to the Power
Query model, merge the queries, and create a calculated field to estimate the
amount of tax owed for each zip code of the customers who participated in
online sales.
a. Remove the Avalara Disclaimer: From the Home tab in the ribbon,
select Remove Top Rows.
1. Number of rows: 1 and click OK.
b. Combine the files: From the Home tab in the ribbon, select Combine
Files. Leave all of the defaults and click OK.
c. Adjust the ZipCode data type: Click the 123 icon next to the ZipCode
header and select text. If prompted to Change Column Type, select
Replace current.

d. Add the Dillard’s data to the Power Query model: From the Home tab in
the Power Query ribbon, select New Source > Database > SQL Server.
1. Server: essql1.walton.uark.edu
2. Database: WCOB_Dillards
3. Expand Advanced Options and input the following query:
SELECT STATE, ZIP_CODE, SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
INNER JOIN CUSTOMER
ON CUSTOMER.CUST_ID = TRANSACT.CUST_ID
WHERE TRANSACT.STORE = 698 AND YEAR(TRAN_DATE) = 2015
GROUP BY STATE, ZIP_CODE
Unlike in previous Chapter 9 comprehensive labs where you cre-
ated a query to filter out the online sales, this query focuses on only
online sales and the customer’s location instead of the physical
store’s location.
e. 1. Merge the two Queries: From the Home tab in the ribbon, select
Merge Queries.
a. Query1: select the ZIP_CODE field.
b. Dropdown: select Lab 9-5 Avalara Tax Rates, then select the
ZipCode field.
i. In the Privacy levels window, select the box next to Ignore
Privacy Levels checks. . . and click Save.
c. Leave the join kind as Left Outer (the default) and click OK.
2. Expand the newly merged query: Click the Expand button on the
new Lab 9-5 Avalara Tax Rates column (looks like two arrows), and
select only EstimatedCombinedRate and RiskLevel.
3. Create a calculated field: From the Add Column tab in the ribbon,
click Custom Column. Input the following and click OK.
a. New Column Name: Estimated_Online_Tax_Owed_by_Zip
b. Custom Column formula: =[EstimatedCombinedRate] * [Amount]
c. Click OK.
f. From the Home tab on the ribbon, click Close & Load.
g. Create a PivotTable (Insert tab on the Ribbon > PivotTable and click OK).
1. Rows: State
2. Values: Estimated_Online_Tax_Owed_By_Zip (SUM) and RiskLevel
(AVERAGE)
3. Add Conditional Formatting and/or sort your PivotTable based
on Risk Level to assess which states pose a higher level of risk for
Dillard’s if it calculates online sales tax based only on zip code (and
not the specific shipping address).
4. Take a screenshot of your PivotTable with sorting and/or Conditional
Formatting (label it 9-5MA).
5. Save your file as Lab 9-5 Avalara and Dillard’s Online Data.xlsx.

Tableau | Desktop

1. Open Tableau Desktop. From Tableau’s Data Source page, you will take
a few steps to import and prepare your data: importing the Avalara data,
changing the data type for the Zip Code (this will help relate the Avalara
data to the Dillard’s data in the next step), and importing the Dillard’s data
and relating the two datasets.
a. Import Avalara Data: Click To a File > To a Text file > Browse to the
location you saved Lab 9-5 Avalara Tax Rates folder and connect to the
first file in the folder.
1. Right-click the rectangle in the Connection pane for the Lab 9-5
Avalara ZipCode Tax Data. . . file and select Convert to Union. . .
2. Select Wildcard (automatic) and click OK. This action will combine
all of the files in the Tax Data folder into one.
b. Adjust the Zip Code data type: Click the Globe icon next to the Zip
Code header and select string.
c. Import Dillard’s data: Click Add (next to Connections) > To a Server >
Microsoft SQL Server.
1. Enter the following:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. All other fields can be left as is; click Sign In.
d. Instead of connecting to a table, you will create a New Custom SQL
query. Double-click New Custom SQL and input the following query:
SELECT STATE, ZIP_CODE, SUM(TRAN_AMT) AS AMOUNT
FROM TRANSACT
INNER JOIN CUSTOMER
ON CUSTOMER.CUST_ID = TRANSACT.CUST_ID
WHERE TRANSACT.STORE = 698 AND YEAR(TRAN_DATE) = 2015
GROUP BY STATE, ZIP_CODE
i. Unlike in previous Chapter 9 comprehensive labs where you
created a query to filter out the online sales, this query focuses
on only online sales and the customer’s location instead of the
physical store’s location.
e. In the Edit Relationship window, adjust the related fields to Zip
Code in each query, then close the window.
2. Navigate to Sheet 1:
a. Create a Calculated Field: From the Analysis tab, select Create
Calculated Field. . .
1. Name: Estimated Online Tax Owed by Zip
2. Calculation: [Estimated Combined Rate] * [Amount]
3. Click OK.

b. Rows: State
c. Measure Values (double-click these to ensure they appear in Measure
Values and not elsewhere): Estimated Online Tax Owed by Zip (SUM)
and Risk Level (AVERAGE).
d. Add either Color Marks formatting or sort your table based on Risk Level to
assess which states pose a higher level of risk for Dillard’s if it calculates
online sales tax based only on zip code (and not the specific shipping address).
3. Take a screenshot (label it 9-5TA).
4. Save your Tableau workbook as Lab 9-5 Avalara and Dillard’s Online Data.twb.

Lab 9-5 Objective Questions (LO 9-3, 9-4)


OQ1. Select the states that have an average risk level for Dillard’s above 2.
OQ2. Which “state” (just use the abbreviation) has the highest average risk level?
OQ3. What is the average risk level for Arkansas (AR)?

Lab 9-5 Analysis Questions (LO 9-3, 9-4)


AQ1. What should companies consider regarding sales tax for online transactions ver-
sus in-person transactions?
AQ2. Based on the risk levels you calculated for the zip codes that Dillard’s sells to,
what would your recommendation be to Dillard’s regarding purchasing tax soft-
ware versus using freely available rate tables?
AQ3. Recognizing that the Wayfair decision continues to impact state tax laws regard-
ing online sales, what factors would you take into account for a company that is
beginning to embrace online sales and is deciding whether to invest in tax soft-
ware versus using freely available rate tables?

Lab 9-5 Submit Your Lab Document


Verify that you have answered any questions your instructor has assigned, then upload your
lab document to Connect or to the location indicated by your instructor.

Chapter 10
Project Chapter (Basic)

A Look at This Chapter


This chapter will take you through a series of problems to help you analyze and communicate answers to accounting
questions that are asked every day. This will provide a review of the Data Analytics concepts we’ve discussed in the
previous chapters and put them into perspective. For each analysis, we will look at the data from a managerial, audit-
ing, and financial accounting perspective.

A Look Back
Chapter 9 discussed the application of Data Analytics to tax questions and looked at tax data sources and how they
may differ depending on the tax user (a tax department, an accounting firm, or a regulatory body) and tax needs. We
also investigated how visualizations are useful components of tax analytics. Finally, we considered how data analysis
might be used to assist in tax planning.

A Look Forward
Chapter 11 will revisit the Dillard’s sales and returns data, providing an advanced overview of different analytical
tools and techniques that offer additional understanding of the data.

Tools like Tableau and Power BI are popular because they enable quick analysis of simple
descriptive and diagnostic analytics. By creating visual answers to data problems,
accountants can tell stories that help inform management decisions, aid auditors, and
provide insight into financial data. Both Tableau and Power BI enable more simplified
analysis by incorporating natural language processing into their cloud-based offerings.
Instead of dragging dimensions and measures to build the analyses, you can simply ask a
question in a natural sentence, and the tool will map your question to your existing data
model.

Microsoft PowerBI

OBJECTIVES
After reading this chapter, you should be able to:

LO 10-1 Analyze data in the order-to-cash process.


LO 10-2 Analyze data in the procure-to-pay process.


EVALUATING BUSINESS PROCESSES


As a manager, auditor, or financial accountant, your role is to understand how different
business processes operate and ensure controls exist over those processes. Companies use
Data Analytics to summarize data for reports, evaluate performance, and identify risk in
these cycles.
In this basic project chapter, we will work through a series of questions that help you
understand how data from different aspects of each business process can answer a variety of
questions depending on the user’s perspective.
There are two main question sets that explore the following:
1. Question Set 1 looks at the order-to-cash process, or sales/revenue cycle, within a com-
pany; you will summarize flows of sales order transactions, accounts receivable, and
customer activity.
2. Question Set 2 moves into the procure-to-pay process, or purchasing cycle, to evaluate
purchasing activity and potential savings when paying or interacting with vendors.
Important note: Each set walks you through the analysis you are supposed to create with
more generic instructions than you have seen in previous chapter labs. The goal is to evalu-
ate your understanding of the tools and your ability to create final reports. The data have
already been cleaned and formatted. As you complete the steps, you will encounter several
questions that ask you about your approach to the analysis, how to interpret the results, and
how you would expand the analysis using a comprehensive dataset.

QUESTION SET 1: ORDER-TO-CASH


LO 10-1 Analyze data in the order-to-cash process.

The order-to-cash (O2C) process, or sales cycle, involves three main processes:
1. Sales order processing
2. Order fulfillment and shipping
3. Billing and cash collections
Financial accountants are interested in determining the amount of sales revenue on the
income statement and accounts receivable balance on the balance sheet as well as the cal-
culation of bad debts expense.
Managers are concerned with making the process as efficient as possible to ensure
increased sales volume, sufficient profitability, and fast cash collection.
Auditors should test sales transaction and master data to ensure that only authorized
users are processing orders, that sales prices match master data and aren’t altered, and that
customers aren’t exceeding approval limits. In addition, auditors use accounts receivable
aging schedules to evaluate outstanding balances.
The O2C tables follow the UML diagram shown in Exhibit 10-1.

QS1 Part 1 Financial: What Is the Total Revenue and Balance in Accounts Receivable?
Begin by calculating account balances used to prepare financial statements. The order-to-cash
process provides data to evaluate the revenue, which the company bases on the invoice amount,
and accounts receivable balances, which the company calculates based on the difference
between the invoice and cash collections and write-offs. The revenue and accounts receivable
balances are essential for inclusion in the income statement and balance sheet, respectively.


EXHIBIT 10-1 Order-to-Cash Data


Data: 10-1 O2C Data.zip - 1.9MB Zip / 2.0MB Excel

Microsoft or Tableau

Using the skills you have gained throughout this text, use Microsoft Power BI or
Tableau Desktop to complete the generic tasks presented below:
Build a new dashboard (Tableau) or page (Power BI) called Financial that includes
the following:
1. Create a new workbook, connect to 10-1 O2C Data.xlsx, and import all
seven tables. Double-check the data model to ensure relationships are cor-
rectly defined as shown in Exhibit 10-1.
2. Add a table to your worksheet or page called Sales and Receivables that
shows the invoice month in each row and the invoice amount, receipt
amount, adjustment amount, AR balance, and write-off percentage in the
columns. Tableau Hint: Use Measure Names in the columns and Measure
Values in the marks to create your table. Then once your table is complete,
use Analytics > Summarize > Totals to calculate column totals.


a. You will need to create a new measure or calculated field showing the
account AR Balance, or the total invoice amount minus the total receipt
amount minus the total adjustment amount. Tableau Hint: To minimize
erroneous values from appearing in Tableau due to blank or missing
values, use the IFNULL() function to replace blank values with 0, for
example, IFNULL([Receipt Amount],0).
b. You will need to calculate the write-off percentage as the total AR adjustment
divided by total invoice amount. Hint: Format the write-off percentage as
a percent or to four decimals.
c. Filter this visual to show only values from January 2020 to December 2020.
3. Add a new bar chart called Bad Debts that shows the invoice amount and
adjustment amount along with a tooltip for write-off percentage. Tableau
Hint: Choose Dual Axis and Synchronize Axis to combine the two values.
a. Filter this visual to show only values from January 2020 to December 2020.
4. Clean up the formatting and titles of your visuals and combine them into a
single dashboard or page labeled Financial.
5. Take a screenshot of your dashboard showing the account balances (label it 10-1A).
6. Save your workbook as 10-1 O2C Analysis, answer the lab questions, then
continue to Part 2.
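The AR Balance and write-off measures in step 2 are plain arithmetic over monthly totals:
the balance equals invoices minus receipts minus adjustments, and the write-off percentage
equals adjustments divided by invoices. The SQL below is an illustrative sketch only, since
the O2C data live in an Excel workbook rather than a database and every table and column
name is assumed; the per-invoice subtotals avoid double-counting invoices that have several
receipts or adjustments:

    WITH Receipts AS (
        SELECT InvoiceID, SUM(ReceiptAmount) AS ReceiptAmount
        FROM CashReceipt GROUP BY InvoiceID
    ), Adjustments AS (
        SELECT InvoiceID, SUM(AdjustmentAmount) AS AdjustmentAmount
        FROM ARAdjustment GROUP BY InvoiceID
    )
    SELECT MONTH(i.InvoiceDate) AS InvoiceMonth,
           SUM(i.InvoiceAmount) AS InvoiceAmount,
           SUM(ISNULL(r.ReceiptAmount, 0)) AS ReceiptAmount,
           SUM(ISNULL(a.AdjustmentAmount, 0)) AS AdjustmentAmount,
           SUM(i.InvoiceAmount) - SUM(ISNULL(r.ReceiptAmount, 0))
               - SUM(ISNULL(a.AdjustmentAmount, 0)) AS ARBalance,
           SUM(ISNULL(a.AdjustmentAmount, 0)) / SUM(i.InvoiceAmount) AS WriteOffPct
    FROM Invoice i
    LEFT JOIN Receipts r ON r.InvoiceID = i.InvoiceID
    LEFT JOIN Adjustments a ON a.InvoiceID = i.InvoiceID
    WHERE i.InvoiceDate >= '2020-01-01' AND i.InvoiceDate < '2021-01-01'
    GROUP BY MONTH(i.InvoiceDate)
    ORDER BY InvoiceMonth;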

QS1 Part 1 Objective Questions


OQ1. What value would be recorded for total sales revenue on the income statement
for 2020?
OQ2. Which month recorded the highest sales revenue?
OQ3. What value would be recorded for total accounts receivable on the balance
sheet for 2020, excluding the allowance for doubtful accounts?
OQ4. If management estimates about 5 percent of sales to be written off each month,
which month(s) exceeded that estimate?
OQ5. If management estimates about 3 percent of the accounts receivable balance to
be written off, what would be the adjustment amount for bad debts expense at
the end of the year?
OQ6. Which month’s accounts receivable balance is most likely to be written off?

QS1 Part 1 Analysis Questions


AQ1. Why do you think there are no write-offs in November and December?
AQ2. How could management estimate the write-offs for November and December?
AQ3. How could analytics provide additional insight into financial information
beyond calculating balances?


QS1 Part 2 Managerial: How Efficiently Is the Company Collecting Cash?
For this part, put yourself in a manager’s shoes and create an analysis that will help answer
questions about the order-to-cash process. In this case, management focuses on sales orders,
rather than invoices as in Part 1, to determine the contract amount and evaluates the age of
an open account based on the invoice due date.

Microsoft or Tableau

Using the skills you have gained throughout this text, use Microsoft Power BI or
Tableau Desktop to complete the generic tasks presented below:
Build a dashboard (Tableau) or page (Power BI) called Management with the
following:
1. Add a filter to show only sales orders from November.
2. Add a table to your page called Total Sales by Day that shows the total sales
order amount by sales order date. Power BI Hint: Use the date hierarchy to
drill down to specific days of the month. Tableau Hint: Set the sales order
date to DAY() and place the total sales order amount as a text mark.
3. Add a bar chart to your page called Sales by Customer that shows the total
sales order amount by customer account name in descending order.
4. Add a new matrix table to your page called AR by Customer that shows the
customer and invoices in rows, and earliest invoice due date, age, and balance
as values.
a. Create a parameter showing the Report Date as 12/31/2020. Power
BI Hint: Create a new column and use the DATE() function.
b. Create a new measure showing the Age as the difference between the
Invoice Due Date and the Report Date. Power BI Hint: Use the
DATEDIFF() function to calculate the age and the MIN() function on
the date fields to load specific dates.
c. Use the AR Balance you created in Part 1.
d. Filter the table to show only outstanding balances that are greater than 0.
5. Add a new card to your page called Days Sales Outstanding to show the cur-
rent KPI value. Hint: Create a new measure showing the DSO as the accounts
receivable balance divided by the total sales amount multiplied by 30 days.
6. In Tableau, combine all of these visuals into one dashboard.
7. Take a screenshot of your dashboard (label it 10-1B).
8. Save your workbook, answer the lab questions, then continue to Part 3.
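Two measures above reduce to bare formulas: Age is a date difference against the
report-date parameter, and DSO scales the AR balance by a 30-day month of sales. The SQL
below is a sketch under assumed object names (Invoice and ARByInvoice stand in for your
data model and do not exist in the lab files):

    -- Age of each invoice relative to the 12/31/2020 report date
    SELECT InvoiceID,
           DATEDIFF(day, InvoiceDueDate, '2020-12-31') AS AgeInDays
    FROM Invoice;

    -- Days sales outstanding: AR balance expressed in 30-day months of sales
    SELECT 30.0
           * (SELECT SUM(ARBalance) FROM ARByInvoice)
           / (SELECT SUM(InvoiceAmount) FROM Invoice) AS DaysSalesOutstanding;

Note that a negative AgeInDays simply means the invoice is not yet due, which is one way
to think about AQ2 below.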

QS1 Part 2 Objective Questions


OQ1. What are the total sales orders for November?
OQ2. Which day had the highest sales order amount in November?


OQ3. Which customer did the company sell the most to in November?
OQ4. How much did the company sell to that customer in November?
OQ5. What is the value of the oldest outstanding invoice in November?
OQ6. What is the age of the oldest outstanding invoice?
OQ7. What is the current days sales outstanding KPI value for 2020?

QS1 Part 2 Analysis Questions


AQ1. Why are some dates missing in the sales by date visualization?
AQ2. Some of the accounts may have a negative age. Is this an error? What might
explain this?
AQ3. What does the days sales outstanding KPI tell managers in general terms?
AQ4. What risks are present if you take too long to collect accounts receivable?
AQ5. What are some analyses you could perform that would provide insight into
how efficiently your company is collecting cash from customers? Are there any
KPIs that would be appropriate here?
AQ6. In your opinion, what would be an appropriate benchmark for the average
number of days sales outstanding (i.e., Accounts receivable/Sales × 365)?
Would management want this number higher or lower?

QS1 Part 3 Audit: Is the Delivery Process Following the Expected Procedure?
As an auditor, you’re interested in determining whether the delivery process follows the
expected sequence. Specifically, does the delivery follow the sales order, and has each deliv-
ery been matched with an invoice? For this part, put yourself in an auditor’s shoes and cre-
ate an analysis that will help answer questions about the order-to-cash process.

Microsoft or Tableau

Using the skills you have gained throughout this text, use Microsoft Power BI or
Tableau Desktop to complete the generic tasks presented below:
Build a new dashboard (in Tableau) or page (in Power BI) called Audit that
includes the following:
1. Add a table to your page called Exceptions to identify any shipments that
occurred before the order was placed. It should show the Sales Order ID and
the number of days to ship in ascending order.
a. Create a new measure called Order To Ship Days that calculates the
difference between the sales order date and the shipment date. Power BI
Hint: Use the DATEDIFF() function to calculate the difference and the
MIN() function on the date fields to load specific dates.
b. Filter this visual on order to ship days to show only negative values.


2. Add a new matrix table called Missing Invoice to determine whether any
orders have shipped but have not yet been invoiced. It should list the sales
orders, earliest (minimum) shipment date, minimum shipment ID, and mini-
mum invoice ID.
a. Filter this visual on invoice ID to show only missing (blank) values.
3. You should find at least one exception here. If you don’t see any exceptions,
try selecting different months in the sales order date month filter.
4. Take a screenshot of your dashboard showing exceptions and missing
invoices (label it 10-1C).
5. Save your workbook, answer the lab questions, then continue to Part 4.
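Both audit tests are classic join patterns: the exception test looks for a negative
order-to-ship interval, and the missing-invoice test is an outer join that keeps only
unmatched shipments. A hedged SQL sketch, with all table and column names assumed for
illustration:

    -- Shipments dated before their sales order (negative days to ship)
    SELECT so.SalesOrderID,
           DATEDIFF(day, so.SalesOrderDate, sh.ShipmentDate) AS OrderToShipDays
    FROM SalesOrder so
    JOIN Shipment sh ON sh.SalesOrderID = so.SalesOrderID
    WHERE DATEDIFF(day, so.SalesOrderDate, sh.ShipmentDate) < 0
    ORDER BY OrderToShipDays;

    -- Shipped orders with no matching invoice
    SELECT so.SalesOrderID, MIN(sh.ShipmentDate) AS EarliestShipmentDate
    FROM SalesOrder so
    JOIN Shipment sh ON sh.SalesOrderID = so.SalesOrderID
    LEFT JOIN Invoice i ON i.SalesOrderID = so.SalesOrderID
    WHERE i.InvoiceID IS NULL
    GROUP BY so.SalesOrderID;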

QS1 Part 3 Objective Questions


OQ1. Which orders were created after shipment?
OQ2. How many orders have been shipped but not yet invoiced?

QS1 Part 3 Analysis Questions


AQ1. How do you know some orders were created after shipment?
AQ2. Of the shipped orders without an invoice, which one is the most
problematic? Why?
AQ3. Why aren’t the other shipped orders without an invoice suspicious?
AQ4. While you still have your auditor hat on, what are some additional analyses
you could perform to understand whether the process or processes are being
followed or controls are functioning properly?
AQ5. Under what circumstances might a delivery take place before a sales order?
Should this happen?
AQ6. What types of controls would prevent the system from skipping a process
or step?

QS1 Part 4 What Else Can You Determine about the O2C Process?
We’ve discussed a few different ways to analyze O2C data to understand the processes and
controls. Now it’s your chance to find answers to your own questions.
Identify five questions that you think management, auditors, or financial analysts would
want to know about the O2C process. If you need help, search for some common questions
asked by accountants on the Internet.
Using the data you have already loaded into Power BI or Tableau, generate at least three
visualizations that will help you find the answers to your five questions. Load them into a
report or dashboard, and take a screenshot of your dashboard analyses (label it 10-1D).


QS1 Part 4 Analysis Questions


AQ1. Write your first question and provide an answer based on your analysis.
AQ2. Write your second question and provide an answer based on your analysis.
AQ3. Write your third question and provide an answer based on your analysis.
AQ4. Write your fourth question and provide an answer based on your analysis.
AQ5. Write your fifth question and provide an answer based on your analysis.

QS1 Submit Your Screenshot Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot document to Connect or to the location indicated by your instructor.

LO 10-2 Analyze data in the procure-to-pay process.

QUESTION SET 2: PROCURE-TO-PAY

The procure-to-pay process (P2P), or purchasing cycle, for a retailer involves four main
processes:
1. Create and submit a purchase order.
2. Receive inventory.
3. Receive an invoice.
4. Pay the invoice.
The procure-to-pay process has some additional challenges in that there are numerous
opportunities to divert company funds. Therefore you should focus on the risk of unauthor-
ized and fictitious payments (e.g., to shell vendors) and ensure that the process is appropri-
ately controlled.
Managers would want to ensure that inventory matches what they (or their company)
ordered, that invoices aren’t paid more than once, and that the payments are sent to
approved parties.
Auditors are interested in testing the internal controls that govern who can create orders,
receive items, and approve payments. In addition to segregation of duties, they may be inter-
ested in matching each of the documents (i.e., purchase order, receiving report, and vendor
invoice) in a three-way match.
To answer the following questions, use the Tableau workbook and Excel data files found
in Connect. The P2P data, shown in Exhibit 10-2, combine the following tables
into one file.
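The three-way match described above can itself be phrased as a join test: before payment,
a purchase order should have both a matching receiving report and a matching vendor
invoice. As a minimal sketch with assumed table names (these objects are illustrative and
do not exist in the lab files):

    SELECT po.PurchaseOrderID
    FROM PurchaseOrder po
    LEFT JOIN ReceivingReport rr ON rr.PurchaseOrderID = po.PurchaseOrderID
    LEFT JOIN VendorInvoice vi ON vi.PurchaseOrderID = po.PurchaseOrderID
    WHERE rr.ReceivingReportID IS NULL
       OR vi.InvoiceID IS NULL;

Any row returned fails the match and deserves follow-up before cash goes out the door.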

QS2 Part 1 Financial: Is the Company Missing Out on Discounts by Paying Late?
Begin by calculating account balances used to prepare financial statements. The procure-to-
pay process provides data to evaluate the cash paid to suppliers and accounts payable bal-
ances, which the company calculates based on the difference between the invoice received
and cash payments and adjustments. The cash paid and accounts payable balances are essen-
tial for inclusion in the statement of cash flows and balance sheet, respectively.

EXHIBIT 10-2 Procure-to-Pay Data
Data: 10-2 P2P Data.zip - 551KB Zip / 594KB Excel

Microsoft or Tableau

Build a new dashboard (Tableau) or page (Power BI) called Financial that includes
the following:
1. Create a new workbook, connect to 10-2 P2P Data.xlsx, and import all eight
tables. Double-check the data model to ensure relationships are correctly defined.
2. Add a table to your page called Purchases that shows the invoice quarter in
each row and the invoice amount, payment amount, adjustment amount, bal-
ance, and write-off percentage in the columns. Tableau Hint: Use Measure
Names in the columns and Measure Values in the marks to create your table.
Then once your table is complete, use Analytics > Summarize > Totals to calcu-
late column totals.
a. You will need to create a new measure showing the account AP Balance,
or the total invoice amount minus the total payment amount

minus adjustments. Tableau Hint: To minimize errors in Tableau, use
the IFNULL() function to replace blank values with 0, for example,
IFNULL([Receipt Amount],0).
b. Filter this visual to show only values from Q1 to Q4.
3. Now create a clustered bar chart called Forfeited Discounts to analyze discounts
by supplier account name, sorted by discounts available. Stack discounts
available with discounts taken, then add some tooltips to show the discounts
forfeited and forfeited ratio. You will need the following measures/calculated
fields:
a. Discounts Available is the total Invoice Amount multiplied by Terms
Discount Percentage.
b. Discounts Taken is the total of Invoice Amount minus the total of Payment
Amount. Tableau Hint: Use IIF(ISNULL([Payment Amount]),[Invoice
Amount], [Payment Amount]) to substitute the payment amount for null
values of unpaid invoices.
c. Discounts Forfeited is the difference between Discounts Available and
Discounts Taken.
d. Forfeited Ratio is Discounts Forfeited divided by Discounts Available.
4. Clean up the formatting and titles of your visuals and combine them into a
single dashboard or page.
5. Take a screenshot of your dashboard showing the account balances (label it 10-2A).
6. Save your workbook as 10-2 P2P Analysis, answer the lab questions, then
continue to Part 2.
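The four discount measures in step 3 chain together, so it can help to see them computed
in one place. The sketch below mirrors the Tableau hint, substituting the invoice amount
for a null payment so an unpaid invoice contributes zero discounts taken; all object names
are assumptions for illustration only:

    SELECT SupplierName,
           DiscountsAvailable,
           DiscountsTaken,
           DiscountsAvailable - DiscountsTaken AS DiscountsForfeited,
           (DiscountsAvailable - DiscountsTaken)
               / NULLIF(DiscountsAvailable, 0) AS ForfeitedRatio
    FROM (
        SELECT s.SupplierName,
               SUM(vi.InvoiceAmount * vi.TermsDiscountPct) AS DiscountsAvailable,
               SUM(vi.InvoiceAmount
                   - ISNULL(p.PaymentAmount, vi.InvoiceAmount)) AS DiscountsTaken
        FROM VendorInvoice vi
        JOIN Supplier s ON s.SupplierID = vi.SupplierID
        LEFT JOIN Payment p ON p.InvoiceID = vi.InvoiceID
        GROUP BY s.SupplierName
    ) AS t
    ORDER BY DiscountsAvailable DESC;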

QS2 Part 1 Objective Questions


OQ1. What are the total purchases for Q3?
OQ2. Which quarter had the highest total purchases?
OQ3. What is the total amount of cash paid to suppliers for 2020?
OQ4. What is the balance for accounts payable that would appear on the balance
sheet?
OQ5. How much discount (in dollars) has the company forfeited from the top three
suppliers?

QS2 Part 1 Analysis Questions


AQ1. Why do you think there are still balances from the beginning of the year?
AQ2. What would you expect to happen with outstanding balances in Q1 and Q2?
AQ3. Is it reasonable to have outstanding balances in Q3? Why or why not?
AQ4. When might a large company prefer to forfeit discounts on its invoices?
AQ5. Should the company adjust its policy to pay suppliers more quickly, in your
opinion?


QS2 Part 2 Managerial: How Long Is the Company Taking to Pay Invoices?
For this part, put yourself in a manager’s shoes and create an analysis that will help answer
questions about the procure-to-pay process.

Microsoft or Tableau

Build a dashboard (Tableau) or page (Power BI) called Management with the following:
1. Add a filter to show only purchase orders from November.
2. Add a table to your page called Total Purchases by Day that shows the total pur-
chase order amount by purchase order date. Power BI Hint: Use the date hierar-
chy to drill down to specific days of the month. Tableau Hint: Set the purchase
order date to DAY() and place the total purchase order amount as a text mark.
3. Add a bar chart to your page called Purchases by Supplier that shows the total
purchase order amount by supplier account name in descending order by amount.
4. Add a new matrix to your page called AP by Supplier that shows the supplier
and invoices in rows, and earliest invoice due date, age, and balance as values.
a. Create a parameter showing the Report Date as 12/31/2020. Power
BI Hint: Create a new column and use the DATE() function.
b. Create a new measure showing the Age as the difference between the
Invoice Due Date and the Report Date. Power BI Hint: Use the
DATEDIFF() function to calculate the age and the MIN() function on
the date fields to load specific dates.
c. Create a new measure showing the account Balance, or the total invoice
amount minus the total payment amount minus the total adjustment amount.
Tableau Hint: To minimize errors in Tableau, use the IFNULL() function to
replace blank values with 0, for example, IFNULL([Receipt Amount],0).
d. Filter the table to show only outstanding balances that are greater than 0.
5. Add a new card to your page called Days Payables Outstanding to show the
current KPI value. Hint: Create a new measure showing the DPO as the
accounts payable balance divided by the total purchases amount multiplied
by 30 days.
6. In Tableau, combine all of these visuals into one dashboard.
7. Take a screenshot of your dashboard (label it 10-2B).
8. Save your workbook, answer the lab questions, then continue to Part 3.
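As with DSO in Question Set 1, the DPO measure in step 5 is a single ratio: the AP
balance divided by total purchases, scaled to 30 days. A hedged illustration with assumed
objects (APByInvoice and PurchaseOrder stand in for your data model):

    SELECT 30.0
           * (SELECT SUM(InvoiceAmount - ISNULL(PaymentAmount, 0)
                         - ISNULL(AdjustmentAmount, 0)) FROM APByInvoice)
           / (SELECT SUM(PurchaseOrderAmount) FROM PurchaseOrder)
           AS DaysPayablesOutstanding;

A higher DPO means the company holds its cash longer, at the possible cost of forfeited
discounts and supplier goodwill.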

QS2 Part 2 Objective Questions


OQ1. Which supplier did the company purchase the most from?
OQ2. How much did the company purchase from that supplier?
OQ3. What is the invoice amount of the first outstanding invoice for supplier Danbam?
OQ4. What is the age of the first outstanding invoice?


QS2 Part 2 Analysis Questions


AQ1. Some of the accounts have a negative age. What does that mean?
AQ2. What is the current days payable outstanding KPI value? What does it mean?
AQ3. What risks are present if you take too long to pay our accounts payable?
AQ4. What are some analyses you could perform that would provide insight into
how efficiently your company is processing payments to suppliers?
AQ5. Are there any KPIs that would be appropriate here?
AQ6. In your opinion, what would be an appropriate benchmark for the average
number of discount dollars lost as a percentage of available discount dollars?
AQ7. How about erroneous payments as a percentage of total payments?
AQ8. Would management want these numbers to be higher or lower?

QS2 Part 3 Audit: Are There Any Erroneous Payments?


Auditors would be interested in evaluating the origin of invoices and payables to make sure
they are paid correctly and don’t fall outside normal behavior. Before you continue your
analysis, consider the following questions about the payment process.
Let’s look at the data and use a Z-score to determine which suppliers are
receiving an abnormally high dollar amount of purchases.

Microsoft or Tableau

Build a new dashboard (Tableau) or page (Power BI) called Audit with the following:
1. Add a new bar chart called Outliers that shows the Z-score of purchases
by supplier account name. You will need to create the following measures/
calculated fields:
a. Average Purchases shows the average purchase order amount. Tableau
Hint: Use WINDOW_AVG(SUM([Purchase_Order_Amount_Local])) to
aggregate by supplier.
b. Std Dev Purchases shows the standard deviation of purchase order
amount. Tableau Hint: Use the WINDOW_STDEVP() formula. Power BI
Hint: Use Std Dev for the population.
c. Z-Score Purchases is calculated by dividing the total purchase order
amount minus average purchases by the Std Dev Purchases.
2. Now create a new worksheet called Missing Orders to determine if any
invoices have been received that don’t match existing orders.
a. In this case, you want to filter the Purchase Order ID from the Invoice
Received table to show only missing (null) values.
3. Take a screenshot of your dashboard showing exceptions (label it 10-2C).
4. Save your workbook, answer the lab questions, then continue to Part 4.
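The Z-score in step 1 standardizes each supplier’s purchase total against the mean and
standard deviation across all suppliers, which SQL window functions express directly. A
sketch under assumed table names, using STDEVP to match the lab’s population standard
deviation:

    SELECT SupplierName,
           TotalPurchases,
           (TotalPurchases - AVG(TotalPurchases) OVER ())
               / NULLIF(STDEVP(TotalPurchases) OVER (), 0) AS ZScore
    FROM (
        SELECT s.SupplierName,
               SUM(po.PurchaseOrderAmount) AS TotalPurchases
        FROM PurchaseOrder po   -- hypothetical tables for illustration
        JOIN Supplier s ON s.SupplierID = po.SupplierID
        GROUP BY s.SupplierName
    ) AS t
    ORDER BY ZScore DESC;

Suppliers whose Z-score lands above roughly 2 or 3 are the outliers most worth an
auditor’s attention.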


QS2 Part 3 Objective Questions


OQ1. Looking at the outliers visual, which supplier(s) had an abnormally high dollar
amount of purchases?
OQ2. Looking at the missing orders visual, which of the invoices received warrants
further investigation by the auditors?

QS2 Part 3 Analysis Questions


AQ1. What do you consider “abnormally high”? Are these suspicious?
AQ2. What statistical tools can the company use to diagnose behavior that is outside
of normal behavior?
AQ3. How might an outlier be used to focus the auditors on high-risk transactions?
AQ4. Why aren’t the other purchases suspicious?
AQ5. While you still have your auditor hat on, what are some additional analyses you
could perform to understand whether the purchase process is being followed
or controls are functioning properly?

QS2 Part 4 What Else Can You Determine about the P2P Process?
We’ve discussed a few different ways to analyze P2P data to understand the processes and
controls. Now it’s your chance to find answers to your own questions.
Identify five questions that you think management, auditors, or financial analysts would
want to know about the P2P process. If you need help, search for some common questions
asked by accountants on the Internet.
Using the data you have already loaded into Power BI or Tableau, generate at least three
visualizations that will help you find the answers to your five questions. Load them into a
report or dashboard, and take a screenshot of your dashboard analyses (label it 10-2D).

QS2 Part 4 Analysis Questions


AQ1. Write your first question and provide an answer based on your analysis.
AQ2. Write your second question and provide an answer based on your analysis.
AQ3. Write your third question and provide an answer based on your analysis.
AQ4. Write your fourth question and provide an answer based on your analysis.
AQ5. Write your fifth question and provide an answer based on your analysis.

QS2 Submit Your Screenshot Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot document to Connect or to the location indicated by your instructor.

Chapter 11
Project Chapter (Advanced):
Analyzing Dillard’s Data to
Predict Sales Returns

A Look at This Chapter


Similar to the previous chapter, this chapter will take you through a series of problems to help you analyze and com-
municate answers to typical accounting questions related to predicting sales returns. To answer these questions, we
will return to the Dillard’s dataset found on the University of Arkansas Remote Desktop to explore the data as they
relate to understanding and predicting sales returns. After exploring the data, we will work through a series of ques-
tions to describe the state of sales returns, diagnose why the returns are the way they are, and predict future sales
returns based on a selection of explanatory variables. This will provide a review of the Data Analytics concepts we’ve
discussed in the previous chapters and put them into perspective.

A Look Back
Chapter 10 was a project chapter that emphasized basic Data Analytics skills related to the order-to-cash and
procure-to-pay processes.

Americans returned $260 billion in merchandise to retailers
in 2015. While the average rate of return at retailers is just
8 percent, it increases on average to 10 percent during the
holiday season. However, it increases dramatically for
online sales—to 30 percent or higher—with clothing returns
from online sales hitting 40 percent. With a much higher
return rate, as online retailers such as Amazon continue to
increase their market share of total retail sales, it’s only going
to get worse.
What’s more? Not only is product being returned in
greater numbers, but the value of the unwanted and dam-
aged returns is greatly diminished:

Unwanted and damaged goods either get tossed out or sent through a lengthy chain
of liquidators and wholesalers, paying pennies on the dollar to the retailer before
eventually selling them to bargain-hunting consumers.*

Chris Salata/ZUMA Press/Newscom

Because accountants are required to estimate sales returns (and the diminished value of returned items), and off-
set sales in the same period that the original sales are made, accountants need to establish a reasonable and hope-
fully reliable method to estimate such returns. This chapter establishes various descriptive, diagnostic, and predictive
analytics that may be used to help evaluate the estimate of sales returns.

*CNBC LLC.
Source: https://www.cnbc.com/2016/12/16/a-260-billion-ticking-time-bomb-the-costly-business-of-retail-returns.html,
accessed April 2019. https://www.forbes.com/sites/stevendennis/2018/02/14/the-ticking-time-bomb-of-e-commerce-
returns/#46d599754c7f (accessed April 2019).

OBJECTIVES
After completing this chapter, you should be able to:

LO 11-1 Analyze returned sales to find explanatory (independent) variables


using descriptive and exploratory analytics.
LO 11-2 Illustrate hypothesis testing using diagnostic analytics to compare and
contrast sales returns around the holiday season.
LO 11-3 Predict returned sales in future periods using predictive analytics.


ESTIMATING SALES RETURNS


The recent revenue recognition standards1 increased the emphasis on valid, reasonable esti-
mates of sales returns matched to the same time period the original sale was made. Compa-
nies must assess whether their models and methods of estimating returns are appropriate.
In this chapter, we will work through a project to describe (Question Set 1), diagnose
(Question Set 2), and predict sales returns for Dillard’s (Question Set 3). Using these vari-
ous analyses, we develop a potential model useful in predicting Dillard’s sales returns and
test it for reasonableness.
Data: Dillard’s sales data are available only on the University of Arkansas Remote Desk-
top (waltonlab.uark.edu). See your instructor for login credentials.
There are three main question sets:
1. Question Set 1 focuses on exploring the sum of returns and the percentage of returned
sales by state, online versus in-person transactions, and month using Tableau data
visualizations.
2. Question Set 2 continues the analysis with hypothesis tests to see if the percentage of
sales returned is significantly higher during the holiday season (January) than any other
time of the year.
3. Question Set 3 focuses on exploring how historical data can help predict the future percent-
age of returned sales through PivotTables, PivotCharts, and regression testing in Excel.
Each question set has instructions to guide you through mastering the data, performing
the analysis, and communicating your results.

LO 11-1 Analyze returned sales to find explanatory (independent) variables using
descriptive and exploratory analytics.

QUESTION SET 1: DESCRIPTIVE AND EXPLORATORY ANALYSIS

In this question set, you will prepare the data for analysis in Power Query or Tableau Prep,
and then explore the data by calculating descriptive statistics and determining variables
worth evaluating in your diagnostic analysis. In particular, we will analyze percentage of
returned sales over time (i.e., a 3-year period), across states, and the differences in
percentages of returned sales for online purchases versus in-person purchases.

QS1 Part 1 Compare the Percentage of Returned Sales across Months, States,
and Online versus In-Person Transactions
QS1 Part 1 Thought Questions
TQ1. Which month do you expect average percentage of returned sales to be the
highest? The lowest? Why?
TQ2. Do you think the trend over time of average percentage of returned sales for
online transactions is similar to or different from the average percentage of
returned sales for in-person transactions? Why?
TQ3. Do you expect percentage of returned sales to vary much across states? Why?

1
Accounting Standards Codification (ASC) 606, Revenue from Contracts with Customers, as amended,
and created by Accounting Standards Update (ASU) 2014-09, Revenue from Contracts with Customers.


Microsoft | Excel + Power Query

1. After logging in to the Remote Desktop, open Microsoft Excel and click the
Data tab on the ribbon.
2. Click Get Data > From Database > From SQL Server Database.
a. Server: essql1.walton.uark.edu
b. Database: WCOB_Dillards
c. Expand Advanced Options and input the following query:
SELECT MONTH(TRAN_DATE) AS MONTH, TRAN_DATE,
STATE, TRANSACT.STORE, TRAN_TYPE,
SUM(SALE_PRICE) AS AMOUNT
FROM TRANSACT
INNER JOIN STORE
ON TRANSACT.STORE = STORE.STORE
GROUP BY MONTH(TRAN_DATE), TRAN_DATE, STATE,
TRANSACT.STORE, TRAN_TYPE
d. Click OK (if prompted: Connect using current credentials, and click OK).
e. Click Edit or Transform Data.
3. There are several changes you need to make to the data to prepare them for
analysis: pivoting the tran_type column so we have two separate columns for
purchases and returns, creating a calculated field for percentage of returned
sales, and creating a conditional column indicating whether transactions
were performed online or in-person.
a. Pivot the tran_type column:
1. Select the TRAN_TYPE column.
2. From the Transform tab in the ribbon, select Pivot Column.
3. Values Column: Amount
4. Click OK.
b. Create a calculated field for percentage of returned sales:
1. From the Add Column tab in the ribbon, select Conditional Column.
2. New column name: % of Returned Sales
3. Column name: P
4. Operator: equals
5. Value: 0
6. Output: 0
7. Otherwise: = [R]/[P]
8. Click OK.
c. Create a conditional column for online versus in-person transactions:
1. From the Add Column tab in the ribbon, select Conditional Column.
2. New Column Name: Online-dummy
3. Column Name: STORE


4. Operator: equals
5. Value: 698
6. Output: Online
7. Otherwise: In-Person
8. Click OK.
d. From the Home tab in the ribbon, select Close & Load.
e. Once your data load into Excel, name the spreadsheet Ch 11 Query
Data.
f. Once your data load into Excel, add the data to the data model through
Power Pivot and create a date table:
1. Enable the Power Pivot add-in (File > Options > Add-ins > Manage:
COM add-ins > PowerPivot, select GO, then enter a check mark
next to MS PowerPivot for Excel, click OK).
2. From the Power Pivot tab in the ribbon, select Add to Data Model.
3. From the Power Pivot window, select the Design tab > Date Table > New.
4. Once the Date table populates, select PivotTable from the Home tab
and click OK to create a PivotTable in a new worksheet.
5. Name the spreadsheet with the new PivotTable Ch 11 QS1 PivotTable.
4. In Excel, create the following PivotTables or PivotCharts (if you create Pivot-
Tables, include relevant conditional formatting):
a. Average % of Returned Sales by month to indicate the months with the
highest and lowest averages.
• When using fields from the Calendar table and the query table, you
will need to build relationships. Once you add fields from each table,
Excel will prompt you to create relationships. Let Excel do this auto-
matically for you by clicking Auto-Detect... If Excel does not detect
the relationship, you can build it manually. The matching fields are
Date Table.Date and Query1.TRAN_DATE.
b. Average % of Returned Sales for online transactions versus in-person
transactions.
c. Average % of Returned Sales across states.
5. Add three slicers and adjust the report connections for each so that they
interact and slice each of your PivotTables or PivotCharts:
a. Month
b. Online-Dummy
c. State
6. Arrange your Slicers, PivotTables, and/or PivotCharts so that they are all eas-
ily viewable on your Excel sheet.
7. Take a screenshot of your PivotTables, PivotCharts, and Slicers (label it 11-1M).
8. Save your workbook as Chapter11.xlsx, answer the QS questions, then continue
to Question Set 1, Part 2.


Tableau | Prep

1. Open Tableau Desktop, then open Tableau Prep Builder.


2. In Tableau Prep, click Connect to Data.
3. Choose Microsoft SQL Server in the Connect list.
4. Enter the following and click Sign In:
a. Server: essql1.walton.uark.edu
b. Database: WCOB_DILLARDS
5. Double-click Custom SQL to input the following SQL query:
SELECT TRAN_DATE, STATE, TRANSACT.STORE, TRAN_TYPE,
SUM(SALE_PRICE) AS AMOUNT
FROM TRANSACT
INNER JOIN STORE
ON TRANSACT.STORE = STORE.STORE
GROUP BY TRAN_DATE, STATE, TRANSACT.STORE, TRAN_TYPE
6. Click Run.
7. There are several changes you need to make to the data to prepare them for
analysis: pivoting the tran_type column so we have two separate columns for
purchases and returns, creating a calculated field for percentage of returned
sales, and creating a conditional column indicating whether transactions
were performed online or in-person.
8. Pivot the tran_type column:
a. Click the plus button next to the Custom SQL Input Icon and select Pivot.
b. In the Pivoted Fields window, change the Pivot type from "Columns to Rows" to "Rows to Columns" (click the dropdown next to Columns to Rows).
c. Field that will pivot rows to columns: tran_type
d. Field to aggregate for new columns: amount
9. Create a calculated field for percentage of returned sales:
a. Click the plus button next to the Pivot Icon and select Clean Step.
b. Click Create Calculated Field. . .
c. Field Name: % of Returned Sales
d. Calculation: if [P] = 0 then 0 else [R]/[P] end
e. Click Save.
10. Create a conditional column for online versus in-person transactions:
a. Click Create Calculated Field. . .
b. Field Name: Online-dummy
c. Calculation: if [STORE] = 698 then 'online' else 'in-person' end
d. Click Save.
11. To save the file, click the plus button next to the Clean Icon and select Output.
a. Select Browse and save the file as Chapter11.hyper.
b. Click Run Flow.
c. Save the Tableau Prep file as Chapter11.tfl.


12. Open Chapter11.hyper in Tableau Desktop and navigate to Sheet 1.


• If you saved your file to the default location, the path will be Documents
> My Tableau Prep Repository > Datasources.
13. In Tableau, create the following visualizations on separate sheets:
a. Average % of Returned Sales by month to indicate the months with the
highest and lowest averages.
b. Average % of Returned Sales for online transactions versus in-person
transactions.
c. Average % of Returned Sales across states.
d. Arrange each of the visualizations you created into a dashboard and set
each visualization as a filter for the dashboard.
14. Take a screenshot of your dashboard (label it 11-1T).
15. Save your workbook as Chapter11.twb, answer the lab questions, then con-
tinue to Question Set 1 Part 2.

QS1 Part 1 Objective Questions


OQ1. After executing these queries, you can confirm whether the assumptions you made in the thought questions were accurate. Which month has the highest average percentage of returned sales?
OQ2. Which month has the lowest average percentage of returned sales?
OQ3. What is the average percentage of returned sales for in-person transactions?
What is the average percentage of returned sales for online transactions?
Which one has a higher average percentage of returned sales?
OQ4. What is the average percentage of returned sales for Oklahoma (OK)?

QS1 Part 1 Analysis Questions


AQ1. Why did this exercise have you assess average percentage of returned sales and
not sum, which is the default aggregate in both Excel and Tableau?
AQ2. Use the slicer (Excel) or the visualization filter (Tableau) to assess for
seasonality in percentage of returned sales for online transactions versus
in-person transactions. What changes do you observe? Why do you think the
pattern is different?

QS1 Part 2 What Else Can You Determine about the Percentage of Returned Sales through Descriptive Analysis?
We’ve discussed a few different ways to analyze Dillard’s returned sales data. Now it’s your
chance to find answers to your own questions.
Identify five questions that you think Dillard’s management, auditors, or financial analysts
would want to know about returns that could be answered through descriptive analytics. If
you need help, search for some common questions asked by accountants on the Internet.
Using the data you have already loaded into Excel, Power BI, or Tableau (or feel free to
load new data!), generate at least three reports or visualizations that will help you find the


answers to your five questions. Load them into a report or dashboard and take a screen-
shot of your dashboard analyses (label it 11-2MB or 11-2TB).

QS1 Part 2 Analysis Questions


AQ1. Write your first question and provide an answer based on your analysis.
AQ2. Write your second question and provide an answer based on your analysis.
AQ3. Write your third question and provide an answer based on your analysis.
AQ4. Write your fourth question and provide an answer based on your analysis.
AQ5. Write your fifth question and provide an answer based on your analysis.

QS1 Submit Your Screenshot Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot document to Connect or to the location indicated by your instructor.

QUESTION SET 2: DIAGNOSTIC ANALYTICS—HYPOTHESIS TESTING

LO 11-2
Illustrate hypothesis testing using diagnostic analytics to compare and contrast sales returns around the holiday season.

QS2 Part 1 Is the Percentage of Sales Returned Significantly Higher in January after the Holiday Season?

In the previous activity for this dataset, you found that January has the highest percentage of sales returned of all the months. This is likely due to the holiday season. Now that you have that knowledge based on your descriptive analysis, you can drill down and compare January to the rest of the year more specifically. In Excel, you will drill down into the data to determine if the difference between the January percentage of returned sales and the rest of the year is statistically significant. To do so, you will run a hypothesis test to determine whether a significantly higher percentage of sales is returned during the month of January than during the rest of the year.
Tableau does not offer the functionality to run statistical t-tests, but you can still create a set and compare the January percentage of returned sales to the rest of the year.
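Framed in the formal hypothesis-testing structure reviewed in Appendix A, one way to state this test is:
Null hypothesis: H0: The January percentage of returned sales is less than or equal to the percentage of returned sales for the rest of the year.
Alternate hypothesis: HA: The January percentage of returned sales is greater than the percentage of returned sales for the rest of the year.
Because the alternate hypothesis is directional, the one-tailed p-value in the Excel t-test output is the relevant one for this test.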
QS2 Part 1 Thought Questions
TQ1. Why does the statistical significance matter? Would your analysis and
recommendations to Dillard’s be different if the difference is not statistically
significant?
TQ2. If the difference is statistically significant, how would you display the results
visually to Dillard’s?

Microsoft | Excel

1. Access the Power Query Editor to prepare your data for running a hypothesis test:
a. Open your Excel file saved as Chapter11.xlsx (from QS1) and select a
cell of the table in the Ch 11 Query Data sheet to activate the Query tab
in the Excel ribbon.
b. From the Query tab, select Edit to open the Power Query editor. If prompt-
ed to do so, click Edit Permissions and Run the Native Database Query in


the window that pops up. Then repeat a similar process by clicking Edit
Credentials, clicking Connect, allowing Encryption Support, and click-
ing OK. The query data should show up in the Power Query Editor now.
2. You will perform three actions: duplicate the existing % of Returned Sales
column, add a new conditional column to create a holiday dummy variable,
and pivot your new conditional column on % of Returned Sales.
a. Duplicate the existing % of Returned Sales column:
1. Select the % of Returned Sales column.
2. From the Add Column tab in the ribbon, click Duplicate column.
b. Add a new conditional column for the holiday dummy variable:
1. From the Add Column tab in the ribbon, select Conditional Column.
a. New column name: Holiday-Dummy
b. Column Name: Month
c. Operator: Equals
d. Value: 1
e. Output: Holiday
f. Otherwise: Non-Holiday
g. Click OK.
2. Pivot the Holiday-Dummy column:
a. Select the Holiday-Dummy column.
b. From the Transform tab in the ribbon, click Pivot Column.
c. Values Column: % of Returned Sales - Copy and click OK.
3. Return to the Home tab in the ribbon and click Close & Load.
4. Once your data have loaded, you will run a t-test to see if the percentage of
sales returned in January is significantly higher than the rest of the year.
a. From the Data tab in the ribbon, click Data Analysis.
• If the Data Analysis Toolpak hasn’t been added in, see Appendix
C for directions on how to add it. Click Data Analysis to open the
Analysis Tools window.
b. Select t-Test: Two Sample Assuming Unequal Variances, click OK, then
enter the following:
1. Variable 1 range: all Holiday values (including the label)
2. Variable 2 range: all Non-Holiday values (including the label)
3. Place a check mark next to Labels.
4. Output options: New Worksheet Ply
5. Click OK.
5. The output for the hypothesis test will appear on a new sheet in your Excel
workbook. Name this sheet Ch 11 QS2 t-test.
6. Take a screenshot of your hypothesis test results (label it 11-2MA).
7. Save your workbook as Chapter11.xlsx, answer the lab questions, then
continue to Part 2.


Tableau | Desktop

1. While you cannot run a t-test in Tableau, you can drill down further into the
data to create a Holiday set and compare the % of returned sales during Janu-
ary versus the rest of the year. To do so, you will add Month to the visualiza-
tion, select January, create a set named Holiday, then create visualizations
using the new Holiday set.
2. Create a new sheet in your Chapter11.twb Tableau file and name it Holiday
Diagnostic Analysis.
a. Drag TRAN_DATE to the Rows shelf.
b. Right-click the YEAR(TRAN_DATE) pill in the Rows shelf and se-
lect Month.
c. In the visualization, select January and click the Set button (looks like two overlapping circles).
d. Select Create Set. . . and Name it Holiday.
e. Replace the MONTH(TRAN_DATE) pill in the Rows shelf with Holiday
and drag % of Returned Sales to the Columns shelf.
f. Adjust the Measure for SUM(% of Returned Sales) to Average.
3. Take a screenshot of your Holiday Diagnostic Analysis visualization
(label it 11-2TA).
4. Save your workbook as Chapter11.twb, answer the lab questions, then con-
tinue to Part 2.

QS2 Part 1 Objective Questions


OQ1. Microsoft Excel Only: Based on the p-values (or the t-statistic and critical values), are the returns as a percentage of sales in January significantly greater than, less than, or the same as the returns as a percentage of sales for the rest of the year?
OQ2. What is the average percentage of returned sales for the non-holiday time period?

QS2 Part 1 Analysis Questions


AQ1. What additional data would be interesting to add to your diagnostic analysis
about the percentage of returned sales?
AQ2. What would your recommendations be to Dillard’s regarding the differences in
January returns and the rest of the year?

QS2 Part 2 How Do the Percentages of Returned Sales for Holiday/Non-Holiday Differ for Online Transactions and across Different States?
In Part 1, you tested the percentage of returned sales for holiday/non-holiday differences
for significance (Excel) and began creating a visualization (Tableau). In Part 2 of this ques-
tion set, you will drill down into the data to further diagnose the differences in holiday and
non-holiday percentage of returned sales. For the Microsoft track, you will switch to Power


BI for this question set. For the Tableau track, you can continue with the same Tableau
workbook you used in the previous question set.

QS2 Part 2 Thought Questions


TQ1. Do you expect January to be the month with the highest average return of sales
for online transactions? Why?
TQ2. If there is at least one state that does not fit the pattern of having a higher
average percentage of returned sales in January than the rest of the year, what
would your next diagnostic analysis steps be to determine why there is an
anomaly?

Microsoft | Power BI Desktop

1. Open Power BI Desktop and connect to your Chapter11.xlsx Excel workbook.
2. Select the Ch 11 Query Data sheet and click Load.
3. Expand Ch 11 Query Data in the Fields window and click through the fol-
lowing attributes one at a time. When you click an attribute, a new tab in the
ribbon appears named Column tools. Click into that tab and ensure that the
following data types are set correctly.
a. Holiday: Decimal Number
b. Non-Holiday: Decimal Number
c. Online-Dummy: Text
4. You will create three visualizations next, one to compare average holiday to
non-holiday % of returned sales, overall, another one to break it down by
online versus in-person, and a third to break it down by state.
a. Visualization one: Clustered Column Chart
1. Value: Holiday and Non-Holiday
2. Adjust the Value fields aggregations from SUM to AVERAGE.
b. Visualization two: Clustered Column Chart (this is easiest if you copy and
paste your first visualization)
1. Value: Holiday and Non-Holiday (ensure the aggregate value is AVERAGE for both)
2. Axis: Online-Dummy
c. Visualization three: Clustered Column Chart (this is easiest if you copy
and paste the first visualization)
1. Value: Holiday and Non-Holiday (ensure the aggregate value is AVERAGE for both)
2. Axis: State
d. Rearrange and resize the visualizations so that they are easier to read and
take a screenshot of your dashboard (label it 11-2MB).


Tableau | Desktop
From your Chapter11.twb Tableau file, you will duplicate the Holiday Diagnostic
Analysis visualization to create two additional visualizations, and then combine all
three in a dashboard. The two additional visualizations will show a breakdown of
holiday/non-holiday returns for online versus in-person transactions and a break-
down of holiday/non-holiday returns across states.
1. Duplicate the Holiday Diagnostic Analysis sheet and rename the duplicate
sheet Holiday Diagnostic Analysis - Online.
2. Drag Online-Dummy to the right of the IN/OUT(Holiday) pill in Rows.
3. Duplicate the Holiday Diagnostic Analysis sheet again and rename the dupli-
cate sheet Holiday Diagnostic Analysis - States.
4. Drag State to the right of the IN/OUT(Holiday) pill in Rows.
5. Create a new dashboard and arrange all three holiday diagnostic analysis
sheets so that they are easy to read and take a screenshot of your dash-
board and label it 11-2TB.

QS2 Part 2 Objective Questions


OQ1. What is the average percentage of returned sales for non-holiday online transactions?
OQ2. What is the average percentage of returned sales for Iowa (IA) holiday transactions?
OQ3. What is the average percentage of returned sales for Wyoming (WY)
non-holiday transactions?
QS2 Part 2 Analysis Questions
AQ1. Provide three insights about the percentage of returned sales that you can derive
from the hypothesis test in Part 1 and/or the dashboards you created in Part 2.

QS2 Part 3 What Else Can You Determine about the Percentage of Returned Sales through Diagnostic Analysis?
We’ve discussed a few different ways to analyze Dillard’s returned sales data. Now it’s your
chance to find answers to your own questions.
Identify five questions that you think Dillard’s management, auditors, or financial ana-
lysts would want to know about returns. If you need help, search for some common ques-
tions asked by accountants on the Internet.
Using the data you have already loaded into Excel, Power BI, or Tableau (or feel free to
load new data!), generate at least three reports (could include a different hypothesis test) or
visualizations that will help you find the answers to your five questions. Load them into a report
or dashboard and take a screenshot of your dashboard analyses (label it 11-2MC or 11-2TC).
QS2 Part 3 Analysis Questions
AQ1. Write your first question and provide an answer based on your analysis.
AQ2. Write your second question and provide an answer based on your analysis.
AQ3. Write your third question and provide an answer based on your analysis.


AQ4. Write your fourth question and provide an answer based on your analysis.
AQ5. Write your fifth question and provide an answer based on your analysis.

QS2 Submit Your Screenshot Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot document to Connect or to the location indicated by your instructor.

QUESTION SET 3: PREDICTIVE ANALYTICS

LO 11-3
Predict returned sales in future periods using predictive analytics.

QS3 Part 1 By Looking at Line Charts for 2014 and 2015, Does the Average Percentage of Sales Returned in 2014 Seem to Be Predictive of Returns in 2015?
After assessing how different variables impact returns, you have a better idea of how to help
Dillard’s prepare for returns—both across states and for the holiday season. You can also get
an idea of how much a previous year’s percentage of returned sales can help predict the next
year’s. To answer this question, you will create a line chart to compare years 2014 and 2015,
then run a regression analysis to build a predictive model. For the Microsoft track, you will
return to Excel. For the Tableau track, you will continue working with the same Tableau
workbook from the previous two question sets, and only the first part of this question set
applies. The first part of this lab is very similar to what you worked on in Lab 7-3, but this
time we are using the line charts to help inform our next steps with predictive analytics.

QS3 Thought Questions


TQ1. How does it help companies to rely on seasonality and past performance if
those items exist in their data?
TQ2. What unforeseen factors could impact seasonality that managers would need to
consider instead of solely relying on the results from their predictive analytics?

Microsoft | Excel

1. Create a PivotTable and a PivotChart to compare 2014 and 2015 Returns.


a. Open the Chapter11.xlsx Excel workbook.
b. Create a PivotTable from the dataset.
c. To view a comparison of how the average percentage of sales returned
changed each year, we want to view the years as columns and see a row
for each month’s average percentage of sales returned for either year.
This is easiest using the date parts from the Calendar Table (located in
the More fields. . .) section.
d. Drag Month to the Rows.
e. Drag Year to Columns.
f. Because 2016 does not include the full year, remove 2016 from the Pivot-
Table (either by clicking Column Labels or by inserting a Slicer).
g. Drag % of Sales Returned to Values.


h. Change the aggregate for % of Sales Returned from Sum to Average.


• You can adjust the format of these numbers to make them easier to
read by changing from decimal to percentage. There are many ways to
do this in Excel; one quick way is to select all of the values you wish to
adjust the format for, and then select the % icon from the Home tab.
i. Insert a PivotChart and change the default to Line, then click OK.
j. Add a slicer for State.
k. Take a screenshot (label it 11-3MA).
l. Answer QS3 Part 1 questions, then continue to QS3 Part 2.

Tableau | Desktop
Create a Line Chart to compare 2014 and 2015 Returns.
a. Create a new sheet in your Chapter11.twb Tableau workbook and name it
Line Chart.
• If you are reopening Tableau, open a new Tableau workbook first, then
navigate to open the Chapter11.twb file.
b. Columns: TRAN_DATE
c. Rows: % of Returned Sales
d. Expand the YEAR(TRAN_DATE) pill to show Quarters.
e. Right-click QUARTER(TRAN_DATE) and select Month.
f. Drag TRAN_DATE to Color on the Marks card.
g. Because 2016 does not include the full year, right-click the YEAR(TRAN_
DATE) pill in the Marks card and select Filter. . ., then uncheck the box next
to 2016 and click OK.
h. Right-click STATE and select Show Filter.
i. Hide the Show Me window to view the state filter card.
j. Take a screenshot (label it 11-3TA).
k. Answer QS3 Part 1 questions, then continue to QS3 Part 2.

QS3 Part 1 Objective Questions


OQ1. Based on the line chart, does 2014 seem to be useful in predicting 2015’s per-
centage of sales returned?
OQ2. Click through each state and determine which state has the most variability in percentage of sales returned (do not consider "U," which stands for undefined).

QS3 Part 1 Analysis Questions


AQ1. Based on these line charts, does it seem more beneficial to predict percentage of
sales returned on a state-by-state basis or in aggregate form?


QS3 Part 2 Using Regression, Can We Predict Future Returns as a Percentage of Sales Based on Historical Transactions?

Microsoft | Excel

1. Because the line graphs seemed to suggest that previous transactions would
help predict future transactions, we can run a regression to build a model that
will help stores predict the percentage of sales that will be returned each month.
a. Create a PivotTable in your Chapter11.xlsx workbook.
1. Columns: Year (from Date Table > More Fields. . .)
2. Rows: Month then Day of Week (from Date Table > More Fields. . .)
3. Values: % of Returned Sales (change the aggregate value to Average from Sum)
b. Adjust PivotTable settings:
1. From the PivotTable Design tab in the ribbon, select Report Layout > Show in Tabular Form.
2. From the PivotTable Design tab in the ribbon, select Grand Totals >
Off for Rows and Columns.
2. From the Data tab in the ribbon, click Data Analysis.
• If the Data Analysis Toolpak hasn’t been added in, see Appendix C for
directions on how to add it. Click Data Analysis to open the Analysis
Tools window.
3. Select Regression, click OK, then enter the following:
a. Input Y Range: all 2015 values (including the label)
b. Input X Range: all 2014 values (including the label)
c. Place a check mark next to Labels.
d. Click OK.
e. Take a screenshot (label it 11-3MB).
4. Answer QS3 Part 2 questions, then continue to QS3 Part 3.
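For reference, the regression you just ran fits a simple linear model in which each observation is one month/day-of-week cell of the PivotTable (this is a description of the model being estimated, not an additional lab step):

\text{Avg \% of returned sales}_{2015} = \beta_0 + \beta_1 \times \text{Avg \% of returned sales}_{2014} + \varepsilon

The Coefficients column of Excel's regression output supplies the estimates of β0 (Intercept) and β1 (the 2014 coefficient) that the objective questions below ask about.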

Tableau | Desktop
This portion of the question set cannot be completed in Tableau.

QS3 Part 2 Objective Questions


OQ1. What is the R Square?
OQ2. What is the coefficient for 2014?
OQ3. Does the R Square give you confidence that this is a strong predictive model?


QS3 Part 2 Analysis Questions


AQ1. Looking at your regression output, was the relationship between 2014 and 2015
percentage of sales returned significant? How can you tell?
AQ2. What are the drawbacks of using one year of data to forecast a future year?
AQ3. Brainstorm at least four other data items (e.g., economy, type of customer, etc.)
that would be helpful in predicting the next year’s percentage of sales returned.
AQ4. What have you learned from completing the analyses in Question Sets 1, 2, and 3?

QS3 Part 3 What Else Can You Determine about the Percentage of Returned Sales through Predictive Analysis?
We’ve discussed a few different ways to predict Dillard’s returned sales data. Now it’s your
chance to find answers to your own questions.
Identify five questions that you think Dillard’s management, auditors, or financial ana-
lysts would want to know about returns to help them predict future returned sales. If you
need help, search for some common questions asked by accountants on the Internet.
Using the data you have already loaded into Excel, Power BI, or Tableau (or feel free to
load new data!), generate at least three reports (could include a different regression) or visual-
izations that will help you find the answers to your five questions. Load them into a report or
dashboard and take a screenshot of your dashboard analyses (label it 11-3MC or 11-3TB).

QS3 Part 3 Analysis Questions


AQ1. Write your first question and provide an answer based on your analysis.
AQ2. Write your second question and provide an answer based on your analysis.
AQ3. Write your third question and provide an answer based on your analysis.
AQ4. Write your fourth question and provide an answer based on your analysis.
AQ5. Write your fifth question and provide an answer based on your analysis.

QS3 Submit Your Screenshot Document


Verify that you have answered any questions your instructor has assigned, then upload your
screenshot document to Connect or to the location indicated by your instructor.

Appendix A
Basic Statistics Tutorial

POPULATION VS. SAMPLE


Restaurants and retail stores are often faced with the decision of whether to stay open on Sunday. Like Chick-fil-A, some retail and restaurant owners choose to close on Sunday to allow their employees to spend time with their families or simply to take a break for the day.
What percentage of the restaurants and retail owners close on Sunday? We’d love to ask
a survey question on SurveyMonkey or Qualtrics and get every retail/restaurant owner to
respond. If we could get every response from every restaurant/retail owner, we’d call this
the results of the population, defined as the total set of observations. Because it is virtually
impossible to get every owner to respond to our survey, we often draw a random sample
(defined as a subset of the data collected from the population) expecting that the results we
find from that sample are representative of what we would find had we been able to get the
total population to respond.
In the past, it typically wasn’t financially feasible (cost vs. benefit to reducing risk) to
do full population testing so auditors sampled accounting transactions. But with new high-
powered analytics tools as well as having accountants trained to use such tools, auditors are
increasingly able to consider data from the full population instead of a small sample!

PARAMETERS VS. STATISTICS: WHAT IS THE DIFFERENCE?
Whereas a parameter comes from a population, a statistic comes from a sample. For example, the population average (or mean) would be the parameter denoted by the Greek letter mu (µ); the population average of stores closed on Sunday might be 24 percent. However, since we're only able to survey a sample, the result of surveying the sample would be the sample statistic average, x-bar (x̄). If we don't know the true population average, µ, we will use the sample average to make inferences about the true population average.

DESCRIBING THE SAMPLE BY ITS CENTRAL TENDENCY, THE MIDDLE, OR MOST TYPICAL VALUE
To learn more about a sample, we often use measures of central tendency (the middle or most typical value) to describe the sample. The mean, median, and mode are three common measures used to assess central tendency.


The sample arithmetic mean is the sum of all the data points divided by the number of
observations. The median is the midpoint of the data and is especially useful when there
are skewed numbers one way or another. The mode is the observation that occurs most
frequently.

DESCRIBING THE SPREAD (OR VARIABILITY) OF THE DATA
The next step after describing the central tendency of the data is to assess its spread, or
variability. This might include considering the maximum and minimum values and the dif-
ference between those two values, which we define as the range.
The most common measures of spread or variability are the standard deviation and the variance, where each ith observation in the sample is xi and the total number of observations is N. The standard deviation, denoted by the Greek letter sigma (σ), is computed as follows:

\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}

And relatedly, the variance, σ², is computed as follows:

\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2

The greater the sample standard deviation or variance, the greater the variability.

PROBABILITY DISTRIBUTIONS
There are three primary probability distributions used in statistics and Data Analytics: the normal distribution, the uniform distribution, and the Poisson distribution.

Normal Distribution
A normal distribution is arguably the most important probability distribution because it fits so many naturally occurring phenomena in and out of accounting—from the distribution of return on assets to the IQ of the human population.
The normal distribution is a bell-shaped probability distribution that is symmetric about
its mean, with data points closer to the mean occurring more frequently than data points further from it. As shown in Exhibit A-1, data within one standard deviation of the mean (±1 standard deviation) include 68 percent of the data points; within two standard deviations, 95 percent of the data points; and within three standard deviations, 99.7 percent of the data points.
A Z-score is computed to tell us how many standard deviations (σ), a data point (or
observation), xi, is from its population mean, µ, using the formula z = (xi − µ)/σ. A Z-score


of 1 suggests that the observation is one standard deviation above its mean. A Z-score of –2
suggests that the observation is two standard deviations below its mean.
Many of the statistical tests employed in data analysis are based on the normal distribu-
tion and how many standard deviations a sample observation is from its mean.
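As a quick illustration (the numbers here are assumed only for the example): if µ = 100 and σ = 15, then an observation of xi = 130 has z = (130 − 100)/15 = 2, placing that observation two standard deviations above the mean.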

EXHIBIT A-1 Normal Distribution and the Frequency of Observations around Its Mean (Using 1, 2, or 3 Standard Deviations)
[Figure: bell curve centered on µ, showing that 68% of the data fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3; the horizontal axis counts the number of standard deviations from the mean, from –3 to 3.]

Uniform Distribution and Poisson Distribution


The uniform distribution is a probability distribution where all outcomes are equally likely. In a fair coin toss, for example, heads and tails are equally likely. A deck of cards has an equal distribution of hearts, clubs, diamonds, and spades. Likewise, a deck of cards has an equal distribution of queens and 3s. A Poisson distribution is a distribution with a low mean that is highly skewed to the right. An example of the Poisson distribution might be the number of patients who arrive at the emergency room between 2 and 3 a.m.

HYPOTHESIS TESTING
As we learn in Data Analytics, data by themselves are not really that interesting. It is using data
to answer, or at least address, questions posed by management that makes them interesting.
Management might pose a question in terms of a hypothesis, like their belief that sales
at their stores are higher on Saturdays than on Sundays. Perhaps they want to know this
answer to decide if they will need more staff to support sales (e.g., cashiers, shelf stockers,
parking lot attendants) on Saturday as compared to Sunday. In other words, management
holds an assumption that sales are higher on Saturday than on Sundays.
Usually hypotheses come in pairs: the null hypothesis and the alternate hypothesis.
The first is the base case, often called the null hypothesis, and assumes the hypothesized
relationship does not exist. In this case, the null hypothesis would be stated as follows:
Null hypothesis: H0: Sales on Saturday are less than or equal to sales on Sunday.
The alternate hypothesis would be the case that management believes to be true:
Alternate hypothesis: HA: Sales on Saturday are greater than sales on Sunday.
For the null hypothesis to hold, we would assume that Saturday sales are the same as (or
less than) Sunday sales. Evidence for the alternate hypothesis occurs when the null hypothesis


does not hold and is rejected at some level of statistical significance. In other words, before we
can reject or fail to reject the null hypothesis, we need to do a statistical test of the data with
sales on Saturdays and Sundays and then interpret the results of that statistical test.

STATISTICAL TESTING
There are two types of results from a statistical test of hypotheses: the p-value and/or the
critical values.

The p-Value
We describe a finding as statistically significant by interpreting the p-value.
A statistical test of a hypothesis returns a p-value. The p-value is the result of a test that
either rejects or fails to reject the null hypothesis. The p-value is compared to a threshold
value, called the significance level (or alpha). A common value used for alpha is 5 percent
or 0.05 (as is 1 percent or 0.01).
The p-value is compared to the alpha threshold. A result is statistically significant when
the p-value is less than alpha. This signifies a change was detected: that the default hypoth-
esis can be rejected.
If p-value > alpha: Fail to reject the null hypothesis (i.e., not significant result).
If p-value <= alpha: Reject the null hypothesis (i.e., significant result).
For example, if we were performing a test of whether Saturday sales were greater than Sunday sales and the test returned a p-value of 0.09, we would state something like, "The test found that Saturday sales are not significantly different from Sunday sales, failing to reject the null hypothesis at a 5 percent level of significance."
This statistical result should then be reported to management.

EXHIBIT A-2 Statistical Testing Using Alpha, p-Values, and Confidence Intervals
[Figure: normal curve in which the central "fail to reject H0" region covers (1 – α) = 0.95 of the distribution and the "reject H0" region of α = .05 sits in the tail; the horizontal axis runs from a negative difference between Saturday and Sunday sales, through no difference, to a positive difference.]

The Confidence Interval


Subtracting alpha (α) from 1 gives the confidence level we have in the hypothesis given the statistical test of the data.
For example, if the confidence level is 95 percent, then alpha is 5 percent. In Exhibit A-2, the 95 percent portion of the figure represents the confidence interval—we are 95 percent confident that the true population parameter (the difference between Saturday and Sunday sales) falls somewhere in that area.
Therefore, statements such as the following can also be made:
With a p-value of 0.09, the test found that Saturday sales are not significantly different from Sunday sales, failing to reject the null hypothesis at a 95 percent confidence level.
This statistical result should then be reported to management.


INTERPRETING THE STATISTICAL OUTPUT FROM A SAMPLE t-TEST OF A DIFFERENCE OF MEANS OF TWO GROUPS
A sample t-test is a statistical test used to compare the means of two sets of data observations.
It might compare the means of two independent groups. For example, a t-test might be used to compare the mean return on assets (ROA) for companies in the retail industry to the mean ROA for companies in the entertainment industry to see if one is statistically higher than the other.
Or it could be a paired t-test of the same group of companies but at different times. Such
a t-test might compare the mean return on assets (ROA) for companies in the retail industry
in 2020 to their mean ROA in 2021.
A t-test also allows a comparison of means of one company over time. Let’s suppose
that a company is trying to understand if its rate of sales returns is higher around the
end-of-year holidays than at other times (non-holidays) during the year. To assess whether
the sales returns are different, a t-test is performed in Excel to see if there is a difference
of daily mean sales returns (as a percentage of total sales that day) between the holidays
and non-holiday periods. After performing the t-test, Excel returns the following statisti-
cal output:

Microsoft Excel

The t-test output found that the mean holiday sales returns over 1,167 days is 0.13
(or 13 percent) of sales, and the mean non-holiday sales returns are 0.119 (or 11.9 percent)
of sales. The question is if those two numbers are statistically different from each other. The
t Stat of 7.86 and the p-value [shown as “P(T<=t) one tail”] is 3.59E-15 (i.e., well below
0.01 percent), suggesting the two sample means are significantly different from each other.
The t-test output notes the difference in crucial p-values for a one-tailed t-test and a two-
tailed t-test. A one-tailed t-test is used if we hypothesize that holiday returns are significantly
greater (or significantly smaller) than non-holiday returns. A two-tailed t-test is used if we
don’t hypothesize holiday or non-holiday returns are greater or smaller than the other, only
that we expect the two sample means will be different from each other.
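As a check on the Toolpak output described above, the p-value can also be computed directly with Excel's T.TEST worksheet function; the range names here are placeholders for wherever the holiday and non-holiday observations sit on your sheet:

=T.TEST(HolidayRange, NonHolidayRange, 1, 3)

The third argument (tails = 1) requests the one-tailed p-value, and the fourth (type = 3) specifies a two-sample test assuming unequal variances, matching the test performed above.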


INTERPRETING THE STATISTICAL OUTPUT FROM A REGRESSION
Regressions are used to help measure the relationship between one output (or depen-
dent) variable and various inputs (or independent variables). We can think about this like an
algebraic equation where y is the dependent variable and x is the independent variable, where
y = f(x). As an example, we hypothesize a model where y (or college completion rate) =
f(factors potentially predicting college completion rate) including the independent variable
SAT score (SAT_AVG). In other words, we hypothesize that college completion rates (the
output or dependent variable) depend on SAT scores (or input or independent variable).
Through regression analysis, we can assess if the college completion rate is statistically
related to the SAT score.
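In equation form, this hypothesized single-variable model can be sketched as

\text{CompletionRate} = \beta_0 + \beta_1 \times \text{SAT\_AVG} + \varepsilon

where β0 is the intercept, β1 is the coefficient on SAT_AVG reported in the regression output, and ε is the error term capturing everything the model does not explain.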
Let’s suppose we are considering the relationship between SAT scores and the college
completion rate for first-time, full-time students at four-year institutions. Here are some
regression outputs considering the relationship (from Excel’s Data Analysis Toolpak).

Microsoft Excel

There are many things to note about the regression results. The first is that the overall regression model did better than chance at predicting the college completion rate, as shown by the F-score. We note this by seeing that the p-value representing the "Significance F" result is very small, almost zero, suggesting there is virtually zero probability that a model with no independent variables would explain the completion rate as well as a model that has independent variables. This is exactly the situation we want, suggesting we should be able to identify a factor that explains completion rates.
There is another statistic used to measure how the overall regression model did at predicting
the dependent variable of completion rates. The adjusted R-squared is a value between 0 and 1.
An adjusted R-squared value of 0 represents no ability of the model to explain the dependent
variable, and an adjusted R-squared value of 1 represents perfect ability of the model to explain
the dependent variable. In this case, the adjusted R-squared value is 0.642, which represents a
reasonably high ability to explain the changes in the college completion rate.
The statistics also report that the SAT score (SAT_AVG) helps predict the completion
rate. This is shown by the “t Stat” that is greater than 2 (or less than –2) for SAT_AVG
(with t Stat of 47.74) and a p-value less than an alpha of 0.05 (as shown with the p-value of
1.564E-285). As expected, given the positive coefficient on SAT_AVG, the greater the SAT
score, the greater the college completion rate.

Appendix B
Excel (Formatting, Sorting, Filtering,
and PivotTables)

BASIC FORMATTING OF AN INCOME STATEMENT USING EXCEL FUNCTION SUM()
Suppose we want to put the following data into the appropriate income statement format:

Revenues 50000
Expenses
Cost of Goods Sold 20000
Research and Development Expenses 10000
Selling, General, and Administrative Expenses 10000
Interest Expense 3000

Required:
1. Add a comma as a 1000 separator for each number.
2. Insert the words Total Expenses below the list of expenses.
3. Calculate subtotal for Total Expenses using the SUM() command.
4. Insert a single bottom border under Interest Expense and under the Total Expenses
subtotal.
5. Insert the words Net Income, and calculate Net Income (Revenues – Total Expenses).
6. Format the top and bottom numbers of the column with a $ currency sign.
7. Insert a Bottom Double Border to underline the final Net Income total.


Solution:
1. Open Appendix B Data.xlsx and access the sheet named “Income Statement
Formatting.”
2. Add a comma as a 1000 separator for each number.
Highlight the column with all of the numbers. Right-click and select Format Cells. . .
to open this dialog box:

Microsoft Excel 2016

Click on Number and set Decimal places to zero. Click on Use 1000 Separator (,)
and click OK.
3. Insert the words Total Expenses below the list of expenses.
Type Total Expenses at the bottom of the list of expenses.
4. Calculate subtotal for Total Expenses using the SUM() command.
Use the SUM() command to sum all of the expenses, as follows:

Microsoft Excel 2016
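The exact range depends on your worksheet layout, so treat this as a sketch: if the four expense amounts occupy cells B5:B8 (an assumed layout consistent with Total Expenses landing in B9, which the Net Income formula =B2-B9 in a later step references), the subtotal formula would be:

=SUM(B5:B8)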


Here is the result:

Microsoft Excel

5. Insert a single bottom border under Interest Expense and under the Total Expenses
subtotal.
Use the icon indicated to add the bottom border.

Microsoft Excel 2016

6. Insert the words Net Income and calculate Net Income (Revenues – Total Expenses).
Type Net Income at the bottom of the spreadsheet. Calculate Net Income by insert-
ing the correct formula in the cell (here, =B2-B9):

Microsoft Excel 2016


7. Format the top and bottom numbers of the column with a $ currency sign.
Right-click on each number and Format Cells, select Currency and no decimal
points, and click OK.

Microsoft Excel

8. Insert a Bottom Double Border to underline the final Net Income total.
Place your cursor on the cell containing Net Income (7,000). Then select Bottom
Double Border from the Font > Borders menu.
This is the final product:

Microsoft Excel

Basic Data Manipulation (Filters, Sorts, PivotTables)


1. Open Appendix B Data.xlsx and access the sheet named “Basic Data Manipulation.”
2. Look at the data.


Sorting the Data


3. Let’s sort the data. To do so, go to Data > Sort & Filter > Sort.

Microsoft Excel

4. Let’s sort by sales price from largest to smallest. Input Sales into the Sort by, select
Largest to Smallest in the dialog box, and select OK.

Microsoft Excel


The highest sales price appears to be $140 for 50 pounds of apricots at a cost of $88.01.

Microsoft Excel

Looking down at the end of this list, we see that the lowest sales price appears to be
five pounds of bananas for $2.52.

(Level 3) Filtering the Data


Next, let’s filter the data to look only at the banana transactions.
5. Let’s sort the data. To do so, go to Data > Sort & Filter > Filter.
6. An upside-down triangle (or a chevron) will appear. Click the chevron in cell F1
(Description), click Select All to unselect all, and then select only the word Banana.
7. The resulting data should appear as follows:

Microsoft Excel 2016

8. Alternatively, we could filter based on date to get all transactions on 3/2/2021. We first
need to clear the filter in cell F1 by clicking on the Filter symbol and selecting Select All.
9. Click the chevron in cell D1 (Date of Sale), click Select All to unselect all, and then
select 2021, then March, then 2.


Microsoft Excel

(Level 4) PivotTables

Analytics Tool: Excel PivotTables

PivotTables allow you to quickly summarize large amounts of data. In Excel, click
Insert > PivotTable, choose your data source, then click the check mark next to or
drag your fields to the appropriate boxes in the PivotTable Fields pane to identify
filters, columns, rows, or values. You can easily move attributes from one pane to
another to quickly “pivot” your data. Here is a brief description of each section:
Rows: Show the main item of interest. You usually want master data here, such as
customers, products, or accounts.
Columns: Slice the data into categories or buckets. Most commonly, columns are
used for time (e.g., years, quarters, months, dates).
Values: This area represents the meat of your data. Any measure that you would like to
count, sum, average, or otherwise aggregate should be placed here. The aggregated
values will combine all records that match a given row and column.
Filters: Placing a field in the Filters area will allow you to filter the data based on that
field, but it will not show that field in the data. For example, if you wanted to filter
based on a date, but didn’t care to view a particular date, you could use this area of
the field list. With more recent versions of Excel, there are improved methods for filtering,
but this legacy feature is still functional.

10. Let’s compute the accumulated gross margin for bananas, apricots, and apples.
11. First, clear the filter by going to Data > Sort & Filter and unselecting Filter.
12. Next, let’s compute the gross margin for each line item in the invoice. In cell J1, input the
words Gross Margin. Underline it with a bottom border. In cell J2, input =H2-I2 and Enter.

Microsoft Excel

13. Copy the result from cell J2 to J3:J194.


14. Now it is time to use the PivotTable. Recall that a PivotTable summarizes selected
columns in a spreadsheet, but doesn’t change the spreadsheet itself. We are trying to
summarize the accumulated gross margin for bananas, apricots, and apples.


15. Select Insert > Tables > PivotTable.


16. Make sure all data are selected as follows in Table/Range and select OK.

Microsoft Excel

17. The empty PivotTable will open up in a new worksheet, ready for the PivotTable analysis.
Drag [Description] from FIELD NAME into the Rows and [Gross Margin] from
FIELD NAME into ΣValues fields in the PivotTable. The ΣValues will default to “Sum
of Gross Margin”.
The resulting PivotTable will look like this:

Source: Microsoft Excel 2016

18. The analysis shows that the gross margin for apples is $140.39; for apricots, $78.02;
and for bananas, $77.08.


The VLOOKUP Function


One of Excel's most useful tools for looking up data from two separate tables and providing matching information based on related fields is the VLOOKUP function.
To demonstrate the VLOOKUP function, we will work with sales transactions and tax rate data. Sometimes when you access or request sales transaction data, the tax rate may not be included. In order to calculate the amount of tax owed for each transaction, we need to match the state tax rate with the state the customer is from. This would be an arduous task to do manually, particularly in datasets that are large. We can use Excel's VLOOKUP function to match the state the customers are from with the state tax rate.
Open Excel File Appendix B Data.xlsx and access the sheet named “VLookup.”
The dataset contains information about sales transactions to different customers. This is
similar to the dataset that you used in Lab 2-2. There are 132 unique transactions, but 150
rows—this is because a sales order may consist of multiple products sold.
The first table includes the store location, and the second table has the state and its tax
rate. The table below provides that data dictionary for the attributes in each table.
You may need to scroll to the right to see the sales tax table.

Data Dictionary:

Sales_Transactions Table
Sales_Order_ID: Unique identifier for each individual Sales Order
Sales_Order_Date: Date each sales order was placed
Sales_Order_Quantity_Sold: Quantity of each product sold on the transaction
Product_Description: Description of the product sold
Product_Sale_Price: Price of each product sold on the transaction
Store_Location: State in which the store is located
State Sales Tax Table
State: The state abbreviation
State_Tax_Rate: The tax rate for each state

There are two columns that match in these two tables: Store_Location (from the Sales_
Transactions table) and State (from the State Sales Tax table). These two tables are placed
next to each other to make the VLOOKUP function easier to manage. And note that the
State Sales Tax table is organized such that the value to look up (State) is to the left of the
value (sales tax rate) to find.
Step 1:
We will add a new column to the Sales_Transactions table to bring in the State_Sales_Tax associated with every Store_Location listed in the transactions table.
1. Add a new column to the right of the Store_Location named State_Sales_Tax (cell G1).

Microsoft Excel


In cell G2, we will create a VLOOKUP function. VLOOKUP functions have four
arguments:
• Cell_reference: The cell in the current table that has a match in the related table. In this
case, it is a reference to the row’s corresponding Store_Location. Excel will match that
state with the corresponding state in the State Sales Tax table.
• Table_array: An entire table reference to the table that contains the descriptive data
that you wish to be returned. In this case, it is the entire State Sales Tax table.
• Column_number: The number of the column (not the letter!) that contains the descrip-
tive data that you wish to be returned. In this case, State Sales Tax Rate is in the second
column of the State Sales Tax table, so we would type 2.
• TRUE or FALSE: There are two types of VLOOKUP functions, TRUE and FALSE.
TRUE is for looking up what Excel calls “approximate” data. In our case, we’ll use
FALSE. A FALSE VLOOKUP will only return matches when there is an exact match
between the two tables (whenever your data are relational structured data, a perfect
match should be easily discoverable). If TRUE or FALSE is not designated, TRUE is the default. Also, TRUE can be represented as 1 and FALSE as 0.
Step 2:
2. Type in the following function (using cell references will be easier than typing manually):

Microsoft Excel
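As a sketch of the finished formula (the tax-table range is an assumption; substitute the actual range where the State Sales Tax table sits on your sheet):

=VLOOKUP([@Store_Location], $I$2:$J$52, 2, FALSE)

Here [@Store_Location] is a structured reference to the current row's Store_Location, the 2 returns the second column of the tax table (the State_Tax_Rate), and FALSE requires an exact match.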

3. Once you press Enter, the formula should copy all the way down—once again exhibiting the benefits of working with Excel tables instead of ranges.

Appendix C
Accessing the Excel Data Analysis
Toolpak
Excel offers a toolpak that helps perform much of the data analysis, called the Excel Data
Analysis Toolpak.
To run a correlation, form a histogram, run a regression, or perform other similar analysis using the Excel Data Analysis Toolpak, we need to make sure the Analysis Toolpak is loaded by looking at the Data > Analysis group on the ribbon and seeing if the Data Analysis add-in has been installed.

Microsoft Excel

If it has not yet been added, go to File > Options > Add-Ins, select the Analysis Toolpak, and select GO:

Microsoft Excel 2016


In the Add-ins window that appears, place a check mark next to Analysis ToolPak and
then click OK. This will add the Data Analysis ToolPak to the Data tab so you can perform
additional data analysis.

Microsoft Excel

To perform the additional data analysis, please select Data > Analysis > Data Analysis.
A dialog box will open.

Microsoft Excel 2016

In this textbook, we highlight the use of the following analysis tools:


• Correlation: To understand if and the extent to which variables are related to each other.
• Descriptive statistics: To understand the basic statistics, including the mean, standard
deviation, minimums, and maximums, of a dataset.
• Histogram: To understand the frequency of the data using a display of rectangles with
area proportional to the underlying frequency of the data.
• Regression: To understand the relation between specific dependent variable values and
independent variable inputs.
• t-Tests: To understand the probability of a difference in means between either two inde-
pendent samples or a paired sample through time.

Appendix D
SQL Part 1

SQL can be used to create tables, delete records, or edit databases, but in Data Analytics,
we primarily use SQL to extract data from the database—that is, not to edit or manipulate
the data, but to create different views of the data to help us answer business questions. SQL
extraction queries are also referred to as SELECT queries, because they each begin with the
word SELECT.
Throughout this appendix, all the examples and the practice problems refer to Appendix
D Data.accdb. This is a very small database to help you immediately visualize what a que-
ry’s results would look like.

INTRODUCTION TO SQL CLAUSES


Every SQL SELECT query must have two key SQL clauses, SELECT and FROM.

Introduction to SELECT
SELECT indicates which attributes you wish to view. For example, the Customers table
contains a complete customer list with several descriptive attributes for each of the com-
pany’s customers. If you would like to see a full customer list, but you just want to see
FirstName, LastName, and State, you can just select those three attributes in the first line
of your query:
SELECT FirstName, LastName, State

Introduction to FROM
FROM lets the database management system know which table(s) contain(s) the attributes
that you are selecting. For instance, in the query begun previously, the three attributes in the
SELECT clause come from the Customers table. So that query can be completed with the
following FROM clause:
FROM Customers
Try putting that query all together to see the results:
SELECT FirstName, LastName, State
FROM Customers


This returns the result in Exhibit D-1:

EXHIBIT D-1
Microsoft Access 2016

If you wish to view the same three columns, but you want to see the LastName column
as the first column so that the results more closely resemble a phone book, you can change
the order of the attributes listed in your SELECT statement:
SELECT LastName, FirstName, State
FROM Customers
Now the query returns the same number of records, but with a different order of attri-
butes (columns), seen in Exhibit D-2:

EXHIBIT D-2
Microsoft Access 2016

Practicing SELECT FROM


1. Create a query that will return only the Inventory_Description and Price from the
Inventory table.
2. Create a query that will show only the Order_Date and CustomerID from the Sales_Orders table.
3. Create a query that will show the City and State from the Customers table.


After you get the hang of creating simple SELECT FROM queries, you can begin to bring
in some of the SQL clauses that can make our queries even more interesting. The next two
SQL clauses we will cover are WHERE and ORDER BY. They follow FROM, to make a
query in this order:
SELECT
FROM
WHERE
ORDER BY

One more bit of SELECT information—how to SELECT all of the attributes:
If you wish to view every attribute in the same order as they exist in the table, you can use
a shortcut to select all:
SELECT *
FROM Inventory
A simple SELECT FROM query with SELECT * isn't very interesting on its own, but when we begin filtering records, SELECT * can be a quick way to view how many records meet certain criteria. We filter records with the WHERE clause.

Introduction to WHERE
WHERE behaves like a filter in Excel. An example of using WHERE to modify the query
is the following:
SELECT LastName, FirstName, State
FROM Customers
WHERE State = “Arkansas”
That query would return only the customers who were from Arkansas; the result is
shown in Exhibit D-3:

EXHIBIT D-3
Microsoft Access 2016

The syntax of a simple WHERE clause is the following:


WHERE [attribute_name] = [criteria]
The attribute_name needs to be spelled exactly the way it is in the database without any
formatting (for example, do not place the attribute name in quotes).
Formatting criteria in a WHERE clause, or “So, what’s the deal with the quotes around
Arkansas?”
There are three main data types that you will work with in SQL: text, numbers, and dates. Every attribute in a database is stored as one of those data types. Let's look at the Inventory table in the Appendix D Data database. It has three fields:
• InventoryID (example entries: I-1, I-2)
• Inventory_Description (example entries: Dalton Dress Boot, Ray-Ban Wayfarer)
• Price (example entries: 495, 1250)


Text Data Types


Both InventoryID and Inventory_Description are text data types. Most text data types are
descriptive or categorical elements in the database. When you filter for criteria from a text
attribute, the criteria must be surrounded in quotes. Examples:
• WHERE State = “Arkansas”
• WHERE Inventory_Description = “Dalton Dress Boot”
A word of caution! Programs like Microsoft Word apply formatting to quotes by turning
them into “curly quotes.” Microsoft Access and other relational database systems cannot
read the curly quotes. If you draft your queries in Word and then copy and paste them into
a SQL editor, the quotes will need to be retyped in the query editor for the database application to be able to read the criteria appropriately.

Number Data Types


Price, on the other hand, is a number data type. You could sum or average the contents of that attribute and arrive at a meaningful value. Another example of a number data type is Quantity_Sold in the Sales_Orders table. When you filter for criteria from a number attribute, there is no need to format the criteria at all. Examples:
• WHERE Price = 395
• WHERE Quantity_Sold = 2

Date Data Types


For an example of the third data type, date, look at the Sales_Orders table to find the Order_Date attribute. When you filter for criteria from a date attribute, the date should be enclosed in # signs and follow the format #mm/dd/yyyy#. Examples:
• WHERE Order_Date = #01/02/2019#
• WHERE Order_Date = #12/31/2018#
Date formats in other database management systems: Date formatting varies across relational database management systems, as the examples after this list show.
• In SQLite, date format is ‘yyyy-mm-dd’. For example, ‘2019-01-02’ or ‘2018-12-31’.
• In SQL Server, date format is ‘yyyymmdd’. For example ‘20190102’ or ‘20181231’.
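To see the difference side by side, here is a sketch of the same filter (orders placed on January 2, 2019) written for each system. It assumes a Sales_Orders table with an Order_Date attribute exists in each system:
In Microsoft Access:
SELECT *
FROM Sales_Orders
WHERE Order_Date = #01/02/2019#
In SQLite:
SELECT *
FROM Sales_Orders
WHERE Order_Date = '2019-01-02'
In SQL Server:
SELECT *
FROM Sales_Orders
WHERE Order_Date = '20190102'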
Other methods of filtering, or, do we always have to filter for an exact match?
Each of the WHERE examples we've seen so far has used the equals sign operator. But there are many other ways to filter besides exact matches. For now, we'll start with a few other operators, shown in Exhibit D-4:
EXHIBIT D-4
• > used with a number data type: Returns all records that have numbers in that field greater than the criteria specified.
• > used with a text data type: Returns all records that follow the criteria alphabetically (a–z).
• < used with a number data type: Returns all records that have numbers in that field less than the criteria specified.
• < used with a text data type: Returns all records that precede the criteria alphabetically (a–z).
• >= and <=: Similar to the above criteria, but will also include numbers or text that are an exact match to what is listed in the criteria.
• <>: Functions as the inverse of the exact match (=) filter; it will return all of the records except those that match the criteria listed in the WHERE clause.


More SELECT FROM WHERE Examples


To extract all of the records from the Inventory table that have prices greater than $1,000:
SELECT *
FROM Inventory
WHERE Price > 1000
That query returns the following records shown in Exhibit D-5:

EXHIBIT D-5
Microsoft Access 2016

To extract all of the records from the Customers table that follow the last name "Jones" alphabetically:
SELECT *
FROM Customers
WHERE LastName > “Jones”
That query returns the following records shown in Exhibit D-6:

EXHIBIT D-6
Microsoft Access 2016

If you wanted to include any customers with the last name of Jones in the list, you would change the operator from > to >=:
SELECT *
FROM Customers
WHERE LastName >= “Jones”
The revised output is shown in Exhibit D-7:

EXHIBIT D-7
Microsoft Access 2016

You can see that Jeremy Jones is included in this output.
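One operator from Exhibit D-4 that is not otherwise demonstrated in this appendix is <>. As a quick sketch using the same Customers table, the following query returns every record except the customers from Arkansas:
SELECT *
FROM Customers
WHERE State <> "Arkansas"
Because <> is simply the inverse of =, the records returned by this query and the records returned by WHERE State = "Arkansas" together make up the full Customers table.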

SELECT FROM WHERE Practice


1. Write a query that will return all of the records from the Sales_Orders table that have a
quantity greater than 4.
2. Write a query that will return all of the records from the Sales_Orders table that have a
quantity less than or equal to 3.
3. Write a query that will return all of the records from the Sales_Orders table that had
exactly one item on it.


4. Write a query that will return all of the records from the Customers table of the customers from Texas.
5. Write a query that will return all of the records from the Customers table of the customers from Baton Rouge.

Introduction to ORDER BY
In Exhibit D-7, when you added Jeremy Jones to the output, you might have been surprised
that the order of the records didn’t change. The default order of SQL queries is ascending
based on the first column selected. When you SELECT *, the default will be in the order of
the primary key, which is the order of the records in the original table.
If you would like to sort the records in a query output based on any other column, you
can do so with the ORDER BY clause.
The syntax of an ORDER BY clause is the following:
To sort the records in ascending order (1 to infinity or A to Z): ORDER BY [attribute_name] ASC
To sort the records in descending order (infinity to 1 or Z to A): ORDER BY [attribute_name] DESC
Similar to WHERE, the attribute_name needs to be spelled exactly the way it is in the
database without any formatting (for example, do not place the attribute name in quotes).
ORDER BY is always the last line in any query, no matter how complex the query is.

Example Queries with ORDER BY


To revise the query from Exhibit D-2 so that the output is ordered by state, ascending:
SELECT LastName, FirstName, State
FROM Customers
ORDER BY State ASC
That query returns the following records shown in Exhibit D-8:

EXHIBIT D-8
Microsoft Access 2016


Notice how Exhibits D-2 and D-8 contain the same information, the same order of attributes, and the same number of records, but the ordering of the records has changed.
To revise the same query, but this time to order the results by both Last Name and First
Name (ascending):
SELECT LastName, FirstName, State
FROM Customers
ORDER BY LastName ASC, FirstName ASC
That query returns the following records shown in Exhibit D-9:

EXHIBIT D-9
Microsoft Access 2016

EXPANDING THE USAGE OF SELECT


So far, you have learned the SQL keywords SELECT, FROM, WHERE, and ORDER BY.
These words are useful for creating views of a limited amount of data from one table, but other than limiting the number of rows and columns returned (with WHERE and SELECT, respectively) and changing the order in which they are returned, the data that return from these queries are still presented in the same format as they are stored in the database.
We can extend the usage of SELECT to manipulate the data that return from a query
with aggregates.

More Actions You Can Take in SELECT: Aggregates and Aliases
To aggregate means to form a group or a cluster. In SQL, aggregate data represent grand
totals or subtotals of data. For example, if you did not want to simply view all of the individual orders in a Sales_Orders table, but you wanted instead to see a total count of how
many orders were in a table, or you wanted to see the grand total quantity of products ever
sold, you would want to aggregate the data.
The following functions are commonly used in the SELECT clause to aggregate data:
• SUM(attribute)
• COUNT(attribute)
• AVG(attribute)


The following query uses an aggregate function to show the total count of orders in the Sales_Orders table:
SELECT COUNT(Sales_Order_ID)
FROM Sales_Orders
The output of that query would produce only one column and one row, shown in
Exhibit D-10:

EXHIBIT D-10
Microsoft Access 2016

The problem with this output, of course, is the lack of description. The column is titled
Expr1000, which is not very descriptive. This title is produced because there isn’t a column
named COUNT(Sales_Order_ID), so the database management system doesn’t know what
to title the column in the output.
To make this column more meaningful, we can use aliases. An alias simply renames a column and is written with the keyword AS. To rename the column COUNT(Sales_Order_ID) to Count_Total_Orders, the query would look like the following:
SELECT COUNT(Sales_Order_ID) AS Count_Total_Orders
FROM Sales_Orders
The output is more meaningful with the alias added in, shown in Exhibit D-11:

EXHIBIT D-11
Microsoft Access 2016

To create a query that would show the grand total quantity of products ever sold (as stored
in the Sales_Orders table) with a meaningful column name, we could run the following:
SELECT SUM(Quantity_Sold) AS Total_Quantity_Sold
FROM Sales_Orders
Which returns the following output, shown in Exhibit D-12:

EXHIBIT D-12
Microsoft Access 2016

Aggregates and Aliases Practice


1. Create a query that would show the average price of our inventory items (use the
Inventory table). Rename the column in the output Avg_Price.
2. Create a query that would show the total number of Customers we have stored in the
Customers table. Rename the column in the output Num_Customers.

EXTENDING THE QUERY WITH GROUP BY AND HAVING CLAUSES
Aggregates are extremely useful for returning grand totals of the data that are stored in a database. But sometimes we would prefer to view those data as subtotals as well.


Introduction to GROUP BY
In the introduction to aggregates, we worked through an example that provided the grand
total count of orders in the Sales_Orders table:
SELECT COUNT(Sales_Order_ID) AS Count_Total_Orders
FROM Sales_Orders
That query results in a grand total of 10, but what if we would like to see how those data
split up among customers who have ordered from us? This is where GROUP BY comes
in. GROUP BY works as the “engine” that powers subtotaling the data. After the keyword
GROUP BY, you indicate the attribute by which you would like to slice the data. In this
case, we want to slice the grand total by CustomerID.
SELECT COUNT(Sales_Order_ID) AS Count_Total_Orders
FROM Sales_Orders
GROUP BY CustomerID
The problem with this query is that it does slice the data by customer, but it doesn’t actually
show us the CustomerID associated with each subtotal. The output is shown in Exhibit D-13:

EXHIBIT D-13
Microsoft Access 2016

If we want to actually view the CustomerID that is associated with each subtotal, we need to not only put the attribute in the GROUP BY clause but also add it to the SELECT clause.
Remember from earlier in this tutorial that the order in which you place the attributes
in the SELECT clause indicates the order that those columns will display in the output.
For this output, it would make the most sense to see CustomerID before Count_Total_Orders, because CustomerID is acting as a label for the totals. We can modify the query to include CustomerID in the following way:
SELECT CustomerID, COUNT(Sales_Order_ID) AS Count_Total_Orders
FROM Sales_Orders
GROUP BY CustomerID
This provides the following output, shown in Exhibit D-14:
EXHIBIT D-14
Microsoft Access 2016


Similarly, we can extend the second example provided in the Aggregates section that cre-
ated a grand total of the quantity sold from the Sales_Orders table. If we would prefer to not
see the grand total quantity sold, but instead slice that total by InventoryID in order to see
the subtotal of the quantity of each inventory item sold, we can create the following query:
SELECT InventoryID, SUM(Quantity_Sold) AS Total_Quantity_Sold
FROM Sales_Orders
GROUP BY InventoryID
Which produces the following output, shown in Exhibit D-15:

EXHIBIT D-15
Microsoft Access 2016

Notice that InventoryID needs to be added in two places: You must place it in the
GROUP BY clause to provide the “engine” that subtotals a grand total (or slices it), and
then you must also place InventoryID in the SELECT clause so that you can see the labels
associated with each subtotal.

GROUP BY Practice
1. Create a query that would show the total quantity of items sold each day. Rename the
aggregate Total_Quantity_Sold.
2. Create a query that would show the total number of Customers we have stored in the
Customers table, and group them by the State the customers are from. Rename the
aggregate column in the output Num_Customers.

Introduction to HAVING
Occasionally when running a query to gather subtotals (using a GROUP BY clause), you
do not want to see all of the results, but instead would rather filter the results for certain
subtotals. Unfortunately, SQL cannot filter aggregate measures in the WHERE clause, but
fortunately, we have a different clause that can—HAVING.
Anytime you wish to filter your query results based on aggregate values (e.g.,
SUM(Quantity_Sold)), you can do so in the HAVING clause.
For example, in the previous section about GROUP BY, we created a query to see the
total count of orders associated with each customer. The output showed that the vast majority of our customers had participated in only one order. But what if we wanted to only see
the customer(s) who had participated in more than one order?
We can create the following query to add in this filter:
SELECT CustomerID, COUNT(Sales_Order_ID) AS Count_Total_Orders
FROM Sales_Orders
GROUP BY CustomerID
HAVING COUNT(Sales_Order_ID) > 1
As it turns out, there is only one customer who participated in more than one order, as
we can see in the query output, shown in Exhibit D-16:


EXHIBIT D-16
Microsoft Access 2016

The format of the HAVING clause is similar to WHERE:
HAVING aggregate(attribute) = number
• The aggregate can be any of our aggregate functions: SUM(), AVG(), or COUNT().
• The attribute is the field that you are aggregating, such as SUM(Quantity_Sold) or COUNT(CustomerID).
• The = can be replaced with any operator: =, <, >, <=, >=, <>.
• The number is the value that you are filtering your results on.
Let's work through another example. The second example in the GROUP BY section showed the quantity sold of each inventory item. If we want to view only those items that have sold fewer than 5 units, we can create the following query:
SELECT InventoryID, SUM(Quantity_Sold) AS Total_Quantity_Sold
FROM Sales_Orders
GROUP BY InventoryID
HAVING SUM(Quantity_Sold) < 5
This query produces the following output, shown in Exhibit D-17:

EXHIBIT D-17
Microsoft Access 2016

HAVING Practice
1. Create a query that would show the total quantity of items sold each day. Rename the
aggregate Total_Quantity_Sold. Show only the days on which more than 6 items were sold.
2. Create a query that would show the total number of Customers we have stored in the
Customers table, and group them by the State the customers are from. Rename the
aggregate column in the output Num_Customers. Show only the states with more than one customer.

EXTENDING THE USE OF THE FROM CLAUSE: SELECTING DATA FROM MORE THAN ONE TABLE
Some of the real power of SQL extends beyond relatively simple SELECT FROM WHERE clauses. Since relational databases are designed to reduce redundancy, the details we want to use for analysis are often stored across two or three different tables.
For example, in our sample database, we may want to know the phone number of the customer associated with each order. Each order is stored in the Sales_Orders table, but the details about our customers (including their phone numbers) are stored in the Customers table. To retrieve data from both tables, we need to first make sure that the tables are related. We can do that by looking at the database design as shown in Exhibit D-18:


EXHIBIT D-18
Microsoft Access 2016

The call-out circle and boxes in the exhibit help us see how these two tables are related. First, the circle indicates the relationship connecting the Customers and Sales_Orders tables, which shows us that the two tables are indeed related. The next step is to identify how they are related. The two red boxes in Exhibit D-18 indicate the related fields: CustomerID is the primary key in the Customers table, and CustomerID is the foreign key in the Sales_Orders table. Since the two tables are related, we can retrieve data from both fairly easily with a JOIN clause.
In order to retrieve data from more than one table, we need to use SQL JOINs. There are three types of JOINs, but for much of our analysis, an INNER JOIN will suffice. JOINs are technically part of the FROM clause. They follow this template:
FROM table1
INNER JOIN table2
ON table1.matching_key = table2.matching_key
The order of the tables does not matter: you could place the Customers table in either the FROM or the INNER JOIN clause, and the two keys in the ON clause can likewise appear in either order. What matters is that you indicate both tables you want to retrieve data from and that you pair each table with its matching key in the ON clause.
To select all of the data from the Customers table and the Sales_Orders table, you can
run the following query:
SELECT *
FROM Customers
INNER JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID
If you want to only select the Sales_Order_ID and the Order_Date from the Sales_
Orders table, but also select the State attribute from the Customers table, you could run the
following query:
SELECT Sales_Order_ID, Order_Date, State
FROM Customers
INNER JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID
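One caution here: if an attribute you list in the SELECT clause exists in both joined tables (as CustomerID does in this database), the database management system will not know which table's attribute you mean and will return an ambiguity error. You can avoid the problem by prefixing an attribute with its table name and a period, which is allowed even when there is no ambiguity. A sketch of the previous query written with fully qualified attribute names:
SELECT Sales_Orders.Sales_Order_ID, Sales_Orders.Order_Date, Customers.State
FROM Customers
INNER JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID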


INNER JOIN ON Practice


1. Create a query that will show the Customer’s First and Last Names, as well as the
Quantity_Sold and Price of each order the customer was on.
2. Create a query that will show the Order_Date and Quantity_Sold on each order, as well
as the Inventory_Description of the items associated with each order.

Parentheses Are Key to Joining Three or More Tables


Sometimes you want to not only join two tables, but three or more. When you join more
than two tables together, you need to nest the extra joins in parentheses.
In more detail, if you define the number of tables you're trying to join as "n," then the number of opening parentheses you need after the word FROM is n-2, and you need one closing parenthesis before the start of each new join clause.
For example, if you are joining three tables, you need one opening parenthesis after the word FROM and one closing parenthesis after the first ON clause. Then you can proceed with the query as normal. To join all three tables in our example database, it would look like the following:
SELECT *
FROM (Customers
INNER JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID)
INNER JOIN Inventory
ON Sales_Orders.InventoryID = Inventory.InventoryID
Note: There are other types of joins! Beyond INNER JOINs, we can also create LEFT
and RIGHT JOINs to get slightly different results, depending on our data and our needs.
There is a deep dive into LEFT and RIGHT JOINs in Appendix E.

PUTTING IT ALL TOGETHER


This tutorial has introduced you to the majority of the SQL keywords you will need to extract data for data analysis or even to answer simple data analysis questions directly in the database. If you were to use all of the SQL keywords that we have discovered in this tutorial in one query, they must go in the following order (a combined example follows the list):
SELECT
FROM
INNER JOIN
ON
WHERE
GROUP BY
HAVING
ORDER BY
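As a sketch of all of these clauses working together against this appendix's sample database (the specific filter values are arbitrary and chosen only to illustrate where each clause goes), the following query totals the quantity sold by state, considering only order records with a quantity greater than 1, keeping only states whose total exceeds 2, and sorting the subtotals from largest to smallest:
SELECT Customers.State, SUM(Sales_Orders.Quantity_Sold) AS Total_Quantity_Sold
FROM Customers
INNER JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID
WHERE Sales_Orders.Quantity_Sold > 1
GROUP BY Customers.State
HAVING SUM(Sales_Orders.Quantity_Sold) > 2
ORDER BY SUM(Sales_Orders.Quantity_Sold) DESC
Note how WHERE filters individual records before they are grouped, while HAVING filters the subtotals after grouping.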
A note on syntax: When drafting SQL queries, we typically indent the INNER JOIN and ON clauses to help remember that those clauses are technically part of the FROM clause; this also helps with remembering the order of all the clauses. Keep in mind, though, that unlike other programming languages such as R or Python, indentation does not signify anything in SQL. It is used only for ease in reading and drafting queries, and some programs (such as Microsoft Access) will not allow you to include the indentation in the query editor.


Appendix E
SQL Part 2

In Appendix D, you learned about many key terms in SQL, including how to join tables.
The purpose of joining tables is to enable you to retrieve data that are stored in more than
one table all at once. The join type that you learned about in Appendix D is an INNER JOIN. There are two other popular join types, though: LEFT and RIGHT.
We will work with the same data that you used in Appendix D; for this appendix, you will access them through Appendix E Data.accdb.
We'll start by bringing these data into Tableau because Tableau has a great way of visualizing joined tables and, specifically, the differences between INNER, LEFT, and RIGHT JOINs.
1. Open Tableau.
2. Select Microsoft Access to connect to the file and navigate to where you have stored the
file, then click Open.
3. In the Data Source view, drag the Customers table to the Drag tables here section.
4. Double-click on the Customers rectangle to enter the Physical layer.
5. Double-click the Sales_Orders table to create a join between Sales_Orders and Customers.

EXPLANATION OF INNER JOINS


Notice the Venn diagram that appears connecting the two tables:

Tableau Software, Inc. All rights reserved.

Click the Venn diagram to see the following details about how the tables are related:

Tableau Software, Inc. All rights reserved.


Tableau has defaulted to joining these two tables with an INNER JOIN, and it has accu-
rately identified the two keys that are related between the two tables, CustomerID in the
Customers table, and CustomerID in the Sales_Orders table.
This is very similar to how you would write a query to gather the same information
directly in the Access database, where one of the tables is indicated in the FROM clause,
the second table is indicated in the INNER JOIN clause, and the keys that are common
between the two tables are indicated with an equal sign between them in the ON clause:
SELECT *
FROM Customers
INNER JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID
As the Venn diagram suggests, an INNER JOIN will show all of the data for which there is a match between the two tables. However, it is important to notice what that leaves out: it will not return any of the data for which there is not a match between the two tables.
In this instance, there is actually one customer held in the Customers table that is not
included in the Sales_Orders table (Customer 3, Edna Orgeron). Why would this happen?
Perhaps this fictional company records data on potential customers, so even though someone may not have actually purchased anything yet, the company can still contact them. Whatever the reason might be, the fact that CustomerID 3 is in the Customers table but has no reference in the Sales_Orders table means that using an INNER JOIN will not include CustomerID 3 in the results.
If the above SQL query were run, the following result would be returned:

[Screenshot: query results]

Notice that the red box surrounding the records for Customers 2 and 4 does not include anything for Customer 3.

EXPLANATION OF LEFT JOINS


If we want to see all of the data from the Customers table, even if there isn't a match in the Sales_Orders table, then we need to change our join type.
Return to Tableau and click the Venn diagram again to change the join type to a LEFT JOIN.


Tableau Software, Inc. All rights reserved.

The red box indicates an important change that occurs as soon as we make the change to a LEFT JOIN: Customer 3 is included! Not only that, while we see Customer 3's name and contact information, we see null values for all attributes from the Sales_Orders table. That is because there isn't any corresponding information for Customer 3 in the Sales_Orders table.
To replicate this query in Access, the only change that needs to be made is to swap the word INNER for LEFT:
SELECT *
FROM Customers
LEFT JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID
It is easier to visualize how joins are created in Tableau, but they work the same way in SQL. The table that you place in the FROM clause is the "left" table, and the table that you place in the JOIN clause is the "right" table.

EXPLANATION OF RIGHT JOINS


Looking back to the Venn diagram, we can see that a RIGHT JOIN would return the opposite result of a LEFT JOIN. In this specific instance, if there were any sales orders with a CustomerID that did not match a CustomerID in the Customers table, the only way we would see them is if we created a RIGHT JOIN.
We dive more deeply into this concept in the text when we discuss audit analytics, but
perhaps you can already imagine how this type of join would be useful for detecting errors
or fraud—we definitely would want to isolate any sales orders that had customer information
on them that didn’t align with our verified customer listing!
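Replicating a RIGHT JOIN in Access requires no new syntax; as with the LEFT JOIN, only the join keyword changes. As a sketch, the following returns every record from the Sales_Orders table whether or not it matches a customer:
SELECT *
FROM Customers
RIGHT JOIN Sales_Orders
ON Customers.CustomerID = Sales_Orders.CustomerID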

Appendix F
Power Query in Excel and Power BI

Excel's Get and Transform tools are part of the Power BI suite that is integrated into Excel 2016; the same tools appear in the standalone Power BI application. These tools allow you to connect directly to a dataset stored in a variety of locations, including Excel files; .csv files; the web; and a multitude of relational databases, including Microsoft Access, SQL Server, Teradata, Oracle, PostgreSQL, and MySQL.
Throughout this text, when we analyze the Dillard's dataset in the Comprehensive Labs, we will usually load the data from SQL Server into Power BI and transform them using Power Query, or load the data into Excel using the Get and Transform tool.
When we extract the data, we may want to extract entire tables, or we may want to
extract only a portion via a SQL query.
In this appendix, we will connect to the Dillard’s data. The Dillard’s data are stored on
the University of Arkansas’ remote desktop, so make sure to log in to the desktop in order
to work through these steps. Ask your instructor for login information if you do not have it
already.

CONNECT TO SQL SERVER


Power BI: Connect to SQL Server
1. Open Power BI in the University of Arkansas remote desktop. Click Get Data and
select Import data from SQL Server (Database > SQL Server database).


Excel: Connect to SQL Server through Excel's Get and Transform Tool
1. Open Excel in the University of Arkansas remote desktop. From the Data tab on the
ribbon, click Get Data. Then select From Database > From SQL Server Database.

Microsoft Excel 2016

2. The following box will pop up, into which you should enter the Server name and the Database name that your instructor provides. For the comprehensive exercises, we use the Database name WCOB_DILLARDS.

[Screenshot: SQL Server database connection dialog]


Once you have input the Server and Database name, you have two options:
3. Extract entire tables by clicking OK. Continue to Step 5.
4. Extract only a portion of data from one or more tables based on the criteria of a SQL
query. To do so, click Advanced Options. Skip to Step 7.
In either instance, after clicking OK, you will see two prompts:
• When prompted to input your credentials, select Use my Current Credentials and click
Connect.
• When prompted with an Encryption Support window, click OK.

Power BI & Excel: To Extract Entire Tables


5. Click OK.

[Screenshot]

6. Select the table(s) that you would like to load into Excel. If you would like to select
more than one table, place a check mark in the box next to Select multiple items and
select STORE and TRANSACT.

[Screenshot: Navigator window]


Power BI & Excel: To Extract a Portion of the Data


7. Click Advanced Options and input your SQL query in the space provided, then click OK.

[Screenshot: SQL statement box]
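As a sketch of what such a query might look like against the Dillard's data (the attribute name TRAN_DATE is an assumption here; check the Dillard's data dictionary in Connect for the exact table and field names), the following would extract one month of transactions rather than the entire TRANSACT table:
SELECT *
FROM TRANSACT
WHERE TRAN_DATE BETWEEN '20160101' AND '20160131'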

POWER BI & EXCEL: EDITING THE DATA IN


POWER QUERY
Regardless of whether you extracted entire tables or extracted data based on a query, you can either Load the data directly into Excel or Power BI, or you can Edit the data in Power Query first.


Microsoft Excel 2016

• Clicking Load will load the data directly into an Excel table or Power BI datasheet.
• Clicking Edit or Transform Data will open the Power Query window for you to trans-
form the data before they are loaded into Excel (add or delete columns, remove or
transform null values, aggregate data, etc.).

8. To Edit (Transform) the data, click Edit or Transform Data.


9. The Power Query ribbon has several tabs that provide useful ways to transform the
data. A few of the buttons that we use throughout the text are called out for the Home
tab and the Transform tab on the ribbon below.
Home tab on the ribbon:

Microsoft Excel 2016

• Click the Close & Load or Close & Apply button when you are finished transforming
the data to load them into Excel.
• The Remove Rows button provides options to remove rows with nulls in selected columns, with duplicates in selected columns, or based on other criteria.
Transform tab on the ribbon:

Source: Microsoft Excel 2016


• Replace Values functions the same way in Power Query as it does in Excel, except the
transformation is stored and thus repeatable when created in Power Query.
• Pivot Column creates new columns out of the distinct values of an existing category column (for example, you can pivot the Transaction_Type column by the transaction amount).
• The Date button will allow you to transform an existing date column into a date part
(year, month, day, etc.) or change the date format. It is also useful to create duplicates
of existing date columns, then transform the copies into the date parts.

AFTER LOADING THE DATA INTO POWER BI OR EXCEL: HOW TO CONTINUE WORKING WITH YOUR DATA
Once you are finished transforming your data in the Power Query Editor and you click the
Close & Load button (Excel) or Close & Apply (Power BI), your data will begin to load into
a worksheet.

Excel: What to Do If the Load to the Worksheet Fails


This process may take several minutes depending on how large a data file you are loading. Sometimes the query load will fail; this often occurs when you attempt to load more than 1,048,576 records, the maximum number of rows in an Excel worksheet. If the load fails, you can hover over the error message to view what caused the error. In this case, the query result was too large, so we can select Load to Data Model.

Microsoft Excel 2016

Loading the data to the data model will allow us to work with a large dataset in a
PivotTable, even though the dataset itself is too large for the worksheet.


Excel: How to Return to the Power Query Window after Closing It
If you wish to further transform your data using the Power Query window, you can do so by double-clicking the query's label in the Workbook Queries pane (if the Workbook Queries pane is not showing, you can click the Data tab on the ribbon, then click Show Queries).

Microsoft Excel 2016

Power BI: How to Return to the Power Query Window after Closing It
If you wish to further transform your data using the Power Query window, you can do so by
clicking Transform Data from the Home tab in Power BI.

Power BI Desktop

Appendix G
Power BI Desktop

Power BI Desktop is a Microsoft tool that combines ETL tools with reporting tools. When
we work with Power Query or PowerPivot in Excel, we’re actually working with Power
BI tools. If you will ultimately want to run statistical tests such as hypothesis testing or
regression analysis, it’s best to work within Excel directly and use the Power Query add-in.
However, if you need to transform your data using Power Query prior to creating reports or
dashboards or even if you just want to explore your data, Power BI Desktop can be a great
alternative to other reporting and visualization tools such as Tableau.
When it comes to creating visualizations in Power BI, you can produce results extremely similar to what you can create in Tableau, but the path to getting there is different. Power BI defaults to a report mode (similar to Tableau's Dashboard mode), so that as you create visuals, they appear as tiles that you can resize and rearrange around the canvas.
When you open Power BI Desktop, you will be greeted with a startup screen similar to
the following:

Power BI Desktop

The tutorials and other training resources on the right of the startup screen are helpful
for getting started with the tool.
The Get Data button on the left of the startup screen will bring you into Power BI’s
Power Query tool. It is set up exactly like the Power Query tool is set up in Excel, so you can
use it to connect to a variety of sources (Excel, SQL Server, Access, etc.).


To familiarize yourself with Power BI, we will use Appendix_G_Data.xlsx. It is a modified version of the Sláinte file that you might work with in Lab 2-2, Lab 4-1, Lab 4-2, Lab 6-3, or Lab 7-2. The data are a subset of the sales data related to a fictional brewery named Sláinte.
1. Click Get Data on the startup screen.
2. Select Excel from the list of possible data sources, then click Connect.

Power BI Desktop

3. Browse to the file location for Appendix_G_Data.xlsx and Open the file.
4. Because there are three worksheets in the file, the Navigator provides you the option to select one, two, or all three. Place a check mark next to each.
5. You are also given an option to either Load or Edit the data. If you click Edit, you will
enter the Power Query window with the same ribbon and options to transform the data
as you are familiar with from the Excel version of the tool (add columns, split columns,
pivot data, etc.). These data do not need to be transformed, so we will click Load.
6. Once the data are loaded, you will see a blank canvas on which you can build a report.
There are three key elements bordering the canvas:
a. To the left of the blank canvas, you are presented with three options:

Microsoft Power BI Desktop


Report Mode: The first option, represented with an icon that looks like a bar chart, is for Report mode. This is the default view and is where you can build your visualizations and explore your data.
Data Mode: The second option, represented with an icon that looks like a table or a
spreadsheet, is for Data mode. If you click into this icon, you can view the raw data
that you have imported into Power BI. You can also create new measures or new columns from this mode.
Model Mode: The third option, which looks like a database diagram, is for Model
mode. If you click into this icon, you enter PowerPivot. From this mode, you can edit
the table and attribute names or edit relationships between tables.
b. To the right of the blank canvas is your Fields list and your options for Visualizations.

Power BI Desktop

Visualizations: You can drag any of these options over into the canvas to begin
designing a visualization. Once you have tiles on your report, you can change the
type of visualization being used to depict a set of fields by clicking the tile, then
selecting any of the visualization options to change the way the data are presented.
Fields: This section is similar to your PivotTable field list. You can expand the tables to see the attributes that are within each, and placing a check mark next to a field will add it to an active tile.


Values, Filters, etc.: This section will vary based on the tile and the fields you are actively working with. Anytime you add a field to a visualization, that field gets automatically added to the filters, which cuts out the need to manually add filters or slicers as you would with a PivotTable.
c. Immediately above the canvas is the familiar ribbon that you can expect from
Microsoft applications. The four tabs—Home, View, Modeling, and Help—stay
consistent across the three different modes (report, data, and model), but the
options that you can select will vary based on the mode in which you are working.
7. To begin working with the data, expand the Customer table and place a check mark in the State field.

Power BI Desktop

8. Power BI will default to creating a tile with a map visualization. This is similar to how
Tableau defaults to working with geographic data. To make the map more interesting,
expand the Sales_Orders table to place a check mark in the Quantity Sold field.

Power BI Desktop


This will make the tile more interesting by changing the size of the symbol associated
with each state—the larger the symbol, the higher the quantity sold in that state.
9. You can also change the way the data are presented by selecting a different visualiza-
tion type. Select the first option to view the data in a horizontal bar chart.

Power BI Desktop

10. One of the most exciting offerings from Power BI is its natural language processing for
data exploration. From the Insert tab in the ribbon, click Buttons. In the drop-down,
select Q&A.

Power BI Desktop

11. The following icon will appear as a separate tile. If the placement defaults to being on
top of the bar chart, you can click and drag it to somewhere else on the canvas:

Power BI Desktop


12. To activate the Q&A, ctrl + click the icon. The following window will pop up and you
can select from the list of questions that Power BI has come up with, or you can type
directly into the “Ask a question about your data” box.

Power BI Desktop

13. You can also add a question directly to the canvas by selecting Q&A from the
Visualizations pane.

Power BI offers many other exciting capabilities, but with this introduction you should have the confidence to jump in and explore more of what it has to offer.

Appendix H
Tableau Prep Builder

Before jumping into the labs, you may wish to introduce yourself to Tableau Prep through
this appendix if you have never used the Tableau Prep tool. Tableau Prep Builder is a tool
used to extract, transform, and load data from various sources interactively. When you
create a flow in Tableau Prep that cleans, joins, pivots, and exports data for use in Tableau
Desktop, the flow can be reused with multiple datasets with the same structure.
To access Tableau Prep, you can use the University of Arkansas' remote desktop to access the Walton Lab (see your instructor for instructions on how to access it), or you can download a free academic usage license of Tableau at https://www.tableau.com/academic/students. Tableau Prep will work on a PC or a Mac. The images in this textbook reflect Tableau Prep for PC, but it is very similar to Tableau Prep for Mac.
Tableau Prep can connect to a variety of data types, including Excel, Access, and SQL
Server. We will connect to the Superstore sample flow.
1. Open Tableau Prep Builder.
2. Immediately upon opening Tableau Prep, you will see a list of flows to open, or you can connect to data. Open the Superstore sample flow.

Tableau Software, Inc. All rights reserved.


3. Here you will see a completed flow used to transform and combine various data files.
Click through each of the primary steps (items 4 through 7 below) to see transformation options:

Tableau Software, Inc. All rights reserved.

4. Click the Orders_East icon to show the connect to data step. This makes the connection
to the raw data. Here you can rename fields and uncheck attributes to exclude them
from your transformation.

Tableau Software, Inc. All rights reserved.


5. Click the Fix Data Type step to clean and transform the data. This allows you to make
transformations to fix errors and combine values. It also shows a preview of the data
and frequency distribution of values for each field.

Tableau Software, Inc. All rights reserved.

6. Click the All Orders step to show the union step. This is where you will combine mul-
tiple data files into one. In this case, the union will add each of the preceding tables one
after the other into a larger table.

Tableau Software, Inc. All rights reserved.


7. In the flow, click the Create ‘Superstore Sales.hyper’ step where you export the data.
This step is where you save the cleaned data as a Tableau hyper file or Excel file for use
in another program, such as Tableau Desktop. It also shows you a preview of your trans-
formed data.

Tableau Software, Inc. All rights reserved.

8. Finally, to add any additional steps to the flow, click the + icon to the right of any box.
This gives you the option to add a new task in the flow. Click through each of the steps
to see options available.
• Clean Step to fix errors,
• Aggregate to calculate summary statistics,
• Pivot to summarize data on multiple attributes like a PivotTable in Excel,
• Join to combine two tables on a matching primary key and foreign key pair,
• Union to combine similar tables into one larger table,
• Script to do some advanced transformation using a scripting language, or
• Output to export your data at any point in the process.

Tableau Software, Inc. All rights reserved.

Appendix I
Tableau Desktop

Before jumping into the labs, you may wish to introduce yourself to Tableau Desktop
through this appendix if you have never used the Tableau Desktop tool. Tableau Desktop is
an interactive data visualization and modeling tool that allows you to connect to data and
create powerful dashboards.
To access Tableau Desktop, you can use the University of Arkansas' remote desktop (see your instructor for instructions on how to access it), or you can download a free academic usage license of Tableau at https://www.tableau.com/academic/students. Tableau will work on a PC or a Mac. The images in this textbook reflect Tableau for PC, but it is very similar to Tableau for Mac.
Tableau Desktop can connect to a variety of data types, including Excel, Access, and
SQL Server. We will connect to the dataset Appendix_I_Data.xlsx. If you worked through
Appendix B about PivotTables, this is the same dataset that you worked with previously.
1. Open Tableau.
2. Immediately upon opening Tableau Desktop, you will see a list of file types that you can
connect to. We’ll connect to an Excel file, so click Microsoft Excel.

Tableau Software, Inc. All rights reserved.


3. Navigate to where your file is stored and click Open.


Tableau automatically detects the data types of the attributes you import. In this dataset, the attributes probably all imported as the data type you would expect. Notice that the first two, Invoice # and Customer #, imported as numbers. Continue looking at the
attributes, and you will notice the globe icon above Zip Code. This is Tableau showing
you one of its best features; it shows that the Zip Code data were imported as
geographic data. This will allow you to create maps.

Tableau Software, Inc. All rights reserved.

4. To begin working with the data, click Sheet 1 in the bottom left.

Tableau Software, Inc. All rights reserved.

Here is a quick introduction to pieces of the Tableau canvas:

Tableau Software, Inc. All rights reserved.


5. To begin creating some basic visualizations with the data, double-click on the measure
Gross Margin.

Tableau Software, Inc. All rights reserved.

Immediately you will see how Tableau interacts with data differently than Excel because
it has defaulted to displaying a bar chart. This isn’t a very meaningful chart as it is, but
you can add meaning by adding a dimension.


6. Double-click Description from the dimensions.


Similar to the analysis found in Appendix B regarding PivotTables, you find the same numbers: apples with a gross margin of 140.4, apricots with a gross margin of 78.0, and so on.
7. To make these data easier to interpret, you can sort them. Click the Sort Descending
icon to sort the data.

Tableau Software, Inc. All rights reserved.

You can continue adding attributes (sometimes referred to as pills) to the columns or
rows shelves and changing the method of visualization using the Show Me tab to fur-
ther familiarize yourself with the tool.

Appendix J
Data Dictionaries

This appendix describes the datasets and tables used in this textbook. Complete data
dictionaries with tables, attributes, and descriptions can be found in the textbook resources
in Connect.

Dillard’s Department Store


This dataset contains sales data for Dillard’s Department Stores in the following tables. It can
only be accessed through the University of Arkansas Walton Lab via remote desktop
connection.
• TRANSACT: Sales transactions, including date, store, customer number, transaction
amount, and SKU of items.
• SKU: List of inventory items, including UPC code, style, color, brand, and department.
• DEPARTMENT: Departments assigned to inventory including hierarchy. Department
Century (DEPTCENT_DESC) is the highest-level category. Each Department Century
contains multiple Department Decades (DEPDEC_DESC). Each Department Decade
contains multiple Departments (DEPT_DESC).
• STORE: List of Dillard's store locations, including division and address. Also includes Dillard's Online Store (#698).
• SKU_STORE: Joining table that shows which products (SKU) are assigned to which
stores (STORE).
• CUSTOMER: Includes customer ID, address, and distance to nearest store.

Sláinte
The Sláinte dataset contains author-generated data that reflect business transactions for a fictional brewery. Different datasets look at purchases, sales, and other general transactions. See the Sláinte data dictionary on Connect for full details.

LendingClub
This dataset contains demographic data tied to approved and rejected loans on the LendingClub peer-to-peer lending website. These attributes are anonymized to hide individual borrowers, but contain information such as employment history, outstanding accounts, and debt-to-income information.

College Scorecard
This dataset is provided by the U.S. Department of Education and contains demographic information about the composition of the student body and completion rates of different colleges in the United States.


Oklahoma Purchase Card


These data contain transaction details for employees of the State of Oklahoma, including
cardholder name, transaction amounts, dates, and descriptions.

S&P100
The SP100 Facts dataset contains values attached to XBRL financial statement data pulled
from the SEC EDGAR data repository. The single table contains the company name, ticker,
standardized XBRL tag, value, and year. This dataset mimics the data available in the live
Google Sheet XBRLAnalyst add-in.
The SP100 Sentiment dataset contains word counts from financial statements pulled
from the SEC EDGAR data repository. This single table includes company information
and filing dates, as well as total word and character counts. Additionally, it includes word
counts for categories of terms that match those in the Harvard and Loughran-McDonald
sentiment dictionaries, such as positive, negative, modal, litigious, and uncertain.

Avalara
This dataset provides tax information for U.S. states including different region, state, county,
and city tax rates. It is used for tax planning purposes.

Glossary
A common size financial statement (407) A type of financial
statement that contains only basic accounts that are common
accounting information system (54) A system that records, across companies.
processes, reports, and communicates the results of
composite primary key (58) A special case of a primary
business transactions to provide financial and nonfinancial
key that exists in linking tables. The composite primary key
information for decision-making purposes.
is made up of the two primary keys in the table that it is
alternative hypothesis (131) The opposite of the null linking.
hypothesis, or a potential result that the analyst may expect.
computer-assisted audit techniques (CAATs)
audit data standards (ADS) (249) The audit data standards (286) Automated scripts that can be used to validate data,
define common tables and fields that are needed by auditors test controls, and enable substantive testing of transaction
to perform common audit tasks. The AICPA developed these details or account balances and generate supporting evidence
standards. for the audit.
continuous auditing (253) A process that provides real-time
B assurance over business processes and systems.

Balanced Scorecard (342) A particular type of digital dash- continuous data (188) One way to categorize quantitative
board that is made up of strategic objectives, as well as KPIs, data, as opposed to discrete data. Continuous data can take
target measures, and initiatives, to help the organization on any value within a range. An example of continuous data
reach its target measures in line with strategic goals. is height.

Benford’s law (128, 292) The principle that in any large, continuous monitoring (253) A process that constantly
randomly produced set of natural numbers, there is an evaluates internal controls and transactions and is the chief
expected distribution of the first, or leading, digit with 1 responsibility of management.
being the most common, 2 the next most, and down succes- continuous reporting (253) A process that provides real-
sively to the number 9. time access to the system status and accounting information.
Big Data (4) Datasets that are too large and complex customer relationship management (CRM) system (54) An
for businesses’ existing systems to handle utilizing their information system for managing all interactions between the
traditional capabilities to capture, store, manage, and analyze company and its current and potential customers.
these datasets.
D
C
Data Analytics (4) The process of evaluating data with
causal modeling (133) A data approach similar to regres- the purpose of drawing conclusions to address business
sion, but used to test for cause-and-effect relationships questions. Indeed, effective Data Analytics provides a way
between multiple variables. to search through large structured and unstructured data to
classification (11, 133) A data approach that attempts identify unknown patterns or relationships.
to assign each unit in a population into a few categories data dictionary (19, 59) Centralized repository of
potentially to help with predictions. descriptions for all of the data attributes of the dataset.
clustering (11, 128) A data approach that attempts to data mart (459) A subset of the data warehouse focused
divide individuals (like customers) into groups (or clusters) on a specific function or department to assist and support its
in a useful or meaningful way. needed data requirements.
co-occurrence grouping (11, 129) A data approach that data reduction (12, 120) A data approach that attempts
attempts to discover associations between individuals based to reduce the amount of information that needs to be
on transactions involving them. considered to focus on the most critical items (e.g., highest
common data model (249) A tool used to map existing cost, highest risk, largest impact, etc.).
database tables and fields from various systems to a data request form (62) A method for obtaining data if you
standardized set of tables and fields for use with analytics. do not have access to obtain the data directly yourself.

588

ISTUDY
Glossary 589

data warehouse (248, 459) A repository of data accumulated from internal and external data sources, including financial data, to help management decision making.

decision boundaries (138) Technique used to mark the split between one class and another.

decision support system (141) An information system that supports decision-making activity within a business by combining data and expertise to solve problems and perform calculations.

decision tree (138) Tool used to divide data into smaller groups.

declarative visualizations (188) Made when the aim of your project is to “declare” or present your findings to an audience. Charts that are declarative are typically made after the data analysis has been completed and are meant to exhibit what was found in the analysis steps.

descriptive analytics (116, 286) Procedures that summarize existing data to determine what has happened in the past. Some examples include summary statistics (e.g., Count, Min, Max, Average, Median), distributions, and proportions.

descriptive attributes (58) Attributes that exist in relational databases that are neither primary nor foreign keys. These attributes provide business information, but are not required to build a database. An example would be “Company Name” or “Employee Address.”

diagnostic analytics (116, 286) Procedures that explore the current data to determine why something has happened the way it has, typically comparing the data to a benchmark. As an example, these allow users to drill down in the data and see how they compare to a budget, a competitor, or trend.

digital dashboard (125, 341) An interactive report showing the most important metrics to help users understand how a company or an organization is performing. Often created using Excel or Tableau.

discrete data (188) One way to categorize quantitative data, as opposed to continuous data. Discrete data are represented by whole numbers. An example of discrete data is points in a basketball game.

dummy variable (135) A numerical value (0 or 1) to represent categorical data in statistical analysis; values assigned a 1 indicate the presence of something and 0 represents the absence.

DuPont ratio (409) Developed by the DuPont Corporation to decompose performance (particularly return on equity [ROE]) into its component parts.
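For readers who want the arithmetic spelled out, here is a minimal sketch in Python (not one of this text's lab tools; all figures below are hypothetical) showing how the decomposition multiplies out to ROE:

    # DuPont decomposition: ROE = profit margin x asset turnover x equity multiplier
    net_income, sales = 50.0, 500.0            # hypothetical income statement figures
    total_assets, total_equity = 400.0, 250.0  # hypothetical balance sheet figures

    profit_margin = net_income / sales               # 0.10
    asset_turnover = sales / total_assets            # 1.25
    equity_multiplier = total_assets / total_equity  # 1.60

    roe = profit_margin * asset_turnover * equity_multiplier
    assert abs(roe - net_income / total_equity) < 1e-9  # identical to NI / equity = 0.20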
E

effect size (141) Used in addition to statistical significance in statistical testing; effect size demonstrates the magnitude of the difference between groups.

Enterprise Resource Planning (ERP) (54) Also known as Enterprise Systems, a category of business management software that integrates applications from throughout the business (such as manufacturing, accounting, finance, human resources, etc.) into one system.

ETL (60) The extract, transform, and load process that is integral to mastering the data.

exploratory visualizations (189) Made when the lines between steps “P” (perform test plan), “A” (address and refine results), and “C” (communicate results) are not as clearly divided as they are in a declarative visualization project. Often when you are exploring the data with visualizations, you are performing the test plan directly in visualization software such as Tableau instead of creating the chart after the analysis has been done.

F

financial statement analysis (406) Used by investors, analysts, auditors, and other interested stakeholders to review and evaluate a company’s financial statements and financial performance.

flat file (57, 248) A means of storing data in one place, such as in an Excel spreadsheet, as opposed to storing the data in multiple tables, such as in a relational database.

foreign key (58) An attribute that exists in relational databases in order to carry out the relationship between two tables. This does not serve as the “unique identifier” for each record in a table. These must be identified when mastering the data from a relational database in order to extract the data correctly from more than one table.

fuzzy matching (287) Process that finds matches that may be less than 100 percent matching by finding correspondences between portions of the text or other entries.
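As an illustration only (the labs in this text perform matching in tools such as Excel rather than Python), a similarity score between two renderings of the same string can be computed with the standard library; the vendor names and the 0.8 threshold below are hypothetical:

    from difflib import SequenceMatcher

    # Two hypothetical renderings of one vendor name (note the accented character)
    a, b = "Sláinte Brewing Co.", "Slainte Brewing Company"
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio()  # similarity between 0 and 1
    is_match = score >= 0.8  # pairs above a chosen threshold are flagged as likely matches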
H

heat map (414) A visualization that shows the relative size of values by applying a color scale to the data.

heterogeneous systems approach (248) Heterogeneous systems represent multiple installations or instances of a system. It would be considered the opposite of a homogeneous system.

homogeneous systems approach (248) Homogeneous systems represent one single installation or instance of a system. It would be considered the opposite of a heterogeneous system.

horizontal analysis (411) An analysis that shows the change of a value from one period to the next.

human resource management (HRM) system (54) An information system for managing all interactions between the company and its current and potential employees.
I

index (411) A metric that shows how much any given subsequent year has changed relative to the base year.
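This index and the horizontal analysis entry earlier rest on the same arithmetic: compare each period to a base year. A minimal Python sketch with hypothetical revenue figures:

    revenues = {2014: 400.0, 2015: 440.0, 2016: 506.0}  # hypothetical yearly revenues
    base = revenues[2014]

    for year, amount in sorted(revenues.items()):
        index_value = amount / base * 100          # base year scaled to 100
        change_pct = (amount - base) / base * 100  # horizontal change versus the base year
        print(year, round(index_value, 1), round(change_pct, 1))  # e.g., 2016 -> 126.5, 26.5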
interquartile range (IQR) (124) A measure of variability. To calculate the IQR, the data are first divided into four parts (quartiles), and the middle two quartiles that surround the median are the IQR.
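A minimal sketch of the calculation just described, using Python's statistics module on hypothetical transaction amounts:

    import statistics

    amounts = [12, 15, 17, 20, 22, 25, 30, 80]      # hypothetical amounts; 80 looks extreme
    q1, _, q3 = statistics.quantiles(amounts, n=4)  # the three quartile cut points
    iqr = q3 - q1
    # A common screen flags values more than 1.5 * IQR beyond the quartiles
    outliers = [x for x in amounts if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
    print(iqr, outliers)  # the 80 is flagged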
interval data (187) The third most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio; a type of quantitative data. Interval data can be counted and grouped like qualitative data, and the differences between each data point are meaningful. However, interval data do not have a meaningful 0. In interval data, 0 does not mean “the absence of” but is simply another number. An example of interval data is the Fahrenheit scale of temperature measurement.

K

key performance indicator (KPI) (339) A particular type of performance metric that an organization deems the most important and influential on decision making.

L

link prediction (12, 133) A data approach that attempts to predict a relationship between two data items.

M

mastering the data (54) The second step in the IMPACT cycle; it involves identifying and obtaining the data needed for solving the data analysis problem, as well as cleaning and preparing the data for analysis.

N

nominal data (187) The least sophisticated type of data on the scale of nominal, ordinal, interval, and ratio; a type of qualitative data. The only thing you can do with nominal data is count, group, and take a proportion. Examples of nominal data are hair color, gender, and ethnic groups.

normal distribution (188) A type of distribution in which the median, mean, and mode are all equal, so half of all the observations fall below the mean and the other half fall above the mean. This phenomenon is naturally occurring in many datasets in our world, such as SAT scores and heights and weights of newborn babies. When datasets follow a normal distribution, they can be standardized and compared for easier analysis.

null hypothesis (131) An assumption that the hypothesized relationship does not exist, or that there is no significant difference between two samples or populations.

O

ordinal data (187) The second most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio; a type of qualitative data. Ordinal data can be counted and categorized like nominal data, and the categories can also be ranked. Examples of ordinal data include gold, silver, and bronze medals.

overfitting (140) A modeling error when the derived model too closely fits a limited set of data points.

P

performance metric (339) Any calculation measuring how an organization is performing, particularly when that measure is compared to a baseline.

predictive analytics (116, 286) Procedures used to generate a model that can be used to determine what is likely to happen in the future. Examples include regression analysis, forecasting, classification, and other predictive modeling.

predictor (or independent or explanatory) variable (11) A variable that predicts or explains another variable, typically called a predictor or independent variable.

prescriptive analytics (117, 287) Procedures that work to identify the best possible options given constraints or changing conditions. These typically include developing more advanced machine learning and artificial intelligence models to recommend a course of action, or optimizing, based on constraints and/or changing conditions.

primary key (57) An attribute that is required to exist in each table of a relational database and serves as the “unique identifier” for each record in a table.

process mining (286) Analysis technique of business processes used to diagnose problems and suggest improvements where greater efficiency may be applied.

production or live systems (248) Production (or live) systems are those active systems that collect and report and are directly affected by current transactions.

profiling (11, 123) A data approach that attempts to characterize the “typical” behavior of an individual, group, or population by generating summary statistics about the data (including mean, standard deviations, etc.).

proportion (187) The primary statistic used with qualitative data. A proportion is calculated by counting the number of items in a particular category, then dividing that number by the total number of observations.
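A one-line sketch of that calculation on hypothetical nominal data:

    hair_colors = ["brown", "blond", "brown", "black", "brown"]  # hypothetical observations
    proportion_brown = hair_colors.count("brown") / len(hair_colors)  # 3 of 5 = 0.6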
Q

qualitative data (186) Categorical data. All you can do with these data is count and group, and in some cases, you can rank the data. Qualitative data can be further defined in two ways: nominal data and ordinal data. There are not as many options for charting qualitative data because they are not as sophisticated as quantitative data.

quantitative data (187) More complex than qualitative data. Quantitative data can be further defined in two ways: interval and ratio. In all quantitative data, the intervals between data points are meaningful, allowing the data to be not just counted, grouped, and ranked, but also to have more complex operations performed on them such as mean, median, and standard deviation.

R

ratio analysis (407) A tool that attempts to evaluate relationships among different financial statement items to help understand a company’s financial and operating performance.

ratio data (187) The most sophisticated type of data on the scale of nominal, ordinal, interval, and ratio; a type of quantitative data. They can be counted and grouped just like qualitative data, and the differences between each data point are meaningful like with interval data. Additionally, ratio data have a meaningful 0. In other words, once a dataset approaches 0, 0 means “the absence of.” An example of ratio data is currency.

regression (11, 133) A data approach that attempts to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.

relational database (56) A means of storing data in order to ensure that the data are complete, not redundant, and to help enforce business rules. Relational databases also aid in communication and integration of business processes across an organization.

response (or dependent) variable (10) A variable that responds to, or is dependent on, another.

S

similarity matching (11, 133) A data approach that attempts to identify similar individuals based on data known about them.

sparkline (413) A small visual trendline or bar chart that efficiently summarizes numbers or statistics in a single spreadsheet cell.

standard normal distribution (188) A special case of the normal distribution used for standardizing data. The standard normal distribution has 0 for its mean (and thus, for its mode and median, as well), and 1 for its standard deviation.

standardization (188) The method used for comparing two datasets that follow the normal distribution. By using a formula, every normal distribution can be transformed into the standard normal distribution. If you standardize both datasets, you can place both distributions on the same chart and more swiftly come to your insights.
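The formula referred to above is the z-score: subtract the dataset's mean from each value and divide by the standard deviation. A minimal Python sketch with hypothetical values:

    import statistics

    scores = [400.0, 500.0, 600.0, 700.0, 800.0]  # hypothetical dataset
    mu = statistics.mean(scores)
    sigma = statistics.stdev(scores)  # sample standard deviation

    z_scores = [(x - mu) / sigma for x in scores]  # standardized values center on 0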
standardized metrics (419) Metrics used by data vendors to allow easier comparison of company-reported XBRL data.

structured data (4, 123) Data that are organized and reside in a fixed field with a record or a file. Such data are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms.

summary statistics (119) Describe the location, spread, shape, and dependence of a set of observations. These commonly include the count, sum, minimum, maximum, mean or average, standard deviation, median, quartiles, correlation, covariance, and frequency that describe a specific measurable value.

sunburst diagram (414) A visualization that shows inherent hierarchy.

supervised approach/method (133) Approach used to learn more about the basic relationships between independent and dependent variables that are hypothesized to exist.

supply chain management (SCM) system (54) An information system that helps manage all the company’s interactions with suppliers.

support vector machine (139) A discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin (or biggest pipe).

systems translator software (248) Systems translator software maps the various tables and fields from varied ERP systems into a consistent format.

T

t-test (290) A statistical test used to determine if there is a significant difference between the means of two groups, or two datasets.
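For illustration (this text runs such tests in Excel's Data Analysis Toolpak), a two-sample t-test in Python might look like the following; scipy is assumed to be available, and the return-rate samples are hypothetical:

    from scipy import stats

    returns_jan = [0.12, 0.15, 0.11, 0.14, 0.13]    # hypothetical January return rates
    returns_other = [0.08, 0.09, 0.07, 0.10, 0.08]  # hypothetical non-January return rates

    t_stat, p_value = stats.ttest_ind(returns_jan, returns_other, equal_var=False)
    # A small p-value (e.g., below 0.05) suggests the two group means differ significantly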
Tax Cuts and Jobs Act of 2017 (460) Tax legislation offering a major change to the existing tax code.

tax data mart (459) A subset of a company-owned data warehouse focused on the specific needs of the tax department.

tax planning (464) Predictive analysis of potential tax liability and the formulation of a plan to reduce the amount of taxes paid.

test data (138) A set of data used to assess the degree and strength of a predicted relationship established by the analysis of training data.
time series analysis (137) A predictive analytics technique used to predict future values based on past values of the same variable.

training data (138) Existing data that have been manually evaluated and assigned a class, which assists in classifying the test data.

U

underfitting (140) A modeling error when the derived model poorly fits a limited set of data points.

unstructured data (4) Data that do not adhere to a predefined data model in a tabular format.

unsupervised approach/method (129) Approach used for data exploration looking for potential patterns of interest.

V

vertical analysis (407) An analysis that shows the proportional value of accounts to a primary account, such as Revenue.
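A minimal sketch of that common-size calculation, expressing each line of a hypothetical income statement as a percentage of revenue:

    income_statement = {
        "Revenue": 500.0,  # hypothetical figures
        "Cost of goods sold": 300.0,
        "Operating expenses": 120.0,
        "Net income": 80.0,
    }

    for account, amount in income_statement.items():
        common_size = amount / income_statement["Revenue"] * 100
        print(f"{account}: {common_size:.1f}% of revenue")  # Revenue itself shows 100.0%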
W

what-if scenario analysis (464) Evaluation of the impact of different tax scenarios/alternatives on various outcome measures including the amount of taxable income or tax paid.

X

XBRL (eXtensible Business Reporting Language) (122, 417) A global standard for exchanging financial reporting information that uses XML.

XBRL taxonomy (417) Defines and describes each key data element (like cash or accounts payable). The taxonomy also defines the relationships between each element (like inventory is a component of current assets and current assets is a component of total assets).

XBRL-GL (420) Stands for XBRL–General Ledger; relates to the ability of an enterprise system to apply XBRL tags to financial elements within the firm’s financial reporting system.

ISTUDY
Index
A Sales Tax Owed—Dillard’s, Attributes, descriptive, 58, 70
Abacus, 14 475–479 Audience, effective communication and,
Accountants Lab 9-3: Comprehensive Case: 203–204
role in data management, 460 Calculate Total Sales Tax Audit data analytics, 282–333
skills and tools needed, 13–17, 337 Paid—Dillard’s, 479–486 address and refine results, 288
Accounting Lab 9-4: Comprehensive Case: Benford’s law. See Benford’s law
affect of data analytics on, 5–8 Estimate Sales Tax Owed communicate insights, 288
analytic models for, 116–119 by Zip Code—Dillard’s and descriptive analytics, 286–287,
auditing and, 6. See also Auditing Avalara, 486–492 288–290. See also Descriptive
data reduction and, 120–122 Lab 9-5: Comprehensive Case: analytics
decision support systems, 117, 118, Online Sales Taxes Analysis— diagnostic analytics, 286–287,
141–142, 144, 296 Dillard’s and Avalara, 290–294. See also Diagnostic
profiling and, 126–127 492–497 analytics
regression and, 134–137 LendingClub example, 23–24 examples, 287
skills needed, 13–17, 337 management accounting, 338–339, identify the questions, 284
Accounting data, using and storing, 56 347 Lab 6-1: Evaluate Trends and
Accounting information system, 54, Advanced Environmental Recycling Outliers—Oklahoma, 304–311
56, 70 Technologies (AERT), 126–127 Lab 6-2: Diagnostic Analytics Using
Accounts receivable Advanced machine learning, 117 Benford’s Law—Oklahoma,
cash and, 500–506 Age analysis, descriptive analytics and, 311–317
Question 1 Part 2: How efficiently 287, 289 Lab 6-3: Finding Duplicate
Is the Company Collecting Aggregates/aliases, expand SELECT Payments—Sláinte, 317–321
Cash?, 503–504 SQL clauses, 552–553 Lab 6-4: Comprehensive Case:
Acid test ratio, 408 Ahmed, A. S., 136n4 Sampling—Dillard’s, 321–325
ACL software, 288 AICPA, 62, 249–251, 264 Lab 6-5: Comprehensive Case:
Activity ratios, 408–409 Airbnb, 412 Outlier Detection—Dillard’s,
Address and refine results Alarms, continuous monitoring, 325–332
audit data analytics and, 288 254–255 master the data, 284–286
IMPACT cycle, 13, 23–24, 338–339, Alibaba, 422 nature/extent/timing of, 284
347 Alphabet, 422 perform test plan, 286–288
Lab 2-2: Prepare Data for Analysis— Alternative hypothesis, 131, 144 predictive/prescriptive, 286–287,
Sláinte, 79–83 Alternative stacked bar chart, 196 294–296
Lab 2-7: Comprehensive Case: Altman, Edward, 295n2 track outcomes, 288
Preview Data from Tables— Amazon, 11, 12, 57, 118, 143 when to use, 284–288
Dillard’s, 103–108 Amazon RDS, 56 Audit data standards (ADS), 62,
Lab 2-8: Comprehensive Case: American Institute of Certified Public 249–251, 252, 256, 264
Preview a Subset of Data in Accountants (AICPA), 62, Audit plan, 251–253
Excel, Tableau Using a SQL 249–251, 264 Auditing
Query—Dillard’s, 108–112 Analytics mindset, 14 alarms and exceptions, 254–255
Lab 7-2: Create a Balanced Anscombe, Francis, 183 automation and, 246–247
Scorecard Dashboard— Anscombe’s Quartet, 183–184 clustering approach to, 130–131
Sláinte, 367–376 Apple Inc., 39–40, 410, 411 continuous monitoring techniques,
Lab 8-1: Create a Horizontal and Applied statistics, predictive analytics, 128, 253–254, 257
Vertical Analysis Using 287, 296 data analytics and, 6
XBRL Data—S&P100, Artificial intelligence, 117, 118, 142–143, data reduction and, 120–121
430–437 286, 287, 296 documentation of, 255–256
Lab 9-2: Comprehensive Case: Asset turnover ratio, 409 external, 120–121
Calculate Estimated State Assurance services, 246–247 internal, 120–121, 127–128, 247–248

593

ISTUDY
594 Index

Auditing—Cont. objectives, 342–343, 346 Chartered Financial Analyst (CFA),


Lab 1-3: Data Analytics in Auditing, Bar charts, 192–194, 196–198 407
42–44 Barbaro, M., 128n2 Charting data
Lab 5-4: Identify Audit Data Base, 249 bar charts, 192–194, 196–198
Requirements, 263–267 Bay Area Rapid Transit (BART), 115 data scale and increments, 201
profiling examples, 127–128 Benchmarking how to create good, 195–199
regression approach, 136–137 diagnostic analytics and, 122 line charts, 194, 524–525
remote, 255–256 financial statements and, 406, 410 qualitative data and, 192–194
standards, 62, 249–251, 252, 256, 264 Beneish, M. D., 295n3 qualitative versus quantitative,
tax compliance and, 461–463 Benford’s law 186–188
working papers, 255–256, 272–274 defined, 144, 287, 297 quantitative data and, 194–195
(lab) Lab 6-2: Diagnostic Analytics Using refining charts, 200–202
Auditors Benford’s Law, 311–317 types of charts, 186, 189, 195
Question Set 1: Order-to-Process predicting distribution, 128, use of color, 201–202, 414
(O2P), 500–506 292–293 See also Data visualization
Question Set 2: Procure-to-Pay Berinato, Scott, 186, 186n4, 189 Chen, H., 4n4
(P2P), 506–511 Big data, 3, 4, 26, 286 Chiang, R. H. L., 4n4
Aura, PwC tool, 255–256, 257 Bjerrekaer, Julius Daugbejerg, 53 Chief audit executive (CAE), 247, 255
Automation Black-box audit, 255 Chik-fil-A, 528
CAATs, 286–288, 297 Blockchain, 3 Chipotle, 203
continuous monitoring and, Bloomberg, 126 Citigroup, 254
253–254, 257 Boeing, 420 Class, 133
of data analytics, 246–247, 251–253 Boundaries, support vector machines, Classification
data standards and, 249–251 139–140, 145 accounting example, 117
enterprise data, 248–251 Box, cloud computing, 272 defined, 11, 26, 138, 144
impact of, 251 Box plots, 124–125, 195, 290 evaluating, 140
Avalara Buffett, Warren, 412 goals of, 137
data dictionary, 587 Bullet graph, 341 leasing and, 142–143
Lab 9-4: Comprehensive Case: Business, data analytics impact on, 4–5 overfitting, 140, 145
Estimate Sales Tax Owed by Business intelligence (BI), 15 predictive analytics and, 118, 133,
Zip Code, 486–492 Business processes 137–140, 287, 295
Lab 9-5: Comprehensive Case: evaluating, 500 steps, 138
Online Sales Taxes Analysis, order-to-cash process, 500–506 terminology, 138–139
492–497 procure-to-pay process, 506–511 Cleaning of data, 53, 65–67
Average collection period ratio, 409 Cloud folder (lab), 272–274
Average days in inventory ratio, 409 Clustering
C auditing and, 130–131
CAATs, 286–288, 297 defined, 11, 26, 144
B Calcbench, 419–420 diagnostic analytics, 117, 118,
Background information, select Cash, accounts receivable and, 500–506 128–131, 287, 294
S&P100 companies, 438 (lab) Cash tags, XBRL and, 417–419 high-volume stores, 399–400 (lab)
Balance sheet composition, 414, 421 Categorical data, 186, 205. See also Lab 3-2: Diagnostic Analytics:
Balanced scorecard Qualitative data Identify Data Clusters—
components of, 342–343 Causal modeling, 133, 144 LendingClub, 157–160
creation of, 342 Central tendency, describing the sample Co-occurrence grouping
defined, 348 by, 528–529 accounting example, 117
digital dashboards, 346 Certified Management Accountant cluster analysis, 129. See also
example, 335, 343 (CMA), 407 Clustering
key performance indicators and, Certified Public Accountant (CPA), defined, 11, 26, 118, 144
341–345. See also Key 407 Cohn, M., 3
performance indicators (KPIs) Change amount, 412 College Scorecard
Lab 7-2: Create a Balanced Change in value relative to base year, data dictionary, 586
Scorecard Dashboard— 412 Lab 2-5: Validate and Transform
Sláinte, 367–376 Change percent, 412 Data, 95–98

ISTUDY
Index 595

College Scorecard—Cont. Lab 2-6: Build Relationships among Continuous auditing, 128, 253–254, 257
Lab 3-3: Perform a Linear Database Tables—Dillard’s, Continuous data, 188, 205
Regression Analysis, 160–166 98–102 Continuous monitoring/reporting,
Color, use in charts, 201–202, 414 Lab 2-7: Preview Data from Tables— 253–254, 257
Column chart, 198 Dillard’s, 103–108 Control Objectives for Information and
Columns, tables and, 57–58 Lab 2-8: Preview a Subset of Data in Related Technologies (COBIT),
Committee of Sponsoring Excel, Tableau Using a SQL 252
Organizations (COSO), 252 Query—Dillard’s, 108–112 Corptax, 457
Common data model Lab 3-4: Descriptive Analytics: Cost accounting
AICPA and, 249–251 Generate Summary Lab 7-1: Evaluate Job Costs,
defined, 249, 257 Statistics—Dillard’s, 166–169 355–367
Lab 5-1: Create a Common Data Lab 3-5: Diagnostic Analytics: regression approach, 136
Model—Oklahoma, 263–267 Compare Distributions— Cost behavior, 340–341
Lab 5-2: Create a Dashboard Based Dillard’s, 169–174 Cost optimization, 337
on a Common Data Lab 3-6: Create a Data Abstract Coughlin, Tom, 127
Model—Oklahoma, 267–271 and Perform Regression Credit risk score, 20, 22
Common size financial statements, Analysis—Dillard’s, 174–179 Current ratio, 408
407–408, 423, 437–441 (lab) Lab 4-4: Visualize Declarative Data— Customer KPIs, 344. See also Key
Communicating results, 180–243 Dillard’s, 229–236 performance indicators (KPIs)
audience and tone, 203–204 Lab 4-5: Visualize Exploratory Customer relationship management
audit data analytics and, 288 Data—Dillard’s, 236–242 (CRM), 54, 70
charting data. See Charting data Lab 5-5: Setting Scope—Dillard’s, Cybersecurity, 68
content and organization, 202–203 277–281
data visualization. See Data Lab 6-4: Sampling—Dillard’s,
visualization 321–325 D
determining method, 186 Lab 6-5: Outlier Detection—Dillard’s, Daily Mail, 195
introduction to, 13, 24 325–332 Damodaran, Aswath, 412–413
Lab 4-1: Visualize Declarative Lab 7-3: Analyze Time Series Data— Dashboards
Data—Sláinte, 212–218 Dillard’s, 377–389 digital, 125, 144, 341–342, 346, 348
Lab 4-2: Perform Exploratory Lab 7-4: Comparing Results to Lab 4-1: Visualize Declarative
Analysis and Create a Prior Period—Dillard’s, Data—Sláinte, 212–218
Dashboards—Sláinte, 218–222 389–398 Lab 4-2: Perform Exploratory
Lab 4-3: Create Dashboards— Lab 7-5: Advanced Performance Analysis and Create
LendingClub, 223–229 Models—Dillard’s, 398–403 Dashboards—Sláinte, 218–222
Lab 4-4: Comprehensive Case: Lab 9-2: Calculate Estimated State Lab 4-3: Create Dashboards—
Visualize Declarative Sales Tax Owed—Dillard’s, LendingClub, 223–229
Data—Dillard’s, 229–236 475–479 Lab 4-4: Comprehensive Case:
Lab 4-5: Comprehensive Case: Lab 9-3: Calculate Total Sales Tax Visualize Declarative
Visualize Exploratory Paid—Dillard’s, 479–486 Data—Dillard’s, 229–236
Data—Dillard’s, 236–242 Lab 9-4: Estimate Sales Tax Owed Lab 4-5: Comprehensive Case:
management accounting, 339 by Zip Code—Dillard’s and Visualize Exploratory
revision and, 204 Avalara, 486–492 Data—Dillard’s, 236–242
statistics versus visualizations, Lab 9-5: Online Sales Taxes Lab 5-2: Create a Dashboard Based
183–184 Analysis—Dillard’s and on a Common Data
use of words, 202–204 Avalara, 492–497 Model—Oklahoma, 267–271
Complexity of model, versus Computer-assisted audit techniques Lab 7-2: Create a Balanced
classification, 140 (CAATs), 286–288, 297 Scorecard Dashboard,
Compliance, tax, 461–463 Conceptual charts, 195 367–376
Composite primary key, 58, 70 Conceptual data, 187 sales tax data, 462
Comprehensive Case Confidence interval, 531 to track KPIs, 458
Lab 1-4: Questions about Dillard’s Confidence level, 289–290 Data
Store Data, 44–47 Connect, PwC tool, 255 big, 3, 4
Lab 1-5: Connect to Dillard’s Store Content, of written communication, breach of, 53, 68–69
Data, 47–51 202–203 categorical, 186, 205

ISTUDY
596 Index

Data—Cont. Sláinte, 586 Lab 4-5: Comprehensive Case:


cleaning of, 53, 65–67 S&P100, 587 Visualize Exploratory
ethics. See Ethics Data-driven charts, 195 Data—Dillard’s, 236–242
extracting, transforming, loading. Data management, taxes and, 458–461 Lab 5-2: Create a Dashboard Based
See Extracting, transforming, Data marts, 459, 460, 467 on a Common Data
loading (ETL) Data models Model—Oklahoma, 267–271
gathering/review, 10, 17–19 common models. See Common Lab 9-1: Descriptive Analytics:
internal/external sources of, 54–55 models State Sales Tax Rates,
loading of, 67–68 Lab 5-1: Create a Common 472–475
manipulation of, 14 Data Model—Oklahoma, normal distribution, 188, 205,
mapping of, 250–251 263–267 529–530
quality of, 66–67, 419–420 Lab 5-2: Create a Dashboard Based preference over text, 184–185
relationships, relational databases on a Common Data Model, purpose of, 185–192
and, 56–59 267–271 qualitative versus quantitative data,
security of, 53 Data preparation, 14 186–188
storage of, 56–57 Data privacy, 53, 68 Question 3.1: By Looking at Line
structured. See Structured data Data profiling. See Profiling Charts for 2014 and 2015,
unstructured, 4, 26 Data quality Does the Average Percentage
validation of, 53, 64–65, 82–83 errors and, 67 of Sales Returned in 2014
variability/spread, describing, 529 XBRL and, 419–420 Seem to Be Predictive of
Data Analysis Toolpak, Excel add-in, Data reduction Returns in 2015?, 524–525
544–545 accounting example, 120–122 Question Set 1: Descriptive and
Data analytics approach to data analytics, 12–13 Exploratory Analysis,
affect on accounting, 5–8 auditing example, 120–121 514–519
affect on business, 4–5 defined, 26, 144 relative size of accounts, 414
auditing and, 6 Lab 3-1: Descriptive Analytics: Filter showing trends, 413
automation of, 246–247, 251–253 and Reduce Data—Sláinte, sparklines, 413–414, 423
defined, 4, 26 153–157 versus statistics, 183–184
descriptive. See Descriptive analytics steps, 120 sunburst diagrams, 414, 423
diagnostic. See Diagnostic analytics used for, 117–118 tax analytics and, 461–463
IMPACT cycle and. See IMPACT Data request, 61–62, 77–78 (lab) tools for, 189–191
cycle Data request form, 62, 70 use of written communication,
Lab 1-1: Data Analytics Questions in Data scale, 201 202–204
Financial Accounting, 39–40 Data scrubbing, 10, 14, 53 See also Communicating results
Lab 1-2: Data Analytics Questions Data validation, 53, 64–65, 82–83 Data warehouse, 248, 257, 459, 467
in Managerial Accounting, Data visualization Database maps, 255
41–42 Anscombe’s Quartet, 183–184 Database schema, 56, 58, 59
Lab 2-1: Request Data from categorical data, 186, 205 Databases
IT—Sláinte, 77–78 charting. See Charting data data dictionaries. See Data
predictive. See Predictive analytics declarative, 188–191, 205 dictionaries
prescriptive. See Prescriptive exploratory, 188–191, 205 ETL process, 54
analytics heat maps and, 181–182, 194, 414, normalized relational, 56–57
skills needed, 13–17 423 relational. See Relational databases
tool selection, 14–17, 36–38 Lab 4-1: Visualize Declarative storage of data, 56–57
types of, 116–119 Data—Sláinte, 212–218 table attributes, 57–58
Data cleaning, 65–67 Lab 4-2: Perform Exploratory Datasets
Data types, SQL WHERE clauses, 549 Analysis and Create big data and, 3, 4
Data dictionaries Dashboards—Sláinte, breach of, 53, 68–69
Avalara, 587 218–222 Dates, data quality and, 66
College Scorecard, 586 Lab 4-3: Create Dashboards— Dayan, Zohar, 184n1
defined, 26, 70 LendingClub, 223–229 de la Merced, Michael J., 254
Dillard’s, 586 Lab 4-4: Comprehensive Case: Debreceny, Roger S., 4n3
LendingClub, 19, 59–60, 586 Visualize Declarative Debt-to-equity ratio, 409, 410
Oklahoma Purchase Card, 587 Data—Dillard’s, 229–236 Debt-to-income ratio, 20–21

ISTUDY
Index 597

Decision boundaries, 138, 144 Lab 3-2: Diagnostic Analytics: Excel, Tableau Using a SQL
Decision support systems, 117, 118, Identify Data Clusters— Query, 108–112
141–142, 144, 296 LendingClub, 157–160 Lab 3-4: Comprehensive Case:
Decision trees, 138–139, 144 Lab 3-5: Comprehensive Case: Descriptive Analytics:
Declarative visualizations, 188–191, Diagnostic Analytics: Generate Summary Statistics,
205, 212–218 (lab), 229–236 Compare Distributions— 166–169
(lab) Dillard’s, 169–174 Lab 3-5: Comprehensive Case:
Deductions, what-if analysis and, management accounting, 337–338 Diagnostic Analytics:
465–466 methods used, 122 Compare Distributions,
Delivery process, 504–505 profiling, 117, 118, 123–128 169–174
Deloitte, 291 Question 2.1: Is the Percentage of Lab 4-4: Visualize Declarative Data,
Denormalize data, 79 Sales Returned Significantly 229–236
Dependent variables, 10–11, 26 Higher in January after the Lab 4-5: Visualize Exploratory Data,
Descriptive analytics Holiday Season?, 519–521 236–242
age analysis, 287, 289 Question 2.2: How Do the Percentages Lab 5-5: Comprehensive Case:
audit data analytics and, 286–287, of Returned Sales for Holiday/ Setting Scope, 277–281
288–290 Non-Holiday Differ for Online Lab 6-4: Comprehensive Case:
data reduction. See Data reduction Transactions and across Sampling, 321–325
defined, 144, 297 Different States?, 521–523 Lab 6-5: Comprehensive Case:
examples of, 116, 117–118 Question 2.3: What Else Can Outlier Detection, 325–332
financial statements, 406, 407 You Determine about the Lab 7-3: Comprehensive Case:
Lab 3-1: Descriptive Analytics: Filter Percentage of Returned Sales Analyze Time Series Data,
and Reduce Data—Sláinte, through Diagnostic Analysis?, 377–389
153–157 523–524 Lab 7-4: Comprehensive Case:
Lab 3-4: Comprehensive Case: sequence check, 287, 294 Comparing Results to a Prior
Descriptive Analytics: stratification/clustering, 287, 294 Period, 389–398
Generate Summary t-tests, 287, 290–291 Lab 7-5: Comprehensive Case:
Statistics—Dillard’s, 166–169 tax analytics and, 456, 457 Advanced Performance
management accounting and, Z-score, 123–124, 128, 287, 290, 291 Models, 398–403
337–338 Dictionaries, data. See Data Lab 9-2: Comprehensive Case:
Question Set 1: Descriptive and dictionaries Calculate Estimated State
Exploratory Analysis, Digital dashboards Sales Tax Owed, 475–479
514–519 balanced scorecard, 346 Lab 9-3: Comprehensive Case:
sampling, 287, 289–290 defined, 144, 348 Calculate Total Sales Tax
sorting, 287, 289 key performance indicators and, Paid, 479–486
summary statistics. See Summary 341–342 Lab 9-4: Comprehensive Case:
statistics profiling and, 125 Estimate Sales Tax Owed by
tax analytics and, 456, 457 Dillard’s Stores Inc. Zip Code, 486–492
Descriptive attributes, 58, 70 data dictionary, 586 Lab 9-5: Comprehensive Case:
Diagnostic analytics estimating sales returns, 514–527 Online Sales Taxes Analysis,
audit data analytics and, 286–287, Lab 1-4: Comprehensive Case: 492–497
290–294 Questions about Dillard’s Power Query tutorial, 564–570
benefits, 122 Store Data, 44–47 Discrete data, 188, 205
Benford’s Law, 128, 287, 292–293 Lab 1-5: Comprehensive Case: Discrimination Function, 461
box plots and quartiles, 124–125, Connect to Dillard’s Store Distribution
195, 290 Data, 47–51 predicting, Benford’s law, 128,
cluster analysis, 117, 118, 128–131, Lab 2-6: Comprehensive Case: 292–293
287, 294 Build Relationships among probability, 529–530
defined, 116, 118, 144, 297 Database Tables, 98–102 Distributions, 116
drill-down, 287, 293 Lab 2-7: Comprehensive Case: Documentation, of auditing, 255–256
exact and fuzzy matching, 287, Preview Data from Tables, Drill-down, diagnostic analytics, 287,
293–294 103–108 293
financial statements, 406, 410 Lab 2-8: Comprehensive Case: Dropbox, 272
hypothesis testing, 122, 131–133 Preview a Subset of Data in Dummy variables, 135, 144

ISTUDY
598 Index

DuPont Corporation, 409 Average Percentage of Sales cash tags, 417–419


DuPont ratio, 409, 410, 422, 423 Returned in 2014 Seem to data quality and, 419–420
Be Predictive of Returns in defined, 145, 417
2015?, 524–525 instance document, 417–419
E Q 3.2: Using Regression, Can We Lab 8-1: Create a Horizontal and
EDGAR database, 40 Predict Future Returns as a Vertical Analysis Using XBRL
Effect size, 141, 144 Percentage of Sales Based Data—S&P100, 430–437
Effective tax rate (ETR), 462–463 on Historical Transactions?, Lab 8-2: Create Dynamic Common
Eisenberg, Harry, 185n2 526–527 Size Financial Statements—
Electronic working papers, 255–256, Q 3.3: What Else Can You Determine S&P100, 437–441
272–274 about the Percentage of Lab 8-3: Analyze Financial
ELT, 67–68, 203, 345 Returned Sales through Statement Ratios—S&P100,
Employee performance KPIs, 344 Predictive Analysis?, 527 441–444
Encoding, data quality and, 67 Ethics, of data collection, 53, 68–69 standardized metrics, 419–420, 421,
English dictionary (H4N-inf), 416 ETL (extracting, transforming, loading) 423
Enterprise data, 248–251 automation and, 251 taxonomy, 417–418, 423
Enterprise Resource Planning (ERP), defined, 70 uses of, 40, 122, 417
14, 54, 70 extracting steps, 61–64 XBRL-Global Ledger, 420–422, 423
Enterprise Systems (ESs), 54, 248 Lab 2-1: Request Data from IT— eXtensible markup language (XML),
Entity-relationship diagram (ERD), 98, Sláinte, 77–78 417
103, 108 Lab 2-2: Prepare Data for Analysis— External auditing, 120–121
Environmental and social sustainability Sláinte, 79–83 External data sources, 54–55
KPIs, 345 Lab 2-5: Validate and Transform Extracting, loading, transforming
Equifax, 27 Data—College Scorecard, (ELT), 67–68, 203, 345
Equity multiplier, 409 95–98 Extracting, transforming, loading (ETL)
ER diagram, 98, 103, 108 Lab 4-2: Perform Exploratory automation and, 251
ERP systems, 14, 54, 70 Analysis and Create defined, 70
Errors, data quality and, 67 Dashboards—Sláinte, 218–222 extracting steps, 61–64
Estimated misstatements, 290 Lab 5-1: Create a Common Data Lab 2-1: Request Data from IT—
Estimating sales returns Model—Oklahoma, 263–267 Sláinte, 77–78
Q 1.1: Compare the Percentage loading of data, 67–68 Lab 2-2: Prepare Data for Analysis—
of Returned Sales across steps in process, 54, 60–68 Sláinte, 79–83
Months, States, and Online transforming data, 64–67 Lab 2-5: Validate and Transform
versus In-Person See also Master the data Data—College Scorecard,
Transactions, 514–518 European Union, 461 95–98
Q 1.2: What Else Can You Exact matching, diagnostic analytics, Lab 4-2: Perform Exploratory
Determine about the 287, 293–294 Analysis and Create
Percentage of Returned Excel. See Microsoft Excel Dashboards—Sláinte, 218–222
Sales through Descriptive Exception report, 254 Lab 5-1: Create a Common Data
Analysis?, 518–519 Exceptions, 125 Model—Oklahoma, 263–267
Q 2.1: Is the Percentage of Sales Experian, 27 loading of data, 67–68
Returned Significantly Higher Explanatory variables, 11, 26 steps in process, 54, 60–68
in January after the Holiday Exploratory analysis transforming data, 64–67
Season?, 519–521 Lab 4-2: Perform Exploratory See also Master the data
Q 2.2: How Do the Percentages of Analysis and Create
Returned Sales for Holiday/ Dashboards—Sláinte, 218–222
Non-Holiday Differ for Online Lab 4-5: Comprehensive Case: F
Transactions and across Visualize Exploratory Data— F-test, 135
Different States?, 521–523 Dillard’s, 236–242 Facebook, 7–8, 12, 185, 410, 461
Q 2.3: What Else Can You Determine Question Set 1: Descriptive and False negative alarm, 254–255
about the Percentages of Exploratory Analysis, 514–519 False positive alarm, 254–255
Returned Sales through Exploratory visualization, 188–191, 205. FASB, 40, 136, 142, 417
Diagnostic Analysis?, 523–524 See also Data visualization Favorable variance, 340
Q 3.1: By Looking at Line Charts eXtensible Business Reporting Fawcett, Tom, 11n16
for 2014 and 2015, Does the Language (XBRL) Feinn, R., 141n6

ISTUDY
Index 599

Filled geographic maps, 195 Fujitsu, 419 Homogeneous systems approach,


Filtering, 117 Full Outer Join, SQL clause, 293–294 248–249, 257
Financial accounting, questions in Fuzzy matching Horizontal financial statement analysis,
(lab), 39–40 data reduction and, 120–121 407–408, 411, 423, 430–437
Financial Accounting Standards Board defined, 297 (lab)
(FASB), 40, 136, 142, 417 diagnostic analytics, 287, 293–294 Howson, C. J., 15
Financial dictionary (Fin-Neg), 416 Lab 3-1: Descriptive Analytics: Filter Human error, data quality and, 67
Financial KPIs, 344 and Reduce Data—Sláinte, Human resource management (HRM)
Financial ratio analysis, 407, 408–409 153–157 system, 54, 70
Financial reporting, data analytics and, Hyperion, 457
7–8 Hypothesis testing
Financial statement analysis, 404–453 G diagnostic analytics and, 122,
benchmarks, 406, 410 GAAP Financial Reporting Taxonomy, 131–133
common size statements, 407–408, 417 for differences in groups, 131–133, 144
423, 437–441 (lab) Gap detection, 121 Lab 3-5: Diagnostic Analytics:
defined, 423 Gartner Magic Quadrant for Business Compare Distributions—
descriptive, 406, 407 Intelligence and Analytics Dillard’s, 169–174
diagnostic, 406, 410 Platforms, 15, 189 Question 2.1: Is the Percentage of
index showing change, 411–412 Gathering/review of data, 10, 17–19 Sales Returned Significantly
Lab 8-1: Create a Horizontal and General ledger, 249 Higher in January after the
Vertical Analysis Using XBRL General Motors, 422 Holiday Season?, 519–521
Data—S&P100, 430–437 Generalized audit software (GAS), Question 2.2: How Do the
Lab 8-2: Create Dynamic Common 288 Percentages of Returned Sales
Size Financial Statements— Generalized data warehouse, 459 for Holiday/Non-Holiday
S&P100, 437–441 Geographic maps, 195 Differ for Online Transactions
Lab 8-3: Analyze Financial Get and Transform tool, Microsoft and across Different States?,
Statement Ratios—S&P100, Excel, 565–566 521–523
441–444 Good classification, 137 Question 2.3: What Else Can
Lab 8-4: Analyze Financial Google, 8, 143 You Determine about the
Sentiment—S&P100, Google Drive, 272 Percentage of Returned Sales
444–453 Google sheets through Diagnostic Analysis?,
predictive, 406, 410–412 Lab 8-1: Create a Horizontal and 523–524
prescriptive, 406, 412–413 Vertical Analysis Using XBRL statistics and, 530–531
questions, 7–8 Data—S&P100, 430–437
text mining/sentiment analysis, Lab 8-2: Create Dynamic Common
415–417 Size Financial Statements— I
three company ratio comparison, S&P100, 437–441 IBM DB2, 56
410 Graham, Benjamin, 412 IDEA software, 63, 142, 288
using ratios, 407, 408–409, 441–444 Gray, Glen L., 4n3 Identify the questions
valuing equities, 412–413 GROUP BY, SQL clauses, 554–555 audit data analytics and, 284
vertical and horizontal analysis, Lab 1-1: Data Analytics Questions in
407–408, 411, 423 Financial Accounting, 39–40
visualizing data. See Data H Lab 1-2: Data Analytics Questions
visualization Halo, PcW tool, 245, 255 in Managerial Accounting,
XBRL. See eXtensible Business Harriott, J. S., 9, 54n3, 246 41–42
Reporting Language (XBRL) Harvard Business Review, 186 Lab 1-3: Data Analytics Questions in
Financing ratios, 409 HAVING, SQL clauses, 555–556 Auditing, 42–44
FinDynamics, 431 (lab) Heat maps, data visualization and, Lab 1-4: Questions about Dillard’s
Fixed asset subledger, 250 181–182, 194, 414, 423 Store Data (Comprehensive
Flat file, 56, 57–58, 70, 248, 257 Hernandez, Robert, 337 Case), 44–47
Flood of alarms, 254 Heterogeneous systems approach, Lab 2-1: Request Data from IT—
Forbes Insights/KPMG, 6 248–249, 257 Sláinte, 77–78
Forecasting, 116 Hewlett-Packard Co., 283 Lab 2-3: Resolve Common Data
Foreign keys, 58, 70 Hirsch, Lauren, 254 Problems—LendingClub,
FROM, SQL clause, 546–547 Histogram, 472–475 (lab) 84–91

ISTUDY
600 Index

Identify the questions—Cont. management accounting and, K


Lab 2-5: Validate and Transform 336–339, 347 Kaplan, Robert S., 342
Data—College Scorecard, master the data, 10, 17–19, 337, Karaian, Jason, 254
95–98 345–346. See also Master the Kennedy, Joseph, 5n7
Lab 2-7: Comprehensive Case: data Kenya Red Cross, 335
Preview Data from Tables— model, 246 Key performance indicators (KPIs)
Dillard’s, 103–108 question identification, 9–10, 17, balanced scorecard, 341–345
Lab 4-1: Visualize Declarative 284, 336 defined, 348
Data—Sláinte, 212–218 tax analytics and, 456–458 environmental and social
Lab 4-2: Perform Exploratory test plan, 10–13, 20–22, 116–119, sustainability, 345
Analysis and Create 337–338, 345–346 evaluating using dashboards, 185
Dashboards—Sláinte, tracking outcomes, 13, 24, 288, 339 financial performance, 344
218–222 use of, 9–13 important indicators, 344–345
Lab 6-3: Finding Duplicate Income tax liability, 462 Lab 7-2: Create a Balanced
Payments—Sláinte, 317–321 Increments, charting and, 201 Scorecard Dashboard—
Lab 7-2: Create a Balanced Independent variables, 11, 26 Sláinte, 367–376
Scorecard Dashboard— Index, 411–412, 423 Lab 7-3: Comprehensive Case:
Sláinte, 367–376 Information overload, 254 Analyze Time Series Data—
Lab 7-3: Comprehensive Case: Information Systems Audit and Control Dillard’s, 377–389
Analyze Time Series Association (ISACA), 252 Lab 7-4: Comprehensive Case:
Data—Dillard’s, 377–389 Initiatives, 342, 346 Comparing Results to a Prior
Lab 7-4: Comprehensive Case: INNER JOIN, SQL clause, 293, Period, 389–398
Comparing Results to a Prior 557–558, 560–561 role of, 339
Period—Dillard’s, 389–398 Input ticker symbols, 442 (lab) tax analytics and, 457, 462–463
Lab 8-1: Create a Horizontal and Instagram, 8, 27, 461 variance analysis, 339–340
Vertical Analysis Using Instance documents, 417–419 Kirkegaard, Emil, 53
XBRL Data—S&P100, Institute of Business Ethics, 68 KPI. See Key performance indicators
430–437 Institute of Management Accountants, (KPIs)
Lab 8-3: Analyze Financial 3 Krolik, A., 53n2
Statement Ratios—S&P100, Internal auditing
441–444 data reduction and, 120–121
Lab 9-2: Comprehensive Case: increasing importance of, 247–248 L
Calculate Estimated State profiling example, 127–128 Labs
Sales Tax Owed—Dillard’s, Internal data sources, 54–55 1-0: How to Complete Labs, 36–39
475–479 Internal Revenue Service (IRS), 6, 456, 1-1: Data Analytics Questions in
Lab 9-3: Comprehensive Case: 461 Financial Accounting, 39–40
Calculate Total Sales Tax International characters, data quality 1-2: Data Analytics Questions in
Paid—Dillard’s, 479–486 and, 67 Managerial Accounting,
Lab 9-4: Comprehensive Case: International Standards for the 41–42
Estimate Sales Tax Owed Professional Practice of Internal 1-3: Data Analytics Questions in
by Zip Code—Dillard’s and Auditing, 252 Auditing, 42–44
Avalara, 486–492 Interquartile range (IQR), 124–125, 144 1-4: Comprehensive Case: Questions
Lab 9-5: Comprehensive Case: Interval data, 187, 205 about Dillard’s Store Data,
Online Sales Taxes Analysis— Inventory subledger, 250 44–47
Dillard’s and Avalara, Inventory turnover, 246–247, 409 1-5: Comprehensive Case: Connect
492–497 Invoices, paying, Question Set 2: to Dillard’s Store Data, 47–51
managerial accounting and, 336 Procure-to-Pay, 506–511 2-1: Request Data from IT—Sláinte,
Idione, T. W., 15 IRS, 6, 456, 461 77–78
IMPACT cycle Isson, J. P., 9, 54n3, 246 2-2: Prepare Data for Analysis—
address and refine results, 13, 23–24, Sláinte, 79–83
338–339, 347 2-3: Resolve Common Data
audit data analytics, 284–288 J Problems—LendingClub,
communicating results, 13, 24 James, LeBron, 455 84–91
KPIs for decision making, 339, 342 JD Edwards, 248 2-4: Generate Summary Statistics—
LendingClub example, 17–24 JOIN, SQL clauses, 557–558 LendingClub, 91–95

ISTUDY
Index 601

Labs —Cont. 5-4: Identify Audit Data Languages


2-5: Validate and Transform Data— Requirements—Sláinte, data quality and, 67
College Scorecard, 95–98 275–277 SQL. See SQL
2-6: Comprehensive Case: Build 5-5: Comprehensive Case: Setting Lean accounting, 116
Relationships among Database Scope—Dillard’s, 277–281 Lease classification, 142–143
Tables—Dillard’s, 98–102 6-1: Evaluate Trends and Outliers— Lebied, M., 9n14
2-7: Comprehensive Case: Preview Oklahoma, 304–311 LEFT JOIN, SQL clause, 293, 561–562
Data from Tables—Dillard’s, 6-2: Diagnostic Analytics Using Legislation, what-if analysis and,
103–108 Benford’s Law—Oklahoma, 465–466
2-8: Comprehensive Case: Preview 311–317 LendingClub
a Subset of Data in Excel, 6-3: Finding Duplicate Payments— communicating results, 24
Tableau Using a SQL Query— Sláinte, 317–321 data dictionary, 19, 59–60, 586
Dillard’s, 108–112 6-4: Comprehensive Case: data gathering/review, 17–19
3-1: Descriptive Analytics: Filter Sampling—Dillard’s, 321–325 Lab 1-2: Data Analytics Questions
and Reduce Data—Sláinte, 6-5: Comprehensive Case: Outlier in Managerial Accounting,
153–157 Detection—Dillard’s, 325–332 41–42
3-2: Diagnostic Analytics: Identify 7-1: Evaluate Job Costs—Sláinte, Lab 1-3: Data Analytics Questions in
Data Clusters—LendingClub, 355–367 Auditing, 42–44
157–160 7-2: Create a Balanced Scorecard Lab 2-3: Resolve Common Data
3-3: Perform a Linear Regression Dashboard—Sláinte, 367–376 Problems, 84–91
Analysis—College Scorecard, 7-3: Comprehensive Case: Analyze Lab 2-4: Generate Summary
160–166 Time Series Data—Dillard’s, Statistics, 91–95
3-4: Comprehensive Case: 377–389 Lab 3-2: Diagnostic Analytics:
Descriptive Analytics: 7-4: Comprehensive Case: Identify Data Clusters,
Generate Summary Comparing Results to a Prior 157–160
Statistics—Dillard’s, 166–169 Period—Dillard’s, 389–398 Lab 4-3: Create Dashboards,
3-5: Comprehensive Case: 7-5: Comprehensive Case: Advanced 223–229
Diagnostic Analytics: Performance Models— question identification, 17
Compare Distributions— Dillard’s, 398–403 refine results, 23–24
Dillard’s, 169–174 8-1: Create a Horizontal and Vertical regression approach, 137
3-6: Comprehensive Case: Create a Analysis Using XBRL Data— test plan, 20–22
Data Abstract and Perform S&P100, 430–437 tracking outcomes, 24
Regression Analysis— 8-2: Create Dynamic Common Liability, tax, 461–463
Dillard’s, 174–179 Size Financial Statements— Likert scale, 129
4-1: Visualize Declarative Data— S&P100, 437–441 Lim, J.-H., 123n1, 284n1
Sláinte, 212–218 8-3: Analyze Financial Statement Line charts, 194, 524–525
4-2: Perform Exploratory Analysis Ratios—S&P100, 441–444 Linear classifiers, 138–139
and Create Dashboards— 8-4: Analyze Financial Sentiment— Linear regression analysis, 160–166
Sláinte, 218–222 S&P100, 444–453 (lab)
4-3: Create Dashboards— 9-1: Descriptive Analytics: State Link prediction
LendingClub, 223–229 Sales Tax Rates, 472–475 defined, 12, 26, 144
4-4: Comprehensive Case: Visualize 9-2: Comprehensive Case: Calculate predictive analytics and, 117, 118,
Declarative Data—Dillard’s, Estimated State Sales Tax 133
229–236 Owed—Dillard’s, 475–479 Liquidity ratios, 408
4-5: Comprehensive Case: Visualize 9-3: Comprehensive Case: Calculate Live system, 248, 257
Exploratory Data—Dillard’s, Total Sales Tax Paid— Livni, Ephrat, 254
236–242 Dillard’s, 479–486 Loading of data, 67–68, 569–570
5-1: Create a Common Data Model— 9-4: Comprehensive Case: Estimate Loughran, Tim, 415n2
Oklahoma, 263–267 Sales Tax Owed by Zip Lyft, 415
5-2: Create a Dashboard Based on Code—Dillard’s and Avalara,
a Common Data Model— 486–492
Oklahoma, 267–271 9-5: Comprehensive Case: Online M
5-3: Set Up a Cloud Folder and Sales Taxes Analysis— Machine learning, 117, 118, 142–143,
Review Changes—Sláinte, Dillard’s and Avalara, 252, 287
272–274 492–497 Magic quadrant, 15, 189

ISTUDY
602 Index

Management accounting Lab 1-4: Questions about Dillard’s Analysis—Dillard’s and


application of the IMPACT model, Store Data (Comprehensive Avalara, 492–497
336–339, 347 Case), 44–47 LendingClub example, 17–19
balanced scorecard. See Balanced Lab 2-1: Request Data from IT— management accounting and, 337,
scorecard Sláinte, 77–78 345–346
cost behavior, 340–341 Lab 2-2: Prepare Data for Analysis— overview, 10
descriptive analytics, 337–338. See Sláinte, 79–83 Question Set 1: Descriptive and
also Descriptive analytics Lab 2-3: Resolve Common Data Exploratory Analysis,
diagnostic analytics, 337–338. See Problems—LendingClub, 514–519
also Diagnostic analytics 84–91 tax analytics and, 456
evaluate data quality, 346 Lab 2-5: Validate and Transform through tax data management,
identify the questions, 336, 339–341 Data—College Scorecard, 458–461
key performance indicators. See Key 95–98 transforming data, 64–67
performance indicators (KPIs) Lab 2-7: Comprehensive Case: McDonald, Bill, 415n2
Lab 1-2: Data Analytics Questions Preview Data from Tables— McKinsey Global Institute, 5
in Managerial Accounting, Dillard’s, 103–108 Measures, data quality and, 67
41–42 Lab 2-8: Comprehensive Case: Metadata, 246
Lab 7-1: Evaluate Job Costs—Sláinte, Preview a Subset of Data in Microsoft, versus Tableau, 189–191
355–367 Excel, Tableau Using a SQL Microsoft Access, 56, 63
Lab 7-2: Create a Balanced Query—Dillard’s, 108–112 Microsoft Corp., 143, 410, 411, 413–414,
Scorecard Dashboard— Lab 3-1: Descriptive Analytics: Filter 415–417
Sláinte, 367–376 and Reduce Data—Sláinte, Microsoft Excel
Lab 7-3: Comprehensive Case: 153–157 add-ins, 379
Analyze Time Series Data— Lab 4-1: Visualize Declarative Data— age analysis, 287, 289
Dillard’s, 377–389 Sláinte, 212–218 Benford’s law, predicting
Lab 7-4: Comprehensive Case: Lab 4-2: Perform Exploratory distribution, 128, 292–293
Comparing Results to a Prior Analysis and Create data analytics tools, 15–16
Period—Dillard’s, 389–398 Dashboards—Sláinte, 218–222 data retrieval, 63–64
Lab 7-5: Comprehensive Case: Lab 4-3: Create Dashboards— flat files and, 56, 57–58, 70, 257
Advanced Performance LendingClub, 223–229 formatting, income statements using
Models—Dillard’s, 398–403 Lab 4-4: Comprehensive Case: SUM(), 534–541
master the data and, 337, 345–346 Visualize Declarative Data— Get and Transform tool, 565–566
predictive analytics, 136, 337–338. Dillard’s, 229–236 Internal Data Model, 80
See also Predictive analytics Lab 4-5: Comprehensive Case: Lab 1-0: How to Complete Labs,
prescriptive analytics, 338. See also Visualize Exploratory Data— 36–39
Prescriptive analytics Dillard’s, 236–242 Lab 1-3: Data Analytics Questions in
profiling example, 126–127 Lab 5-1: Create a Common Data Auditing, 42–44
role of, 7 Model—Oklahoma, 263–267 Lab 2-2: Prepare Data for Analysis—
See also Accounting Lab 6-3: Finding Duplicate Sláinte, 79–83
Manipulation of data, 14 Payments—Sláinte, 317–321 Lab 2-3: Resolve Common Data
Mapping data, 250–251 Lab 8-1: Create a Horizontal and Problems—LendingClub,
Marketing KPIs, 345 Vertical Analysis Using XBRL 84–91
Marr, Bernard, 4n2, 344 Data—S&P100, 430–437 Lab 2-4: Generate Summary
Master the data Lab 8-3: Analyze Financial Statistics—LendingClub,
audit data analytics and, 284–286 Statement Ratios—S&P100, 91–95
defined, 70 441–444 Lab 2-5: Validate and Transform
extracting. See Extracting, Lab 9-3: Comprehensive Case: Data—College Scorecard,
transforming, loading (ETL) Calculate Total Sales Tax 95–98
Lab 1-1: Data Analytics Questions in Paid—Dillard’s, 479–486 Lab 2-8: Comprehensive Case:
Financial Accounting, 39–40 Lab 9-4: Comprehensive Case: Preview a Subset of Data
Lab 1-2: Data Analytics Questions Estimate Sales Tax Owed in Excel, Tableau Using
in Managerial Accounting, by Zip Code—Dillard’s and a SQL Query—Dillard’s,
41–42 Avalara, 486–492 108–112
Lab 1-3: Data Analytics Questions in Lab 9-5: Comprehensive Case: Lab 6-3: Finding Duplicate
Auditing, 42–44 Online Sales Taxes Payments—Sláinte, 317–321

ISTUDY
Index 603

Microsoft Excel—Cont. Microsoft Navision, 14 N


Lab 9-1: Descriptive Analytics: State Microsoft OneDrive Nasdaq, 405
Sales Tax Rates, 472–475 electronic working papers, 256 New York Stock Exchange, 405
Lab 9-2: Comprehensive Case: Lab 1-0: How to Complete Labs, Nominal data, 187, 205
Calculate Estimated State Sales 36–39 Normal distribution, 188, 205, 529–530
Tax Owed—Dillard’s, 475–479 Lab 5-3: Set Up a Cloud Folder and Normalized relational database, 56–57
Lab 9-3: Comprehensive Case: Review Changes—Sláinte, Norton, David P., 342
Calculate Total Sales Tax 272–274 Norwegian Consumer Council, 53
Paid—Dillard’s, 479–486 Microsoft SQL Server Null hypothesis, 131, 132, 145
Lab 9-4: Comprehensive Case: enterprise level data and, 56 Number data types, SQL WHERE
Estimate Sales Tax Owed Lab 2-6: Comprehensive Case: clauses, 549
by Zip Code—Dillard’s and Build Relationships among Numbers, data quality and, 67
Avalara, 486–492 Database Tables—Dillard’s,
Lab 9-5: Comprehensive Case: 98–102
Online Sales Taxes Analysis— Lab 2-7: Comprehensive Case: O
Dillard’s and Avalara, Preview Data from Tables— Objectives, balanced scorecard and,
492–497 Dillard’s, 103–108 342–343, 346
Power Pivot tool, 385 Lab 2-8: Comprehensive Case: Obtain data, 61–62, 77–78
Question Set 1: Descriptive and Preview a Subset of Data in Oestreich, J. L., 15
Exploratory Analysis, Excel, Tableau Using a SQL Office 365, 256
514–519 Query—Dillard’s, 108–112 Office of National Statistics, 195
summary statistics, 91–95 Lab 4-4: Comprehensive Case: Office.com, 37
versus Tableau, 189–191 Visualize Declarative OkCupid, 53
Tableau and, 578–585 Data—Dillard’s, 229–236 Oklahoma, state of
tutorial, 534–543 Lab 7-3: Comprehensive Case: Lab 5-1: Create a Common Data
Using the Excel Data Analysis Analyze Time Series Model, 263–267
Toolpak, 544–545 Data—Dillard’s, 377–389 Lab 5-2: Create a Dashboard Based
VLookup function, 63, 542–543 Lab 7-4: Comprehensive Case: on a Common Data Model,
Microsoft Excel PivotTables Comparing Results to a 267–271
address and refine results, 23–24 Prior Period—Dillard’s, Lab 6-1: Evaluate Trends and
Lab 2-2: Prepare Data for 389–398 Outliers, 304–311
Analysis—Sláinte, 79–83 Lab 7-5: Comprehensive Case: Lab 6-2: Diagnostic Analytics Using
Lab 6-5: Comprehensive Case: Advanced Performance Benford’s Law, 311–317
Outlier Detection—Dillard’s, Models—Dillard’s, 398–403 Oklahoma Purchase Card, data
325–332 Lab 9-2: Comprehensive Case: dictionary, 587
performing the test plan using, Calculate Estimated State OneDrive
20–22 Sales Tax Owed—Dillard’s, electronic working papers, 256
Q 3.1: By Looking at Line Charts 475–479 Lab 5-3: Set Up a Cloud Folder and
for 2014 and 2015, Does the Lab 9-3: Comprehensive Case: Review Changes—Sláinte,
Average Percentage of Sales Calculate Total Sales Tax 272–274
Returned in 2014 Seem to Paid—Dillard’s, 479–486 Online sales, analyzing, 521–523
Be Predictive of Returns in Lab 9-5: Comprehensive Case: Open database connectivity
2015?, 524–525 Online Sales Taxes Analysis— (ODBC), 16
Q 3.2: Using Regression, Can We Dillard’s and Avalara, Open Science Framework, 53
Predict Future Returns as a 492–497 Operational KPIs, 344
Percentage of Sales Based overview, 63–64 Optimization of costs and revenue, 337
on Historical Transactions?, See also SQL clauses Oracle, 248, 420
526–527 Microsoft Teams, 255 Oracle RDBMS, 56
Q 3.3: What Else Can You Determine Microsoft Track, 15–16, 36 ORDER BY, SQL clauses, 551
about the Percentage of Middle value, describing sample by, Order-to-cash (O2C) process, 500–506
Returned Sales through 528–529 Order to cash subledger, 249–250
Predictive Analysis?, 527 Mission, organizational, 345 Ordinal data, 187, 205
tools for, 540–541 Myrstad, Finn, 53 Organization, of written
Microsoft Excel Query Editor, 80–81, MySql, 56 communication, 202–203
93, 106–107 (lab) Mystery ratios, 440 (lab) Outcomes, tracking, 13, 24, 288, 339

ISTUDY
604 Index

OUTER JOIN, SQL clause, 293–294
Outlier detection (lab), 304–311, 325–332
Overfitting, 140, 145
Overlap method, 415

P
p-Value, 132, 135, 141, 531
Parameters, versus statistics, 528
Park, J., 123n1, 284n1
Payments, procure-to-pay (P2P) process, 506–511
PCAOB, 252
Perform test plan
  audit data analytics and, 286–288
  IMPACT cycle and, 10–13, 20–22, 116–119
  LendingClub, 20–22
  management accounting and, 337–338
Perform the analysis
  Lab 1-1: Data Analytics Questions in Financial Accounting, 39–40
  Lab 2-2: Prepare Data for Analysis—Sláinte, 79–83
  Lab 2-7: Comprehensive Case: Preview Data from Tables—Dillard's, 103–108
  Lab 2-8: Comprehensive Case: Preview a Subset of Data in Excel, Tableau Using a SQL Query—Dillard's, 108–112
  Lab 3-3: Perform a Linear Regression Analysis—College Scorecard, 160–166
  Lab 4-2: Perform Exploratory Analysis and Create Dashboards—Sláinte, 218–222
  Lab 6-3: Finding Duplicate Payments—Sláinte, 317–321
  Lab 7-2: Create a Balanced Scorecard Dashboard—Sláinte, 367–376
  Lab 9-1: Descriptive Analytics: State Sales Tax Rates, 472–475
  Lab 9-2: Comprehensive Case: Calculate Estimated State Sales Tax Owed—Dillard's, 475–479
  Lab 9-3: Comprehensive Case: Calculate Total Sales Tax Paid—Dillard's, 479–486
  Lab 9-4: Comprehensive Case: Estimate Sales Tax Owed by Zip Code—Dillard's and Avalara, 486–492
  Lab 9-5: Comprehensive Case: Online Sales Taxes Analysis—Dillard's and Avalara, 492–497
Performance metrics, 339, 348. See also Key performance indicators (KPIs)
Peters, G. F., 123n1, 284n1
Pie charts, 192–194, 197
PivotTables
  address and refine results, 23–24
  Lab 2-2: Prepare Data for Analysis—Sláinte, 79–83
  Lab 6-5: Comprehensive Case: Outlier Detection—Dillard's, 325–332
  performing the test plan using, 20–22
  Q 3.1: By Looking at Line Charts for 2014 and 2015, Does the Average Percentage of Sales Returned in 2014 Seem to Be Predictive of Returns in 2015?, 524–525
  Q 3.2: Using Regression, Can We Predict Future Returns as a Percentage of Sales Based on Historical Transactions?, 526–527
  Q 3.3: What Else Can You Determine about the Percentage of Returned Sales through Predictive Analysis?, 527
  tools for, 540–541
Poisson distribution, 530
Population, versus sample, 528
Post-pruning, decision tree, 138
PostGreSQL, 56
Power Automate, 15–16
Power BI
  as an analytics tool, 15–16, 481
  ask a question, 577–578
  choosing mode, 573–574
  defined, 16
  opening, startup screen, 572–573
  SQL and, 63
  tutorial, 572–579
  visualizations/fields/values, 190, 574–577
Power Pivot tool, 385
Power Query
  editing data, 567–569
  Excel's Get and Transform, 565–566
  loading data, 569–570
  return to window after closing, 570
  tutorial, 564–570
  uses, 15–16
  worksheet failure, 569
Pre-pruning, of decision trees, 138
Predictive analytics, 133–141
  artificial intelligence, 296
  audit data analytics and, 286, 294–296
  auditing, 136–137
  classification, 118, 133, 137–140, 287, 295
  defined, 116, 118, 145, 297
  examples of, 116–117
  financial statements, 406, 410–412
  link prediction, 117, 118, 133
  management accounting, 136, 337–338
  overfitting data, 140, 145
  overview of, 133–134
  probability, 287, 295
  Q 3.1: By Looking at Line Charts for 2014 and 2015, Does the Average Percentage of Sales Returned in 2014 Seem to Be Predictive of Returns in 2015?, 524–525
  Q 3.2: Using Regression, Can We Predict Future Returns as a Percentage of Sales Based on Historical Transactions?, 526–527
  Q 3.3: What Else Can You Determine about the Percentage of Returned Sales through Predictive Analysis?, 527
  regression, 118, 134–137, 287, 295
  sentiment analysis, 287, 295–296
  tax analytics and, 457–458
Predictor variables, 11, 26
Prescriptive analytics
  applied statistics, 287, 296
  artificial intelligence, 117, 118, 142–143, 286, 287, 296
  audit data analytics and, 287
  decision support systems, 117, 118, 141–142, 144, 296
  defined, 117, 145, 297
  financial statements, 406, 412–413
  management accounting, 338
  tax analytics and, 457, 458
  what-if analysis, 287
Primary keys, 57–58, 70
Privacy of data, 53, 68
Probability, predictive analytics and, 287, 295
Probability distributions, 529–530
Process mining, 286, 297
Procter and Gamble, 5, 27
Procure-to-pay (P2P) process, 506–511
Procure to pay subledger, 250, 251
Production system, 248, 257
Profiling
  auditing example, 127–128
  defined, 11, 26, 123, 145
  diagnostic analytics, 117, 118, 123–128
  digital dashboards, 125
  internal audit example, 127–128
  IRS and, 461
  management accounting example, 126–127
  steps, 125–126
  structured data, 123
  uses for, 123–125
Profit margin on sales ratio, 409
Profitability ratios, 409
Proportion, quantitative data and, 116, 187, 205
Provost, Foster, 11n16
Pruning, decision trees, 138–139
Public Company Accounting Oversight Board (PCAOB), 252
Purchasing cycle processes, 506–511
PwC, 4–5, 245, 255–256, 257, 462
Python, 14, 251

Q
Qualified research expenditure (QRE), 466
Qualitative data
  charts for, 192–194
  defined, 205
  versus quantitative, 186–188
Quality, of data, 66–67, 419–420
Qualtrics, 528
Quantitative data
  charts for, 194–195
  defined, 205
  proportion, 116, 187, 205
  versus qualitative, 186–188
Quartiles, 124–125, 195, 290
Question Set 1: Descriptive and Exploratory Analysis
  1.1: Compare the Percentage of Returned Sales across Months, States, and Online versus In-Person Transactions, 514–518
  1.2: What Else Can You Determine about the Percentage of Returned Sales through Descriptive Analysis?, 518–519
Question Set 1: Order-to-Cash (O2C)
  1.1: What Is the Total Revenue and Balance in Accounts Receivable?, 500–502
  1.2: How Efficiently Is the Company Collecting Cash?, 503–504
  1.3: Is the Delivery Process Following the Expected Procedure?, 504–505
  1.4: What Else Can You Determine about the O2C Process?, 505–506
  process, 500
Question Set 2: Diagnostic Analytics-Hypothesis Testing
  2.1: Is the Percentage of Sales Returned Significantly Higher in January after the Holiday Season?, 519–521
  2.2: How Do the Percentages of Returned Sales for Holiday/Non-Holiday Differ for Online Transactions and across Different States?, 521–523
  2.3: What Else Can You Determine about the Percentage of Returned Sales through Diagnostic Analysis?, 523–524
Question Set 2: Procure-To-Pay (P2P)
  2.1: Is the Company Missing Out on Discounts by Paying Late?, 506–508
  2.2: How Long Is the Company Taking to Pay Invoices?, 509–510
  2.3: Are There Any Erroneous Payments?, 510–511
  2.4: What Else Can You Determine about the P2P Process?, 511
Question Set 3: Predictive Analytics
  3.1: By Looking at Line Charts for 2014 and 2015, Does the Average Percentage of Sales Returned in 2014 Seem to Be Predictive of Returns in 2015?, 524–525
  3.2: Using Regression, Can We Predict Future Returns as a Percentage of Sales Based on Historical Transactions?, 526–527
  3.3: What Else Can You Determine about the Percentage of Returned Sales through Predictive Analysis?, 527
Questions, IMPACT model and, 9–10, 17, 284, 336. See also Identify the questions
Quick ratio, 408

R
R, 251
Rank-ordered bar chart, 197
Ratio analysis, 407, 408–409, 423, 441–444 (lab)
Ratio data, 187, 205–206
R&D tax credit, 457–458, 465–466
Real-time financial reporting, 420
Receivable turnover ratio, 409
Red Cross, 335
Redundancy, of data, 57
Regression
  accounting and, 134–137
  auditing, 136–137
  cost accounting and, 136
  cost behavior and, 340–341
  defined, 11, 26, 117, 135, 145
  Lab 3-3: Perform a Linear Regression Analysis—College Scorecard, 160–166
  Lab 3-6: Comprehensive Case: Create a Data Abstract and Perform Regression Analysis—Dillard's, 174–179
  managerial accounting and, 136
  predictive analytics and, 118, 134–137, 287, 295
  process, 134
  Question 3.2: Using Regression, Can We Predict Future Returns as a Percentage of Sales Based on Historical Transactions?, 526–527
  statistical output from, interpreting, 533
  time series analysis, 137, 145, 377–389 (lab)
  uses for, 133–134
Relational database
  defined, 56, 70
  versus flat file, 57–58
  relationships and, 56–59
  tables and, 57–58
Relational database management systems (RDBMS), 56
Relationships, relational databases and, 56–59
Relevant costs, 339
Remote audit work, 255–256
Resnick, B., 53n1
Response variables, 10–11, 26
Return on assets ratio, 409
Return on equity (ROE), 409
Returns, estimating sales. See Estimating sales returns
Revenue analytics skill, 337
Revenue optimization, 337, 355–367
Revision, of written communication, 204
Revlon, 254
Richardson, J. L., 15
Richardson, V. J., 123n1, 284n1
RIGHT JOIN, SQL clause, 293, 562
Risk scores, 20, 22
Robert Half Associates, 64
Robotics process automation (RPA), 3, 16, 246
R.R. Donnelley, 419

S
Sage, 14
Sales cycle process, 500–506
Sales returns, estimating. See Estimating sales returns
Sales tax liability
  evaluating, 462
  Lab 9-1: Descriptive Analytics: State Sales Tax Rates, 472–475
  Lab 9-2: Comprehensive Case: Calculate Estimated State Sales Tax Owed—Dillard's, 475–479
  Lab 9-3: Comprehensive Case: Calculate Total Sales Tax Paid—Dillard's, 479–486
  Lab 9-4: Comprehensive Case: Estimate Sales Tax Owed by Zip Code—Dillard's and Avalara, 486–492
  Lab 9-5: Comprehensive Case: Online Sales Taxes Analysis—Dillard's and Avalara, 492–497
Sallam, R. L. C., 15
Sample
  describing, 528–529
  versus population, 528
Sampling
  descriptive analytics and, 287, 289–290
  Lab 6-4: Comprehensive Case: Sampling—Dillard's, 321–325
SAP, 14, 248, 253, 420
Scale, charting data and, 201
Scatter plots, 195, 341
Schema, database, 56, 58, 59
Scorecard, balanced. See Balanced scorecard
Scripting language, 251
Scrubbing data, 10, 14, 53
Securities and Exchange Commission (SEC), 6, 122, 417
Security, of data, 53
SELECT, SQL clauses, 546
SELECT FROM, SQL clauses, 547–548
SELECT FROM WHERE, SQL clauses, 550–551
Sensitivity analysis, 412
Sentiment analysis
  Lab 8-4: Analyze Financial Sentiment—S&P100, 444–453
  predictive analytics and, 287, 295–296
  text mining and, 415–417
Sequence check, 287, 294
Shared folders (lab), 272–274
Similarity matching
  defined, 11, 26, 145
  diagnostic analytics, 117, 118. See also Diagnostic analytics
  supervised approach, 133, 145
Simsion, G. C., 57n4
Singer, N., 53n2
Singleton, T., 61n5
Slack, 255
Sláinte
  data dictionary, 586
  Lab 2-1: Request Data from IT, 77–78
  Lab 2-2: Prepare Data for Analysis, 79–83
  Lab 3-1: Descriptive Analytics: Filter and Reduce Data, 153–157
  Lab 4-1: Visualize Declarative Data, 212–218
  Lab 4-2: Perform Exploratory Analysis and Create Dashboards, 218–222
  Lab 5-3: Set Up a Cloud Folder and Review Changes, 272–274
  Lab 5-4: Identify Audit Data Requirements, 275–277
  Lab 6-3: Finding Duplicate Payments, 317–321
  Lab 7-1: Evaluate Job Costs, 355–367
  Lab 7-2: Create a Balanced Scorecard Dashboard, 367–376
Snapchat, 27
Snow, John, 181
Social media, 7–8
Software needs
  exact and fuzzy matching, 287, 293–294
  sampling, 287, 289–290
  sorting, 287, 289
  storing data, 56–57
Software translators, 248–249
Solvency ratios, 409
Sorkin, Andrew Ross, 254
Sorting, descriptive analytics and, 287, 289
S&P100
  data dictionary, 587
  Lab 8-1: Create a Horizontal and Vertical Analysis Using XBRL Data, 430–437
  Lab 8-2: Create Dynamic Common Size Financial Statements, 437–441
  Lab 8-3: Analyze Financial Statement Ratios, 441–444
  Lab 8-4: Analyze Financial Sentiment, 444–453
Sparklines, 413–414, 423
Spread, describing, 529
Spreadsheet software, 15–16
SQL clauses
  FROM, 546–547
  aggregates/aliases, expand SELECT, 552–553
  example queries, ORDER BY, 551–552
  Full Outer Join, 293–294
  Get and Transform tool, 565–566
  GROUP BY, 554–555
  HAVING, 555–556
  INNER JOIN, 293, 557–558, 560–561
  JOIN, 557–558
  LEFT JOIN, 293, 561–562
  ORDER BY, 551
  parentheses, joining tables, 558
  RIGHT JOIN, 293, 562
  SELECT, 546
  SELECT FROM, 547–548
  SELECT FROM WHERE, 550–551
  tutorial, 546–559, 560–562
  using data from more than one table, 556–558
  WHERE, 548–549
  See also Microsoft SQL Server
SQL (Structured Query Language). See Microsoft SQL Server
SQLite, 56
Square Payments, 120
Stacked bar chart, 192–193, 196, 198
Standard normal distribution, 188, 206
Standardization, 62, 188, 206
Standardized metrics, 419–420, 421, 423
Standards, audit data, 62, 249–251, 252, 256, 264
Statistical significance, 131–132
Statistical testing, 531
Statistics
  applied, 287, 296
  describing the sample, 528–529
  describing the spread, 529
  hypothesis testing, 530–531
  output from sample t-test of a difference of means of two groups, 532
  parameters versus, 528
  population versus sample, 528
  probability distributions, 529–530
  regression, interpreting the statistical output from, 533
  statistical testing, 531
  tutorial, 528–533
  versus visualizations, 183–184
StockSnips, 405
Storage of data, 56–57
Storey, V. C., 4n4
Strategy Management Group Company, 335, 343
Stratification, diagnostic analytics, 287, 294
Structured data
  data reduction and, 120
  defined, 4, 26, 145
  profiling and, 123
Structured Query Language (SQL). See Microsoft SQL Server
Sullivan, G. M., 141n6
Summary statistics
  defined, 117, 145
  descriptive analytics, 116, 119–120, 287, 289
  Lab 2-4: Generate Summary Statistics—LendingClub, 91–95
  Lab 3-4: Comprehensive Case: Descriptive Analytics: Generate Summary Statistics—Dillard's, 166–169
  versus visualizations, 183–184
Sunburst diagram, 414, 423
Supervised approach, predictive analytics, 133, 145
Supply chain management system (SCM), 54, 70
Support vector machine, 139–140, 145
SurveyMonkey, 528
Sweet spot, 140
Symbol maps, 194
Systems translator software, 248–249, 257

T
t-Test
  defined, 297
  diagnostic analytics, 287, 290–291
  for equal means, 131
  interpreting output from sample, 532
  two-sample, 131–132
Table attributes, 57–58
Tableau Prep, 16–17, 38, 578–581
Tableau Public, 16–17
Tableau software
  as an analytics tool, 481
  Lab 4-2: Perform Exploratory Analysis and Create Dashboards—Sláinte, 218–222
  Lab 5-2: Create a Dashboard Based on a Common Data Model—Oklahoma, 267–271
  versus Microsoft, 189–191
  overview, 16–17
  Question Set 1: Descriptive and Exploratory Analysis, 514–519
  tutorial, 582–585
Tableau Workbook
  Question Set 1: Order-to-Cash (O2C), 500–506
  Question Set 2: Procure-to-Pay, 506–511
Takeda, C., 136n4
Tapadinhas, J., 15
Target, 133
Tax analytics, 454–497
  compliance and liability, 461–463
  data management and, 458–461
  diagnostic/descriptive, 456, 457
  IRS and, 456, 461
  Lab 9-1: Descriptive Analytics: State Sales Tax Rates, 472–475
  Lab 9-2: Comprehensive Case: Calculate Estimated State Sales Tax Owed—Dillard's, 475–479
  Lab 9-3: Comprehensive Case: Calculate Total Sales Tax Paid—Dillard's, 479–486
  Lab 9-4: Comprehensive Case: Estimate Sales Tax Owed by Zip Code—Dillard's and Avalara, 486–492
  Lab 9-5: Comprehensive Case: Online Sales Taxes Analysis—Dillard's and Avalara, 492–497
  predictive/prescriptive, 457–458
  questions addressed using, 456–457
  for tax planning, 464–466
  using the IMPACT model, 456–458
  visualizations and, 461–463
Tax compliance, 461–463
Tax cost, 462–463
Tax credits, what-if analysis and, 465–466
Tax Cuts and Jobs Act of 2017, 460, 467
Tax data management, 458–461
Tax data mart, 459, 467
Tax efficiency/effectiveness, 463
Tax liability, 461–463
Tax planning, 464–466, 467
Tax risk, 463
Tax sustainability, 463
Taxes, data analytics and, 8
Taxonomy, 417–418, 423
TeamMate, 255, 288, 296
Teradata, 56
Tesla, 455
Test data, 138, 145
Test plan
  audit data analytics, 286–288
  IMPACT cycle and, 10–13, 20–22, 116–119
  LendingClub, 20–22
  management accounting and, 337–338
Text, versus data visualization, 184–185
Text data types, SQL WHERE clauses, 549
Text mining, 415–417
Thomas, S., 136n4
Time series analysis, 137, 145, 377–389 (lab)
Times interest earned ratio, 409
Tolerable misstatements, 290
Tone, of written communication, 203–204
Total cost, 339
Tracking outcomes, 13, 24, 288, 339
Trade-off, 140
Training data, classification of, 138, 145
Transforming data, 64–67
Translator software, 248–249
TransUnion, 27
Tree maps, 194
Trend analysis, 411
Trendlines, visualizing, 413–414
True negative alarm, 254–255
True positive alarm, 254–255
TurboTax, 141, 296
Turnover, of inventory, 246–247, 409
Turnover ratios, 408–409
Twitter, 7, 461
Two-sample t-test, 131–132
Typical value, describing samples by, 528–529

U
Uber, 415
UML diagrams, 56, 255
Underfitting data, 140, 145
Unfavorable variance, 340
UNICODE, 67
Unified Modeling Language (UML), 56, 255
Uniform distribution, 530
Unique identifier, 57
U.S. GAAP Financial Reporting Taxonomy, 417
U.S. Securities and Exchange Commission (SEC), 6, 122, 417
Unix commands, 14
Unstructured data, 4, 26
Unsupervised approach, 129, 145

V
Validation, of data, 53, 64–65, 82–83
Value, describing sample by middle or typical, 528–529
Values, organizational, 345
Variability of data, describing, 529
Variables
  dependent/independent, 10–11, 26
  dummy, 135, 144
  explanatory, 11, 26
  predictor, 11
  response, 10–11, 26
Variance analysis, 116, 127, 339–340, 355–367
Vertical financial statement analysis, 407–408, 411, 423, 430–437
Vision, organizational, 345
Visualization of data
  Anscombe's Quartet, 183–184
  categorical data, 186, 205
  charting. See Charting data
  declarative, 188–191, 205
  exploratory, 188–191, 205
  heat maps and, 181–182, 194, 414
  Lab 4-1: Visualize Declarative Data—Sláinte, 212–218
  Lab 4-2: Perform Exploratory Analysis and Create Dashboards—Sláinte, 218–222
  Lab 4-3: Create Dashboards—LendingClub, 223–229
  Lab 4-4: Comprehensive Case: Visualize Declarative Data—Dillard's, 229–236
  Lab 4-5: Comprehensive Case: Visualize Exploratory Data—Dillard's, 236–242
  Lab 5-2: Create a Dashboard Based on a Common Data Model—Oklahoma, 267–271
  Lab 9-1: Descriptive Analytics: State Sales Tax Rates, 472–475
  normal distribution, 188
  preference over text, 184–185
  purpose of, 185–192
  qualitative versus quantitative data, 186–188
  Question 3.1: By Looking at Line Charts for 2014 and 2015, Does the Average Percentage of Sales Returned in 2014 Seem to Be Predictive of Returns in 2015?, 524–525
  Question Set 1: Descriptive and Exploratory Analysis, 514–519
  relative size of accounts, 414
  showing trends, 413
  sparklines, 413–414, 423
  versus statistics, 183–184
  sunburst diagrams, 414, 423
  tax analytics and, 461–463
  tools for, 189–191
  use of written communication, 202–204
  See also Communicating results
VLookup function, Excel, 63, 542–543

W
Walmart, 127, 129, 345
Walton College of Business, 341–342
Warehouse, for data, 248, 257, 459, 467
Wayfair decision, 462
Web browser
  Lab 1-0: How to Complete Labs, 36–39
  Lab 1-1: Data Analytics Questions in Financial Accounting, 39–40
  Lab 1-3: Data Analytics Questions in Auditing, 42–44
  Lab 1-4: Questions about Dillard's Store Data (Comprehensive Case), 44–47
  Lab 5-3: Set Up a Cloud Folder and Review Changes—Sláinte, 272–274
  Lab 5-4: Identify Audit Data Requirements—Sláinte, 275–277
  Lab 8-3: Analyze Financial Statement Ratios—S&P100, 441–444
What-if scenario analysis, 287, 455, 464–466, 467
WHERE, SQL clauses, 548–549
Whisker plots, 125, 195
White, S., 68n
Whiteoak, Hannah, 185n3
Wilkins, Ed, 291
Witt, G. C., 57n4
Word clouds, 194
Word frequency, 415
Words, use of in communicating results, 202–204
Workflow, auditing, 255–256
Working capital ratio, 408
Working papers, 255–256, 272–274
Write-off classification, 137–138
Writing for Computer Science, 202

X
XBRL. See eXtensible Business Reporting Language (XBRL)
XBRLAnalyst, 420
Xero, 255
XML, 417

Z
Z-score
  converting observations to, 123
  diagnostic analytics, 123, 124, 287, 290, 291
  profiling, 123–124, 128
  standardizing distributions with, 188
Zobel, Justin, 202, 202n5, 204n6