ACL 103
Data analysis can require several steps before the insights you need are surfaced. During the data analysis
process, grouping data helps to identify patterns and relationships not readily apparent in ungrouped data, simplifies
data interpretation, and helps to identify outliers. Combining data allows you to gather information from different
data sources that can add context and a broader perspective to your data analysis. Sorting data, often a
critical step during a data analysis, can improve data readability, allow natural patterns to be easily visualized, and
help move data analysis forward.
Grouping data creates a high-level view that can help you identify patterns, trends, irregularities, or
outliers. In this section, we'll look at five commands to group data in
Analytics: CLASSIFY, SUMMARIZE, CROSSTAB, STRATIFY, and AGE.
Grouping your data is a useful technique you can use to help you understand what's happening in your
organization. For example, you may want to gather information on products sold last quarter, or create a
report of employee benefits usage to understand which benefits are the most popular. Let's check out how
the CLASSIFY command can help us do this kind of grouping.
The CLASSIFY command groups multiple occurrences of the same field value, and counts the number of
records in each group. Only character and numeric fields can be classified.
When classifying, you can optionally subtotal one or more associated numeric fields. The Include
Statistics for Subtotal Fields option allows you to calculate average, minimum, and maximum values for
subtotaled fields. The results of all calculations are broken down by group in the output.
Example
To find the total number of transactions per customer, you can run CLASSIFY on the CustomerNumber field.
The CustomerTransactions table contains ten records. Some CustomerNumber values are unique, and some are
duplicates.
In the output results, CustomerNumber values are classified into four unique groups. The Count tells you
how many records (transactions) are associated with each customer number group.
If the CustomerTransactions table included a TransactionAmount field, you could subtotal that field to find the
total transaction amount for each customer.
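To make the CLASSIFY logic concrete, here is a minimal Python sketch that mimics it outside Analytics: group records on a single key field, count the records in each group, and optionally subtotal a numeric field. It is an illustration only, not how Analytics implements CLASSIFY; the `classify` function and the sample records are hypothetical.

```python
from collections import defaultdict

def classify(records, key, subtotal=None):
    """Group records on one key field, count each group, and optionally subtotal a numeric field."""
    groups = defaultdict(lambda: {"count": 0, "total": 0.0})
    for rec in records:
        group = groups[rec[key]]
        group["count"] += 1
        if subtotal is not None:
            group["total"] += rec[subtotal]  # stays 0.0 when no subtotal field is given
    # Return the groups ordered by key value
    return dict(sorted(groups.items()))

# Hypothetical transactions: customer numbers and amounts are invented
transactions = [
    {"CustomerNumber": "1001", "TransactionAmount": 250.00},
    {"CustomerNumber": "1002", "TransactionAmount": 75.50},
    {"CustomerNumber": "1001", "TransactionAmount": 120.00},
    {"CustomerNumber": "1003", "TransactionAmount": 300.00},
    {"CustomerNumber": "1002", "TransactionAmount": 24.50},
]

result = classify(transactions, "CustomerNumber", subtotal="TransactionAmount")
for customer, stats in result.items():
    print(customer, stats["count"], stats["total"])
```

Each unique key value becomes one output group, which is why a classified table has one record per unique value of the classified field.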
The CLASSIFY command is found in the Analytics menu under Analyze > Classify.
The r_Sales2024ByCustomer classified table should contain 63 records (one for each customer).
1. In the r_Top10Customers table, highlight the c_TotalAmtRecalc column by clicking the field name.
2. Right-click and select Total Selected Data.
The sum of the c_TotalAmtRecalc field is displayed in the status bar.
Discover that the top 10 customers account for $1,815,766.24 of total sales.
Discover that the top 10 customers account for 31.40% of total sales.
The situation
You need to determine how much of the AR balance from the end of 2023 was still outstanding by the end
of 2024. To do so, you must determine how much was paid by each customer in 2024 and compare this
against the 2023 balances.
Your task
Calculate the total payments by customer for 2023 sales and send the results to a new Analytics table. All
invoice numbers lower than 18181 occurred in 2023.
Run CLASSIFY and filter for payments made in 2024 for 2023 sales
1. Open the p_ArPmt2024Actual table.
2. Select Analyze > Classify from the Analytics menu.
The Classify window opens.
3. Select CustNum as the field to classify and PaidAmt as the field to subtotal.
4. Click If... and add a filter to only include invoices less than 18181 using the syntax InvceNum < "18181",
then click OK.
5. Send the result to a new table named r_2024PmtsFor2023Sales:
1. Click the Output tab.
2. In To, select File.
3. For File Type, select Analytics Table.
4. In the Name... textbox, enter "r_2024PmtsFor2023Sales".
5. Click OK.
The new table opens.
The r_2024PmtsFor2023Sales table contains 56 records, one for each customer that made a 2024 payment on 2023 invoices.
1. In the r_2024PmtsFor2023Sales table, highlight the PaidAmt column by clicking the field name.
2. Right-click on the highlighted column and select Total Selected Data.
The sum of the PaidAmt field is displayed in the status bar.
The total amount of payments made in 2024 for 2023 invoices is $851,721.80.
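The filter, classify, and total sequence above can be sketched in Python as follows. The field names mirror p_ArPmt2024Actual, but the records are invented and the code only approximates what Analytics does. Because the filter compares InvceNum as text, the sketch relies on Python's lexicographic string comparison, which matches numeric order only when every invoice number has the same number of digits (assumed here).

```python
from collections import defaultdict

# Hypothetical payment records: field names mirror p_ArPmt2024Actual, values are invented
payments = [
    {"CustNum": "C001", "InvceNum": "18150", "PaidAmt": 1200.00},
    {"CustNum": "C002", "InvceNum": "18175", "PaidAmt": 800.00},
    {"CustNum": "C001", "InvceNum": "18190", "PaidAmt": 450.00},  # 2024 invoice, filtered out
    {"CustNum": "C001", "InvceNum": "18101", "PaidAmt": 300.00},
]

def is_2023_invoice(rec):
    # Text comparison like InvceNum < "18181"; assumes equal-length invoice numbers
    return rec["InvceNum"] < "18181"

# CLASSIFY on CustNum with PaidAmt as the subtotal field, applied to filtered records only
totals = defaultdict(float)
for rec in filter(is_2023_invoice, payments):
    totals[rec["CustNum"]] += rec["PaidAmt"]

# Total Selected Data on the subtotal column of the classified output
grand_total = sum(totals.values())
print(dict(totals))
print(f"{grand_total:,.2f}")
```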
1. The CLASSIFY command groups records for character or numeric fields that have identical values and
counts the number of records in each group. You can subtotal numeric fields associated with the classified
field and calculate statistics on any subtotaled fields.
2. By default, the CLASSIFY command outputs results to the Analytics display area. Alternatively, results
can be sent to an Analytics table (File), a graph, or to a default printer.
When grouping data, you may have to answer complex questions such as "For each vendor and order type, what's the
total number of purchases and the total purchase value?" You'd need to calculate the total number of
purchases and the total purchase value from each vendor by order type. The SUMMARIZE command
takes grouping further than the CLASSIFY command by identifying identical values in more than one field,
and it offers some additional functionality. Let's learn more about how the SUMMARIZE command extends
grouping capabilities.
The SUMMARIZE command groups the records in a table based on identical values in one or more key
fields and counts the number of records in each group. Key fields can be character, numeric, or datetime
data types. Said another way, we're identifying unique values in our data, across one or more fields, and
grouping them together.
If you summarize by more than one key field, you create nested summarized groups in the output results.
The order in which you select the key fields dictates the nesting hierarchy. The records are summarized by
the first field you select, then each of those groups is summarized by the second field you select, and so on.
| NOTE
Reversing the order in which you select two summarize key fields can give very different results.
Example
You want to figure out which customers are frequent shoppers. To do this, you summarize a transactions table on
the customer number and invoice date fields to find the total number of transactions for each customer for each
date that the customer had transactions.
In the Customer Transactions table, there are ten records with values in the Customer Number and Invoice Date fields. Some
combinations of customer number and date are unique, and some are identical.
In the output results, after summarizing, the customer number-date combinations are grouped into seven
unique groups. The Count column tells you how many records (in this case, transactions) are in each
group.
Based on the summarized data, customer 518008 is the most frequent shopper, followed by customer
795401.
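A minimal Python sketch of summarizing on two key fields, assuming invented (customer number, invoice date) pairs rather than the table above: each unique combination becomes one group, and sorting on the pair nests the groups by the first key field, then the second. The `summarize` function is hypothetical and only mirrors the grouping idea, not the Analytics command.

```python
from collections import Counter
from datetime import date

# Hypothetical transactions (invented values): (customer number, invoice date)
transactions = [
    ("C001", date(2024, 3, 1)),
    ("C002", date(2024, 3, 1)),
    ("C001", date(2024, 3, 1)),
    ("C001", date(2024, 3, 2)),
    ("C002", date(2024, 3, 5)),
]

def summarize(rows):
    """Count rows per unique (customer, date) combination, nested by customer first."""
    counts = Counter(rows)
    # Sorting on the tuple orders groups by the first key field, then the second
    return sorted(counts.items())

for (customer, day), count in summarize(transactions):
    print(customer, day.isoformat(), count)
```

Swapping the tuple order to (date, customer) would nest the output by date first, which is the key-field ordering effect the NOTE above describes.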
The SUMMARIZE command also allows you to select additional fields as complementary information. When fields
are selected from the Other Fields list, only the first instance within each grouping will be included in the results.
For example, if you summarize a table on customer number, an appropriate “other field” could be customer name.
The customer name should be identical for all records with the same customer number.
The SUMMARIZE command can process either sorted or unsorted data. The Presort option sorts the data on the
key fields as part of summarizing. The Presort option is selected by default;
deselecting Presort may not yield the expected results. Applying the Presort option is recommended as a
best practice for most SUMMARIZE use cases and yields quicker output results.
Example
Consider this table that contains customer transaction data. Let's summarize on the following fields:
The SUMMARIZE command is found in the Analytics menu under Analyze > Summarize.
The SUMMARIZE command outputs unique groups based on multiple fields and, at minimum, counts the
number of group members for each group. The order in which you selected the key fields is the order in
which the columns appear in the output results. The results of any additional options (e.g. subtotal fields
or Other Fields) applied to the SUMMARIZE command are also displayed.
| IMPORTANT
More than one group for each set of identical values can be found in SUMMARIZE command output if
the Presort option is deselected. Depending on the context, more than one group for each set of identical
values, or identical combination of values, can defeat the purpose of summarizing.
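One way to picture the IMPORTANT note above is Python's `itertools.groupby`, which, like a single pass over unsorted records, only groups identical values that sit next to each other. This is an analogy assumed for illustration, not a description of Analytics internals; the customer numbers are hypothetical.

```python
from itertools import groupby

# Hypothetical key values in the order the records appear in an unsorted table
customers = ["C002", "C001", "C001", "C002", "C003", "C002"]

def summarize_sequential(values):
    """Group consecutive identical values only, then count each run."""
    return [(key, len(list(run))) for key, run in groupby(values)]

without_presort = summarize_sequential(customers)
with_presort = summarize_sequential(sorted(customers))

print(without_presort)  # C002 is split across three separate groups
print(with_presort)     # one group per customer number
```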
| NOTE
It's common to output SUMMARIZE results to Screen, but remember that if the output table contains a
large number of records, it's faster to save the results to File.
Your task
Identify the five days in 2024 with the largest total payments, then determine the combined amount
for those five days.
Identify the dates with the five largest total payments in 2024
1. In the r_ArPmt2024ByDate table, highlight the PaidAmt column by clicking the field name.
2. Right-click the highlighted column and select Quick Sort Descending.
The table displays results with the highest amount at the top.
Calculate the combined cumulative amount for the top five dates
1. Ensure the Quick Sort from the previous step is still active.
2. Click on the cell containing the fifth largest PaidAmt (66,465.11) to highlight it.
3. Right-click on the highlighted cell and select Quick Filter > Greater Than Or Equal To.
The r_ArPmt2024ByDate table is filtered to display the five largest paid amounts.
4. With the filter applied, select Analyze > Total from the Analytics menu and total the PaidAmt field.
The combined total for the five largest payment days is $388,246.29.
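The sort, filter, and total steps above can be sketched in Python on invented daily totals (not the course data). The sketch also shows a caveat of filtering on "greater than or equal to the fifth-largest amount": if several dates tie at that amount, more than five rows pass the filter, so it is worth checking the row count.

```python
# Hypothetical daily payment totals (invented dates and amounts)
daily_paid = {
    "2024-01-15": 52000.00,
    "2024-02-20": 91000.00,
    "2024-03-31": 40500.00,
    "2024-06-28": 77000.00,
    "2024-09-30": 66000.00,
    "2024-11-29": 83000.00,
    "2024-12-16": 38000.00,
}

# Quick Sort Descending: order the dates by paid amount, largest first
ranked = sorted(daily_paid.items(), key=lambda item: item[1], reverse=True)

# Quick Filter >= the fifth-largest amount; ties at that amount would let more rows through
threshold = ranked[4][1]
top_days = [(day, amount) for day, amount in ranked if amount >= threshold]

# Total the filtered amounts
combined = sum(amount for _, amount in top_days)
print(len(top_days), f"{combined:,.2f}")
```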
In this topic, we’ve talked about how the SUMMARIZE command works, how to interpret command
output, and practiced using the SUMMARIZE command to group data.
1. The SUMMARIZE command groups records for character, numeric, or datetime fields that share identical values
across one or more key fields. Additional options such as subtotaling numeric fields, including "other
fields", and Presort can also be applied to the SUMMARIZE command.
2. SUMMARIZE output results can differ greatly based on
whether Presort is applied. Applying Presort is recommended as a best practice for most use cases.
3. The SUMMARIZE command outputs unique groups based on multiple fields and, at minimum, counts the
number of group members for each group.
Although the CLASSIFY and SUMMARIZE commands are similar, there are important differences that can help
you decide which command is best suited for your grouping scenario.
1. The number of fields that can be grouped. CLASSIFY can only be used on one field at a time, but
SUMMARIZE can group multiple fields.
2. The type of fields that can be grouped. CLASSIFY can group character and numeric fields, but cannot
group datetime fields. SUMMARIZE can group character, numeric, and datetime fields.
3. Additional options. SUMMARIZE has an Other Fields option that allows you to select additional fields
as complementary information, while CLASSIFY does not. SUMMARIZE also has more statistical output
options than CLASSIFY.
4. Command output. CLASSIFY's results, by default, are output to Screen, but it's possible to output to an
Analytics table or to Graph. SUMMARIZE's results, by default, are output to an Analytics table, but it is
possible to output to Screen.
Check out the table below for a detailed comparison of the CLASSIFY and SUMMARIZE commands by feature.
In general, use CLASSIFY if you need to group by just one field and don't need to generate a subtotal. Think of
CLASSIFY as the quick and dirty grouping method. For example, if you're interested in seeing all unique vendor
cities, use the CLASSIFY command and leverage the hyperlinks sent to screen. It's also more common to use
CLASSIFY for quick grouping tasks using the Analytics user interface rather than automating tasks with
CLASSIFY using scripting. Don't worry about scripting now; we'll get to scripting in a later course!
If you need to group by more than one field or need to calculate a subtotal, use SUMMARIZE. SUMMARIZE is
best for scripting because it does more and is less restrictive. Of course, you could use CLASSIFY if you only need
a one-field list of unique values, but that's all you'd get.
| NOTE
Both CLASSIFY and SUMMARIZE return the Count field if results are
sent to an Analytics table. Beware if you summarize twice in a row,
because your first Count field will pertain to the first SUMMARIZE
and Count2 will pertain to the second.
It's worth noting that the SUMMARIZE command can perform certain tasks that the CLASSIFY command cannot:
• SUMMARIZE can be used to identify duplicates. Although a DUPLICATES command does exist
(you'll learn about this command later), using SUMMARIZE to identify duplicates can come in handy. The
SUMMARIZE command can create a group number for matches; you can then use this unique identifier to
filter on and isolate a particular duplicate group. This group number is helpful when we want to isolate one
duplicate set at a time.
| TIP
To use the SUMMARIZE command in this way, run SUMMARIZE on the
duplicate criteria (e.g., Fields A, B, and C), then look for a count greater
than 1. A count of 2 or more means that more than one
record shares the same combination of values (that is, a duplicate).
• SUMMARIZE can also be used to create lookup tables. Using the SUMMARIZE command, you can
create a lookup table (a unique list of values) for iteration purposes. You can use the SUMMARIZE
command to dynamically build a list and iterate through that list to build dynamic subsets of data. As you'll
discover in later courses, the SUMMARIZE command can be quite useful when you're building scripts to
automate your work.
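Both uses of SUMMARIZE described above can be sketched in Python with invented invoice records. The first part groups records on the duplicate criteria, keeps groups with a count greater than 1, and numbers them so each duplicate set can be isolated; the second builds a unique list of vendors (a lookup table) and iterates over it to build one subset per vendor. The group numbering and all names are hypothetical illustrations, not Analytics output.

```python
from collections import defaultdict

# Hypothetical invoice records (invented values)
invoices = [
    {"Vendor": "V10", "InvoiceNo": "A-1", "Amount": 500.0},
    {"Vendor": "V20", "InvoiceNo": "B-7", "Amount": 125.0},
    {"Vendor": "V10", "InvoiceNo": "A-1", "Amount": 500.0},
    {"Vendor": "V30", "InvoiceNo": "C-2", "Amount": 980.0},
    {"Vendor": "V20", "InvoiceNo": "B-7", "Amount": 125.0},
]
# Duplicate criteria: the combination of fields that should be unique
criteria = ("Vendor", "InvoiceNo", "Amount")

# Summarize on the duplicate criteria, remembering which records fall in each group
groups = defaultdict(list)
for position, rec in enumerate(invoices):
    groups[tuple(rec[f] for f in criteria)].append(position)

# Keep groups with a count greater than 1 and give each a group number,
# so one duplicate set can be isolated at a time
dupes = [rows for _, rows in sorted(groups.items()) if len(rows) > 1]
duplicate_sets = dict(enumerate(dupes, start=1))
print(duplicate_sets)

# Lookup table: a unique list of vendors, then iterate over it to build one subset per vendor
vendors = sorted({rec["Vendor"] for rec in invoices})
subsets = {v: [rec for rec in invoices if rec["Vendor"] == v] for v in vendors}
print(vendors, {v: len(rows) for v, rows in subsets.items()})
```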
1. Can you think of ways you might apply CLASSIFY or SUMMARIZE to your own data analysis?
2. Now that you've seen what SUMMARIZE can do, could using it to identify duplicate
records or create lookup tables support your own data analysis work?
In this topic, we’ve talked about key differences between the CLASSIFY and SUMMARIZE commands.
1. Key differences between the CLASSIFY and SUMMARIZE commands include the number of fields that
can be grouped, the type of fields that can be grouped, additional command options, and how command
output can be presented.
2. The CLASSIFY command can only be used to group on one field at a time, cannot be used on Datetime
fields, and does not allow you to select additional fields as complementary information. Results are output
to Screen by default and can also be sent to Graph.
3. The SUMMARIZE command can be used to group on multiple fields, and can group Character, Numeric,
or Datetime fields. SUMMARIZE allows you to select additional fields as complementary information,
and includes more statistical outputs. Results are output to an Analytics table by default and cannot be sent
to Graph. The SUMMARIZE command can be useful when scripting.
Cross-tabulating data
The CROSSTAB command helps you summarize large data sets quickly and efficiently by
grouping multiple occurrences of field values. The results of the CROSSTAB command can help
you identify hidden patterns and provide quick summary reporting in an easily digestible data
grouping. Let's see how the CROSSTAB command can be used to group data.
The CROSSTAB command groups multiple occurrences of values in two or more key fields, and counts the number
of records in each group. Key fields can be character or numeric. The resulting groups are displayed in a matrix (a
grid of rows and columns) which allows you to visualize relations and identify patterns in your data.
Cross-tabulating is similar to summarizing using two fields. In both operations, the counts and subtotals in the
output results are the same, but the information is arranged differently in the output results. CROSSTAB also
displays counts and subtotals of zero, which SUMMARIZE does not. Depending on the type of analysis you are
doing, displaying counts and subtotals of zero can be useful. As with SUMMARIZE, you can optionally subtotal one or
more numeric fields when cross-tabulating.
You can cross-tabulate sorted or unsorted tables. When you cross-tabulate an unsorted table, Analytics automatically
sorts the output results as part of the cross-tabulate operation.
Source data:
CROSSTAB results:
The CROSSTAB command grouped multiple occurrences of the Product Location and Product
Class key character fields. In this example, Product Location A-01/Product Class 17 had the highest count
at 3 records.
In the example above, you could also subtotal the inventory value field to find the total inventory value for
each product class at each location.
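A minimal Python sketch of the cross-tabulation idea, using invented inventory records: every combination of a row value (location) and a column value (product class) gets a cell holding a count and a subtotal, including combinations with no records, which show as zero. The `crosstab` function is hypothetical and only mirrors the matrix layout described above.

```python
from collections import defaultdict

# Hypothetical inventory records (invented): (location, product class, inventory value)
inventory = [
    ("A-01", "17", 120.0),
    ("A-01", "17", 80.0),
    ("A-01", "22", 45.0),
    ("B-02", "22", 60.0),
    ("B-02", "09", 30.0),
]

def crosstab(rows):
    """Build a row x column matrix of (count, subtotal), including empty cells."""
    row_keys = sorted({r for r, _, _ in rows})
    col_keys = sorted({c for _, c, _ in rows})
    cells = defaultdict(lambda: [0, 0.0])  # (row, col) -> [count, subtotal]
    for r, c, value in rows:
        cells[(r, c)][0] += 1
        cells[(r, c)][1] += value
    # Every row/column pair gets a cell, so combinations with no records show as zero
    matrix = {(r, c): tuple(cells[(r, c)]) for r in row_keys for c in col_keys}
    return row_keys, col_keys, matrix

rows, cols, matrix = crosstab(inventory)
print("Location", *cols)
for r in rows:
    print(r, *(matrix[(r, c)][0] for c in cols))
```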
The CROSSTAB command is found in the Analytics menu under Analyze > Cross-tab.
• If you want to count the number of records for each row-column intersection
In the p_Sales2024Actual table, run CROSSTAB using customer number for rows and product class for columns,
and subtotal the amount.
The result shows all possible combinations, including product classes in which customers made no purchases.
You need to check that every sales rep submitted their monthly expense reports, and you're given data that contains
expense reports for the past year. You use the CROSSTAB command to generate a matrix.
A portion of the cross-tabulated groups is displayed for January and February 2023 in the table below.
In this topic, we’ve talked about how the CROSSTAB command works, how to interpret the output results, and
practiced using the CROSSTAB command in Analytics.
1. The CROSSTAB command groups identical character or numeric values in two or more key fields, and
counts the number of records in each group.
2. The CROSSTAB command outputs a matrix (a grid of rows and columns) that displays the groups based
on the fields selected for rows and columns. Counts or subtotals of zero are displayed for each group as
applicable.
When working with larger data sets, it may be helpful to categorize the data into smaller groups. The
STRATIFY command in Analytics allows numeric fields to be grouped into intervals to ease data
interpretation. Let's dig into how grouping data using the STRATIFY command can reveal new insights.
The STRATIFY command groups data into numeric intervals (value ranges) based on values in a numeric field and
counts the number of records in each interval.
For example, you could stratify an accounts receivable table on the InvoiceAmount field to group records into
$5,000 intervals (invoices from $0 to $4,999.99, from $5,000 to $9,999.99, and so on) and to find the total number
of transactions, and the total transaction amount, for each interval.
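The $5,000 interval example can be sketched in Python with invented invoice amounts. Integer division places each amount into its interval, and each interval collects a count and a total. The `stratify` function is hypothetical; it assumes equal-sized intervals starting at a minimum and all values at or above that minimum, and it does not model custom intervals or Suppress Others.

```python
from collections import defaultdict

def stratify(amounts, interval, minimum=0.0):
    """Group amounts into equal-sized intervals; return {interval start: (count, total)}."""
    strata = defaultdict(lambda: [0, 0.0])
    for amount in amounts:
        # Integer division finds which interval the amount falls into (assumes amount >= minimum)
        start = minimum + ((amount - minimum) // interval) * interval
        strata[start][0] += 1
        strata[start][1] += amount
    return {start: tuple(v) for start, v in sorted(strata.items())}

# Hypothetical invoice amounts (invented)
invoice_amounts = [1200.00, 4999.99, 5000.00, 7350.25, 12800.00]
INTERVAL = 5000

strata = stratify(invoice_amounts, INTERVAL)
for start, (count, total) in strata.items():
    print(f"{start:,.2f} to {start + INTERVAL - 0.01:,.2f}: {count} records, {total:,.2f}")
```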
When using STRATIFY, you can optionally subtotal one or more associated numeric fields. The Include Statistics
for Subtotal Fields option calculates the average, minimum, and maximum values for each subtotaled numeric
field.
| TIP
If you want to exclude values that fall outside specified minimum and
maximum values, use the Suppress Others option.
NOTE
The Suppress Others option is located in the More tab of the Stratify window.
e. Click If... and enter the expression CustomerRegion = "APAC" to limit the results to customers
in the Asia–Pacific region.
In the STRATIFY command output, you'll see the specified intervals, the count of records within each
interval, and the results of any additional options (e.g. subtotal fields or statistics for subtotal fields) applied
to the STRATIFY command.
As with other commands, the results of the STRATIFY command are output to the screen (display area)
within Analytics by default. You may also send STRATIFY command results to a default printer, create a
graph of the results, or send the results to an Analytics table or text file.
1. The STRATIFY command groups numeric data into numeric intervals (value ranges) based on values in a
numeric field and counts the number of records in each interval. Intervals can be equal-sized or custom-
sized. Additional options such as calculating statistics on your data can also be applied to the STRATIFY
command.
2. In the STRATIFY output results, you'll see the specified intervals, the count of the records within an
interval, and the results of any additional options (e.g. subtotal fields or statistics for subtotal fields)
applied to the STRATIFY command.
Grouping data into intervals can help to surface trends and patterns that wouldn't otherwise be
easily identified. You can use the AGE command in Analytics to create intervals based on dates.
This can be useful for surfacing insights such as how many invoices are outstanding for a given
period, which employees have the longest tenure, or which patients have been in the hospital the
longest. Let's get into how the AGE command can be used in Analytics.
The AGE command groups records into aging periods based on values in a date or datetime field, and
counts the number of records in each group. Date ranges are inclusive.
For example, you could age an accounts receivable table on the invoice date field to group records into 30-day
periods (from the cutoff date to 29 days previous, from 30 days previous to 59 days previous, and so
on) and to find the total number of outstanding invoices for each period.
Similar to STRATIFY, the AGE command allows the optional subtotaling of one or more associated
numeric fields as well as the Include Statistics for Subtotal Fields option to calculate average, minimum,
and maximum values for each subtotaled numeric field.
Aging periods are based on date intervals (that is, the number of days) measured backward in time from either the current system date or a cutoff date you specify.
You could specify a single date interval of 30 days. This creates an aging period that includes any dates 30 days
before the cutoff date, or earlier.
Or you could specify multiple date intervals to create multiple aging periods. For instance, you can specify date
intervals such as 0, 90, and 120 days as starting points for aging periods, or you can accept the default settings of 0,
30, 60, 90, 120, and 10,000 days. An interval of 10,000 days, or an appropriate final interval you specify, can be
used to isolate records with dates that are probably invalid.
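A minimal Python sketch of aging, assuming the default intervals listed above, a cutoff date of Dec 31, 2023, and invented unpaid invoices. Each date's age is the number of days before the cutoff, and it lands in the period with the largest interval start the age has reached, so 0 to 29 days falls in the 0 period and 30 to 59 days in the 30 period. Returning `None` for dates after the cutoff is a choice made for this sketch, not documented Analytics behavior.

```python
from collections import defaultdict
from datetime import date

DEFAULT_INTERVALS = [0, 30, 60, 90, 120, 10000]

def age_bucket(value, cutoff, intervals=DEFAULT_INTERVALS):
    """Return the starting interval (in days) of the aging period a date falls into."""
    days = (cutoff - value).days  # days measured backward from the cutoff date
    if days < intervals[0]:
        return None  # after the cutoff date: outside every aging period (sketch choice)
    return max(start for start in intervals if days >= start)

cutoff = date(2023, 12, 31)
# Hypothetical unpaid invoices (invented): (invoice date, amount)
unpaid = [
    (date(2023, 12, 31), 100.0),
    (date(2023, 12, 2), 250.0),
    (date(2023, 12, 1), 75.0),
    (date(2023, 9, 1), 410.0),
    (date(1990, 1, 1), 60.0),  # probably invalid date, caught by the 10,000-day period
]

periods = defaultdict(lambda: [0, 0.0])  # period start -> [count, subtotal]
for invoice_date, amount in unpaid:
    start = age_bucket(invoice_date, cutoff)
    periods[start][0] += 1
    periods[start][1] += amount

for start in sorted(k for k in periods if k is not None):
    count, total = periods[start]
    print(start, count, total)
```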
Example
You want to group unpaid invoices by the number of days outstanding as of Dec 31, 2023. To group unpaid invoices
into ranges showing how long they have been outstanding, run AGE on the InvoiceDate field, set the cutoff date as Dec
31, 2023, and set aging intervals to 0, 30, 60, 90, and 120. Each range will age backward from the cutoff date and
will show the number of invoices within each range along with the invoice totals for each range.
The date intervals created to group the unpaid invoices are listed in the table below.
The AGE command is found in the Analytics menu under Analyze > Age.
Use AGE to determine how many items were or were not reviewed
The situation
Company policy states that all inventory items should be reviewed at least once every six months. The review
considers items such as quantities on hand, sales volumes, and obsolete items.
Your task
Determine how many items were reviewed within the last 30 days and how many have not been reviewed
in over six months using the aging periods 0, 31, 61, 92, 122, 153, and 180. The cutoff date for the analysis
is Dec 31, 2024.
Determine the number of items reviewed within the last 30 days and their total value
Discover that four items were reviewed in the last 30 days, totaling $5,341.75.
Determine the number of items not reviewed in over 180 days and their total
value
In this topic, we’ve talked about how the AGE command works, its output results, and practiced using
AGE in Analytics.
1. The AGE command groups the records in a table into aging periods based on values in a date or datetime
field, and counts the number of records in each aging period. Options such as subtotaling associated
numeric fields or calculating statistics on your data can be applied to the AGE command.
2. In the AGE command output, you'll see the requested aging intervals, the count of the members within
each interval, and the Percent of Count, which calculates the percentage of the total count represented by
each aging period. You'll also see results for any selected options.
Things we did:
• We used the CLASSIFY command to group multiple occurrences of the same character or
numeric value found within one field.
• We used the SUMMARIZE command to group multiple occurrences of the same character,
numeric, and/or date value found within multiple fields.
• We compared the CLASSIFY and SUMMARIZE commands and showcased scenarios where
SUMMARIZE can be utilized to assist with data analysis activities.
• We learned how the CROSSTAB command can be used to group multiple occurrences of values in two or
more character or numeric key fields. The resulting groups are displayed in a matrix, which allows you to
visualize relations and patterns in the data.
• We learned that the STRATIFY command groups numeric data into strata, showing how many
items fall within each range and the total value within each range.
• We used the AGE command to group datetime data into ranges based on ages such as 0, 30, 60,
90, and 120 days.
Combining data
Analytics allows you to analyze data in one table at a time. Because of this, you may have to combine data
from two or more tables into one table before performing your analysis.
By combining data from different tables, you can conduct comparative studies and analyses.
The APPEND command can quickly combine records from multiple tables, saving you time and reducing
the amount of manual, redundant work required to combine tables. Let's learn more about how the
APPEND command works!
The APPEND command combines records from two or more Analytics tables into a new table by adding
one group of records to the bottom of another group of records (vertically stacking them). The records from
each source table are appended in the order the tables are selected when running the command.
The APPEND command vertically stacks groups of records in different ways depending on whether the
individual fields have identical physical names.
Appending identical fields
Source table fields with identical physical names and identical data categories are directly appended. Since the
tables being appended share common field names, the fields in each record will sit under one another and line up
accordingly.
| IMPORTANT
If you have fields in different tables that contain the same data but have
different physical field names, you'll need to standardize the field names in
their respective table layouts before directly appending. For example, if
you have a field containing product names with a physical name of Prod in
one table and Product in another, you must rename Prod to "Product" (or
vice versa) before you can directly append the records.
Example
You work in the Accounting department and want to analyze all of the Accounts Payable (AP) data from
the last quarter of 2022, but you were only given monthly files. To analyze the data, you combine the data
from October, November, and December into one table.
Output table: AP_Trans_Q42022
All three source table fields contain identical physical names and data categories, so APPEND vertically stacks the 3
groups of data under one another.
When tables have common fields but they appear in a different order, APPEND will use the order from the first
table for the output table.
Analytics supports various character, date, and numeric formats. To account for formatting differences in otherwise
identical fields, Analytics can perform automatic harmonization to standardize the data in the output table to one
single format.
Example
Using the previous example, let's assume your organization made a few changes to the report format in November
and December, and you need to append all three reports for your quarterly report. All fields, common and unique,
should appear in the quarterly table.
NOTE
If you chose to include only common fields, the Quantity field would have been omitted from the output
table.
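The field-matching behavior described above can be sketched in Python with two invented monthly tables. Records are stacked in table order, output field order follows the first table, fields a table lacks are left blank (`None`), and a common-fields-only mode drops fields that are not in every table. The `append` function is hypothetical and does not model Analytics' automatic format harmonization.

```python
def append(tables, common_only=False):
    """Vertically stack tables given as (field_names, rows) pairs."""
    if common_only:
        common = set.intersection(*(set(names) for names, _ in tables))
        fields = [f for f in tables[0][0] if f in common]
    else:
        # Field order follows the first table; new fields are added as they appear
        fields = []
        for names, _ in tables:
            fields += [f for f in names if f not in fields]
    output = []
    for _, rows in tables:
        for row in rows:
            # Fields the source table lacks are left blank (None)
            output.append({f: row.get(f) for f in fields})
    return fields, output

# Hypothetical monthly tables (invented); November adds a Quantity field and reorders fields
october = (["Vendor", "Invoice", "Amount"],
           [{"Vendor": "V10", "Invoice": "1001", "Amount": 500.0}])
november = (["Invoice", "Vendor", "Amount", "Quantity"],
            [{"Invoice": "1002", "Vendor": "V20", "Amount": 90.0, "Quantity": 3}])

fields_all, rows_all = append([october, november])
fields_common, rows_common = append([october, november], common_only=True)
print(fields_all, rows_all)
print(fields_common)
```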
The APPEND command is found in the Analytics menu, under Data > Append.
TIP
Computed fields can't be appended. To append a computed field, you'll need to first extract it to convert the
field to a physical field, then append the extracted table.
After running the APPEND command, the output table only opens automatically in the display area if
the Use Output Table option is selected. The output table may contain blank fields, depending on whether
the appended tables have common or unique field names.
In the example below, the first table had only a few fields in common with the appended table, which is why there
are blank fields beside the records from the first table.
Your task
Append the s_InventoryBoston, s_InventoryChicago, s_InventoryDenver,
and s_InventorySanFrancisco tables into one table named p_InventoryAllRegions. Run the TOTAL
command on the combined table to confirm the total quantity of inventory across Boston, Chicago, Denver,
and San Francisco.
| TIP
If you want the output table to only include fields common to all the tables,
select the Common Fields Only option. In this case, the table layout is the same for
all four tables, so it won't change the output.
Discover that the total inventory on hand across all regions is 169,285, which reconciles to the sum
of Quantity_On_Hand in the four source tables.
| TIP
To reconcile the data, run TOTAL on
the s_InventoryBoston, s_InventoryChicago, s_InventoryDenver, and s_InventorySanFrancisco tables,
then sum those totals and compare the result to the p_InventoryAllRegions total.
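The reconciliation in the TIP above can be sketched in Python: total the field in each source table, sum those totals, and compare the result with the total of the appended table. The quantities are invented (not the course data); the second call shows how a dropped record surfaces as a non-zero difference. The `reconcile` function is hypothetical.

```python
def reconcile(source_tables, appended_rows, field="Quantity_On_Hand"):
    """Return (sum of source totals, appended total, difference)."""
    expected = sum(row[field] for rows in source_tables.values() for row in rows)
    actual = sum(row[field] for row in appended_rows)
    return expected, actual, actual - expected

# Hypothetical source tables with invented quantities
sources = {
    "s_InventoryBoston": [{"Quantity_On_Hand": 120}, {"Quantity_On_Hand": 45}],
    "s_InventoryChicago": [{"Quantity_On_Hand": 80}],
    "s_InventoryDenver": [{"Quantity_On_Hand": 60}],
    "s_InventorySanFrancisco": [{"Quantity_On_Hand": 200}, {"Quantity_On_Hand": 10}],
}
appended = [row for rows in sources.values() for row in rows]

print(reconcile(sources, appended))       # complete append: difference of 0
print(reconcile(sources, appended[:-1]))  # a dropped record shows up as a shortfall
```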
1. The APPEND command combines records from two or more Analytics tables into a new table. When
combining the tables, APPEND adds one group of records to the bottom of another group of records.
2. The format of the output table depends on whether the appended tables have unique or common field
names.
3. To append common fields with different data formats, APPEND performs automatic harmonization to
standardize the data in the output table.
The RELATE command enables you to virtually connect up to 18 Analytics tables, even if they have
varying data structures, so that data fields in one table appear in another as if they existed in a single,
cohesive table. To relate tables, you need a common key field that exists in each table. Once you've
identified this common key field, the RELATE command establishes a virtual link (relation) between the
tables, allowing the fields from one table to be easily accessed in the other.
| NOTE
The RELATE command in Analytics is similar to the Relationship function in Microsoft Excel, as both
features connect data from different sources. However, the key difference is that the Relationship function
in Excel doesn’t establish a virtual connection between tables, but retrieves data based on specific criteria.
Example
A risk manager wants to evaluate the impact of changes in interest rates on the credit risk of their mortgage
portfolio. The Customer_Mortgages table contains the customer ID, date, credit score, payment history,
and default status. The Historical_Interest_Rates table contains dates and interest rates.
Since a date field exists in both tables, this can be used as the common key field to virtually relate the two
tables, allowing the risk manager to run their analysis.
Key terms
There are some important terms when using the RELATE command. To better understand these terms,
we'll reference the example above.
| IMPORTANT
Common key fields don't need to have the same name, but they do need to have the same data type and
length. Datetime subtypes (date, datetime, and time) can only be related to the same subtype. For character
types, the justification and case of each key field must be the same.
Using the previous example, consider that the sales manager also requires the sales representative names for their
analysis. The Name field exists in the Salesperson table, but there are no common key fields between
the AR and Salesperson tables.
The RELATE command is found in the Analytics menu under Data > Relate.
In the Add Columns window, you can select the child table from the From Table drop-down menu.
The Available Fields list will update to display the selected table's fields, which can then be added to the
parent table's view.
TIP
Unsure about which fields in your view come from which table? Fields from the child table are named
using the childTableName.fieldName format (e.g., Contract.Sales_Rep_No).
Add the Class_Description field to the p_Sales2024Actual table's view, then filter the data to determine how many
customers purchased garden supplies worth over $1,604.95 in one transaction.
TIP
Tables in the Relations window can be resized and moved to make it easier to
connect the fields.
Discover that 77 customers purchased garden supplies worth over $1,604.95 in one transaction.
The situation
Your manager has asked you to determine which warehouses are associated with the two customer
numbers in the r_Sales2024BlankInvoices table.
Your task
The only table that contains the warehouse name is the s_ProductClass table. Unfortunately,
the r_Sales2024BlankInvoices and s_ProductClass tables don't share a common field, so you need to use
a third table, s_Sales2024, as an intermediary table. Relate these three tables to determine the warehouses
associated with the two blank invoice items in r_Sales2024BlankInvoices.
TIP
You can select multiple tables in the Add Table window at the same time by pressing Shift and clicking on
each table name.
Discover that San Francisco and Boston are the two warehouses associated with the blank invoices.
1. The RELATE command virtually links two or more tables together. This allows you to add data fields
from one table to another table's view as if they existed in a single, cohesive table.
2. The initial table that is open when creating a relation between tables is the parent table. The table that you
link to the parent table is the child table. All fields from the child table become available to the parent table
once the virtual link has been made.
3. Linking two tables using a common key field that exists in both the parent and child tables is a direct
relation.
4. To link tables lacking a common key field, you can establish an indirect relation using a third intermediary
table. To do so, you relate the parent and child table using one key field, and relate the child and
grandchild table using a different key field.
To relate two or more tables, the RELATE command uses a many-to-one relation. This means that
when the parent key field contains duplicate values ("many"), each duplicate is only matched to
the first occurrence ("one") of the corresponding child key field value. If there are no duplicates in
the parent table, the RELATE command automatically uses a one-to-one relation (one occurrence
in the parent table is matched to one occurrence in the child table).
NOTE
Because the red values are duplicate values, they won't match to parent key field values and won't
be included in the relation.
D01 doesn't have a match, but it will be included because it's in the parent table. C01 will not be
included because it's in the child table and doesn't have a match.
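The many-to-one matching rule can be sketched in Python. The key values below are hypothetical and only loosely echo the D01 and C01 labels above. The function models the rule conceptually and is not Analytics' implementation.

```python
def relate(parent_keys, child_rows):
    """Model of a many-to-one relation.

    Each parent key is matched to the FIRST child row with the same key.
    Every parent record is kept; child rows without a parent match never appear.
    """
    first = {}
    for key, value in child_rows:
        first.setdefault(key, value)  # keep only the first occurrence of each key
    # None stands in for the empty child fields of an unmatched parent record
    return [(key, first.get(key)) for key in parent_keys]


parent = ["A01", "A01", "B01", "D01"]  # duplicate parent keys are allowed
child = [
    ("A01", "first A01"),
    ("A01", "second A01"),  # duplicate: never matched
    ("B01", "first B01"),
    ("C01", "first C01"),   # no parent match: not part of the relation
]

for row in relate(parent, child):
    print(row)
# ('A01', 'first A01')
# ('A01', 'first A01')
# ('B01', 'first B01')
# ('D01', None)
```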
Example
You want to identify the date of each employee's last bonus. You relate the Employee table (parent) with
the Payroll table (child) that tracks bonus payments.
Since RELATE only matches the first occurrence of the matching child key value, the relation only picked
up Bob's bonus of $1,000.00 on 10/10/2023, and the more recent payment of $500.00 on 11/10/2023 was dropped.
One way to fix this issue is to use SORT (not Quick Sort) on the child table to order the Date field in descending
order before relating the two tables. This way, the latest payment date will be matched to the parent table.
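The effect of sorting the child table first can be reproduced with the same first-match rule. In this Python sketch, the two records mirror Bob's bonuses from the example. The field names are hypothetical, and the dates are assumed to be in MM/DD/YYYY format.

```python
from datetime import date

# Hypothetical Payroll (child) records in their original physical order
payroll = [
    {"Employee": "Bob", "Date": date(2023, 10, 10), "Bonus": 1000.00},
    {"Employee": "Bob", "Date": date(2023, 11, 10), "Bonus": 500.00},
]


def first_match(child_rows, key):
    """Return {key value: first child row with that value}, like a many-to-one relation."""
    lookup = {}
    for row in child_rows:
        lookup.setdefault(row[key], row)
    return lookup


# Relating the unsorted child table: the first physical record (the older bonus) wins
before = first_match(payroll, "Employee")["Bob"]

# Sorting the child table on Date in descending order first puts the latest bonus first
payroll_sorted = sorted(payroll, key=lambda row: row["Date"], reverse=True)
after = first_match(payroll_sorted, "Employee")["Bob"]

print(before["Date"], before["Bonus"])  # 2023-10-10 1000.0
print(after["Date"], after["Bonus"])    # 2023-11-10 500.0
```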
TIP
If you need to pick up both dates, consider using JOIN (using additional
key fields that would produce a unique match). We'll learn about the JOIN
command in the next topic.
IMPORTANT
Make sure to carefully identify your parent and child tables, because results can differ if you reverse the order.
In a many-to-one relation, when the parent key field contains duplicate values ("many"), each duplicate is
only matched to the first occurrence ("one") of the corresponding child key field value.
When analyzing data, you will often need to compare data that is contained in different tables, or match
records in a transaction table with those in a master table. This is where the JOIN command comes in
handy. Using JOIN, you can easily combine two tables to create a new third table, in which you can run
your analysis. Let's learn more!
The JOIN command is similar to the RELATE command in that it combines two tables horizontally using
common key fields. However, the key distinction between these commands lies in their
outcomes: while RELATE establishes a virtual link between tables, JOIN creates an entirely new,
combined table by merging the selected fields from the original tables.
Example
An auditor is reviewing a company’s financial records to ensure that vendors were not paid twice.
The Vendors table contains vendor names and account numbers, and the Vendor_Payments table contains
payment dates and account numbers.
It would be easier for the auditor to review the data if the Vendor_Payments table included the vendor name. By
joining these two tables using the account number as the common key field, the auditor can create a new table
containing all the required data where they can run their analysis.
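A rough Python analogy can make the difference between the two commands concrete. A RELATE-style view looks up the vendor name each time a record is read. A JOIN-style function copies the combined records into a new list. This is an analogy only and does not show how Analytics stores relations or joined tables. The vendor names, account numbers, and dates are hypothetical. The sketch changes a vendor name after both are built, purely to make the difference visible.

```python
# Hypothetical Vendors table (account number -> vendor name)
vendors = {"100": "Acme Ltd", "200": "Globex"}
# Hypothetical Vendor_Payments table
payments = [
    {"Account": "100", "Date": "2024-01-05"},
    {"Account": "200", "Date": "2024-01-09"},
]


class RelatedView:
    """RELATE-like: no data is copied; the vendor name is looked up on every read."""

    def __init__(self, rows, lookup):
        self.rows = rows
        self.lookup = lookup

    def __iter__(self):
        for row in self.rows:
            yield {**row, "Vendor": self.lookup.get(row["Account"])}


def join(rows, lookup):
    """JOIN-like: build and return a brand-new list of combined records."""
    return [{**row, "Vendor": lookup.get(row["Account"])} for row in rows]


view = RelatedView(payments, vendors)
joined = join(payments, vendors)

vendors["100"] = "Acme Corporation"  # change the source data after both are built
print([r["Vendor"] for r in view])    # ['Acme Corporation', 'Globex'] (looked up again)
print([r["Vendor"] for r in joined])  # ['Acme Ltd', 'Globex'] (copied at join time)
```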
Key terms
Before using the JOIN command, take a moment to get familiar with some key terms. We'll reference the above
example to provide a clearer understanding of each term.
IMPORTANT
Just like with the RELATE command, a join will only succeed if the key fields meet certain requirements;
they must be the same data type, the same length, etc.
When you work with joins, you need to consider both matched and unmatched records:
• Matched: Records that have identical values in the primary and secondary key fields. Depending
on the join type you select, duplicate occurrences of matching secondary key values may be left
unjoined.
• Unmatched: Records that don't have identical values in the primary and secondary key fields.
Analytics fills missing fields for unmatched records with blanks or zeros.
When joining tables in Analytics, there are six different join types to consider. See the table below
for an overview of each join type.
Five of the six join types don't include joined duplicate occurrences of the matched secondary key values in
the output table because they are many-to-one joins.
The only join type that includes joined duplicate occurrences of matched secondary key values in the
output table is the Matched primary and secondary (all secondary matches) join type, as this is a many-
to-many join.
We'll cover the six join types, many-to-one joins, and many-to-many joins in more detail in the following
sections.
NOTE
Analytics uses the term "many-to-many join" in a way that is unique to Analytics; it's not the same as a SQL
many-to-many join.
TIP
The JOIN command only matches exact key field values. To join tables using approximate matching, you
can use the FUZZYJOIN command, which is found in the Analytics menu under Data > Fuzzy Join.
Although JOIN and RELATE both allow you to work with multiple tables in conjunction, one command is typically
better suited for your purposes than the other. When deciding between the two, consider your analysis
objectives, the size of your dataset, and personal preference.
1. JOIN combines two tables to produce an entirely new table. Tables are joined using a common key field.
2. The table that is open when beginning the join is the primary table, and the table it will be combined with
is the secondary table.
3. There are six different join types; five are many-to-one joins and one is a many-to-many join.
4. To find matches, JOIN compares every row of the primary table with every row of the secondary table
based on the criteria specified by the selected join type.
5. The RELATE command establishes a virtual link between tables, but JOIN creates an entirely new table
by merging the selected fields from the original tables.
The JOIN command is found in the Analytics menu under Data > Join.
IMPORTANT
To select the primary table, open the primary table in the display area before running the JOIN command.
The JOIN command creates a new table in Analytics that contains the selected primary and secondary
fields. Once created, the output table is opened in the display area by default. The output table is saved in
the folder that contains your primary table.
1. Before using the JOIN command in Analytics, it's important to determine the primary and secondary tables,
the join type, the common key field(s) between the two tables, and the primary and secondary fields to
include in the output table.
2. When in the Join window, always select the Secondary Table first. Only then will Analytics give you
access to the other fields in the Join window.
Before delving into the five many-to-one join types in Analytics, it's crucial to understand how
many-to-one joins work in general. In a many-to-one join, when the primary key field contains
duplicate values ("many"), each duplicate is matched only to the first occurrence ("one") of the
corresponding secondary key field value. If there are duplicates in the secondary table, they won't
be matched.
NOTE
If the primary key field contains unique values ("one"), each value is matched to the first
occurrence ("one") of the corresponding secondary key field value. This is called a one-to-one
join.
Example
Some customers have made multiple purchases, so the Transaction table (primary) contains duplicate customer
numbers. If there are duplicates in the Customer master table (secondary), they will never be matched in a
many-to-one join.
NOTE
The red values will never match to primary key field values because primary values are only joined to the
first matching secondary value.
Although D01 doesn't have a match, it will be included in the joined table because it's in the primary
table. C01 won't be included because it's in the secondary table and doesn't have a match.
TIP
To include unmatched secondary values in the joined table, use the All primary and secondary
many-to-one join type.
To include all secondary matches of primary key fields, use the Matched primary and secondary records
(all secondary matches) many-to-many join type; we'll cover this in the next topic.
Let's take a closer look at the five different many-to-one join types that are available in Analytics.
Transactional tables (e.g., a sales table) are likely to contain many duplicate key field values, so these are
best used as primary tables. Master or reference tables (e.g., a vendor master table) are likely to have
unique key field values, so these are best used as secondary tables.
Example
To aid in understanding how each Analytics many-to-one join type works, we'll apply the same example to each
join type.
For this example, we'll join the Payroll table with the Employee records table for different purposes, using each of
the five many-to-one join types in Analytics.
NOTE
In this example, there are no duplicate key fields in the secondary table. Since these are many-to-one joins,
if there were duplicate matching key fields in the secondary table they would be excluded from the output
table.
Matched primary and secondary (first secondary match)
Scenario: You're running an audit for payroll and want to verify that all employees were paid correctly.
This join type includes matched values from the primary (Payroll) and secondary (Employee records) tables in the
output table. Duplicate occurrences of matched secondary key values are not included.
Since there are no matches for employees 002, 004, or 005, those records are omitted from the output table.
Unmatched primary
Scenario: You want to determine if someone who is not listed as an employee was paid.
This join type only includes unmatched records from the primary table (Payroll) in the output table. All matched
primary and secondary records and unmatched secondary values are omitted from the output table.
The only record in the output table is employee 002.
All primary and matched secondary
Scenario: You want to verify that all payroll payments went to employees.
This join type includes matched key values from the primary (Payroll) and secondary (Employee records)
tables, and unmatched values in the primary table. Unmatched values in the secondary table and duplicate
occurrences of matched secondary key values are omitted from the output table.
All secondary and matched primary
Scenario: You want to verify that all employees listed in the Employee records table were paid.
This join type includes matched key values from the primary (Payroll) and secondary (Employee records) tables,
and unmatched key values in the secondary table. Duplicate occurrences of unmatched secondary key values are
included. Duplicate occurrences of matched secondary key values are included but not joined.
Record 002 in the primary table is omitted from the output table because it is unmatched.
All primary and secondary
Scenario: You want to verify that all employees listed in the Employee records table were paid.
This join type includes all records in the primary (Payroll) and secondary (Employee records) tables in the output
table, both matched and unmatched. Duplicate occurrences of unmatched secondary key values are included.
Duplicate occurrences of matched secondary key values are included but not joined.
No records from either table are omitted from the output table in our example.
NOTE
Since the All primary and secondary join type is a many-to-one join, records from the primary table are
only matched to the first occurrence in the secondary table.
If there were multiple records of 003 in the secondary table, only the first occurrence of 003in the
secondary table would be matched. The duplicate records would still be included in the output table, but
would show blank values for the missing primary fields.
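A small Python model lets you compare the five many-to-one join types side by side. It is a conceptual sketch under stated assumptions and is not Analytics' implementation. Records are (key, value) pairs, and the join-type labels are hypothetical shorthand for the options in the Join window. None marks the fields Analytics would fill with blanks or zeros, and output order is not modeled. The payroll and employee data are hypothetical.

```python
JOIN_TYPES = {
    "matched",            # Matched primary and secondary (first secondary match)
    "unmatched_primary",  # Unmatched primary
    "all_primary",        # All primary and matched secondary
    "all_secondary",      # All secondary and matched primary
    "all",                # All primary and secondary
}


def many_to_one_join(primary, secondary, join_type):
    """Return (key, primary_value, secondary_value) rows; None marks a missing side."""
    if join_type not in JOIN_TYPES:
        raise ValueError(f"unknown join type: {join_type}")

    first_index = {}  # key -> position of its FIRST occurrence in the secondary table
    for i, (key, _) in enumerate(secondary):
        first_index.setdefault(key, i)
    primary_keys = {key for key, _ in primary}

    out = []
    for key, pval in primary:
        if key in first_index:
            if join_type != "unmatched_primary":
                # every primary record (duplicates too) joins the first secondary match
                out.append((key, pval, secondary[first_index[key]][1]))
        elif join_type in ("unmatched_primary", "all_primary", "all"):
            out.append((key, pval, None))  # unmatched primary record

    if join_type in ("all_secondary", "all"):
        for i, (key, sval) in enumerate(secondary):
            # unmatched secondary records, and duplicate occurrences of matched keys,
            # are included but not joined
            if key not in primary_keys or i != first_index[key]:
                out.append((key, None, sval))
    return out


payroll = [("001", 2500), ("002", 1800), ("003", 3100)]                      # primary
employees = [("001", "Ana"), ("003", "Raj"), ("004", "Li"), ("005", "Sam")]  # secondary

for jt in ["matched", "unmatched_primary", "all_primary", "all_secondary", "all"]:
    print(jt, many_to_one_join(payroll, employees, jt))
```

To see the NOTE above in action, append a second ("003", …) record to employees. With "all", it appears as an unjoined row with no primary value. With "matched", it is left out.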
Join two tables using the Matched primary and secondary (first
secondary match) join type
The Situation
Your manager has requested to know how many customers had an AR balance in both 2023 and 2024.
Your task
Join the s_ArAging2023 and s_ArAging2024 tables using the Matched primary and secondary (first
secondary match) join type to identify customers who had an AR balance in both 2023 and 2024.
1. With the s_ArAging2023 table open, select Data > Join from the Analytics menu.
The Join window opens with the s_ArAging2023 table selected as the primary table.
2. In the Secondary Table dropdown, select the s_ArAging2024 table.
The s_ArAging2024 table's fields display in the Secondary Keys and Secondary Fields lists.
3. Under Join Types, select Matched primary and secondary (first secondary match).
4. In the Primary Keys list, select the Cust_No field.
The Cust_No field is highlighted in the Primary Keys list.
5. In the Secondary Keys list, select the Cust_No field.
The Cust_No field is highlighted in the Secondary Keys list.
6. Select the primary fields:
1. Click Primary Fields... to open the Primary Fields window.
2. Click Add All to move all the fields to the Selected Fields list, then click OK.
The Primary Fields window closes and the fields are highlighted in the Primary Fields list.
7. Select the secondary fields:
1. Click Secondary Fields... to open the Secondary Fields window.
2. Click Add All to move all the fields to the Selected Fields list, then click OK.
The Secondary Fields window closes and the fields are highlighted in the Secondary Fields list.
8. Select the Presort Primary Table, Presort Secondary Table, and Use Output Table options.
9. In the To... box, enter "r_ArBalanceBothYears" as the output table name.
10. Click OK.
The new r_ArBalanceBothYears table opens in the display area.
Join two tables using the Unmatched primary join type
Your task
Join the s_ArAging2023 and s_ArAging2024 tables using the Unmatched primary join type to identify customers
who only had an AR balance in 2023.
1. With the s_ArAging2023 table open, select Data > Join from the Analytics menu.
The Join window opens with the s_ArAging2023 table selected as the primary table.
2. In the Secondary Table dropdown, select the s_ArAging2024 table.
The s_ArAging2024 table's fields display in the Secondary Keys and Secondary Fields lists.
3. Under Join Types, select Unmatched primary.
4. In the Primary Keys list, select the Cust_No field.
The Cust_No field is highlighted in the Primary Keys list.
5. In the Secondary Keys list, select the Cust_No field.
The Cust_No field is highlighted in the Secondary Keys list.
6. Select the primary fields:
1. Click Primary Fields... to open the Primary Fields window.
2. Click Add All to move all the fields to the Selected Fields list, then click OK.
The Primary Fields window closes and all fields are highlighted in the Primary Fields list.
7. Select the secondary fields:
1. Click Secondary Fields... to open the Secondary Fields window.
2. Click Add All to move all the fields to the Selected Fields list, then click OK.
The Secondary Fields window closes and all fields are highlighted in the Secondary Fields list.
8. Select the Presort Primary Table, Presort Secondary Table, and Use Output Table options.
9. In the To... box, enter "r_ArBalance2023Only" as the output table name.
10. Click OK.
The new r_ArBalance2023Only table opens in the display area.
TIP
If the number of records isn't displaying in the Analytics status bar, run COUNT on r_ArBalance2023Only.
1. In a many-to-one join, when the primary key field contains duplicate values ("many"), each duplicate is
matched only to the first occurrence ("one") of the corresponding secondary key field value. Duplicate
secondary keys aren't matched.
2. The five many-to-one join types are Matched primary and secondary (first secondary
match), Unmatched primary, All primary and matched secondary, All secondary and matched
primary, and All primary and secondary.
3. Duplicate unmatched secondary keys may be included in the output depending on the join type used.
Analytics fills missing primary fields for unmatched secondary records with blanks or zeros.
Unlike many-to-one joins, the many-to-many join matches and combines all duplicate occurrences
("many") of the matching secondary key fields to the matching duplicate ("many") primary key fields. The
only many-to-many join type in Analytics is Matched primary and secondary (all secondary matches).
The main difference between many-to-one joins and many-to-many joins is that duplicate matching
secondary key values are joined in many-to-many joins, but in many-to-one joins they are not.
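You can sketch this difference by changing one detail of a many-to-one model: keep every secondary record for each key instead of only the first. This Python sketch is conceptual and uses hypothetical data. As with the Matched primary and secondary (all secondary matches) join type, unmatched records on either side are left out.

```python
def many_to_many_join(primary, secondary):
    """Join every primary record to EVERY secondary record with the same key."""
    by_key = {}
    for key, sval in secondary:
        by_key.setdefault(key, []).append(sval)  # keep all occurrences, not just the first
    # primary keys with no match produce no rows; unmatched secondary keys are never visited
    return [(key, pval, sval) for key, pval in primary for sval in by_key.get(key, [])]


payroll = [("006", "Jan pay"), ("006", "Feb pay"), ("007", "Jan pay")]       # primary
employees = [("006", "record A"), ("006", "record B"), ("008", "record C")]  # secondary

for row in many_to_many_join(payroll, employees):
    print(row)
# ('006', 'Jan pay', 'record A')
# ('006', 'Jan pay', 'record B')
# ('006', 'Feb pay', 'record A')
# ('006', 'Feb pay', 'record B')
```

Two duplicates on each side produce 2 × 2 = 4 output rows. Key 007 (primary only) and key 008 (secondary only) are omitted.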
Example
Some vendors have used a service multiple times, so the Vendor Accounts table (primary) contains duplicate
vendors. The Accounts Receivable table (secondary) also contains duplicate vendor numbers that we want to include in the
output table. Using the Matched primary and secondary (all secondary matches) join type, all the duplicate
matching values in the secondary table are joined and included in the output table.
As previously mentioned, the Matched primary and secondary (all secondary matches) join type is the
only many-to-many join in Analytics. When using this join type, any unmatched key fields in the primary
or secondary table will be omitted from the output table.
To better understand how this join type works, let's look at an example.
TIP
A good time to use the many-to-many join is if you are unsure whether duplicate matches exist in the
secondary table. This ensures you don't exclude any records that should be joined.
NOTE
The order of the tables does not matter as much when using the Matched primary and secondary (all
secondary matches) join type, as all matching records from the secondary table are joined and included in the output
table.
Matched primary and secondary (all secondary matches)
This join type includes matching values in the primary (Payroll) and secondary (Employee records) tables in the
output table. All duplicate matching occurrences in the secondary table will be joined and included in the output
table. Unmatched primary and secondary records will be excluded from the output table.
Since employee 006 appears twice in the primary table and twice in the secondary table, there are four records for
employee 006 in the output table.
Joining tables using the Matched primary and secondary (all secondary
matches) join type
The situation
Your manager has asked you to determine how many customers ordered 76 or more houseware products from the
Chicago warehouse.
Your task
To complete this task, join the s_Sales2024 and s_ProductClass tables. Name the output
table r_JoinSales2024_ProductClass.
Next, filter the Product_Class field to only show product class 01 (houseware products) and add a filter to
the Quantity Sold field to only show quantities of 76 or more. Finally, add a filter to
the Warehouse column to only show records from the Chicago warehouse.
1. With the s_Sales2024 table open, select Data > Join from the Analytics menu.
The Join window opens.
2. From the Secondary Table dropdown menu, select the s_ProductClass table.
The s_ProductClass fields are displayed in the Secondary Keys and Secondary Fields lists.
3. Under Join Types, select Matched primary and secondary (all secondary matches).
4. In the Primary Keys list, select the Product_Class field.
5. In the Secondary Keys list, select the Product_Class field.
6. In the Primary Fields list, select
the Customer_Number, Invoice_Date, Product_Class, and Quantity_Sold fields.
7. In the Secondary Fields list, select the Class_Description and Warehouse fields.
8. Select the Presort Primary Table, Presort Secondary Table, and Use Output Table options.
9. In the To... box, enter "r_JoinSales2024_ProductClass" as the output table name.
10. Click OK.
The new r_JoinSales2024_ProductClass table opens in the display area.
The r_JoinSales2024_ProductClass table is filtered to only display product class 01 data ("Housewares").
The r_JoinSales2024_ProductClass table is filtered to display records of housewares products sold out of the
Chicago warehouse.
1. Navigate to the Quantity Sold column and click on any cell containing "76" to highlight it.
2. Right-click on the highlighted cell and select Quick Filter > AND > Greater Than Or Equal To.
The filter bar displays the active filter ((Product_Class = "01") AND (Warehouse = "Chicago")) AND
(Quantity_Sold >= 76) .
Discover that there are 12 records of customers who bought 76 or more houseware products from the Chicago
warehouse.
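The combined filter in the filter bar is an ordinary Boolean expression. The >= comparison is what makes a quantity of exactly 76 qualify. A minimal Python sketch with hypothetical rows (not the course data) shows how each condition narrows the joined table.

```python
# Hypothetical rows standing in for r_JoinSales2024_ProductClass
joined = [
    {"Customer_Number": "101", "Product_Class": "01", "Warehouse": "Chicago", "Quantity_Sold": 76},
    {"Customer_Number": "102", "Product_Class": "01", "Warehouse": "Chicago", "Quantity_Sold": 75},
    {"Customer_Number": "103", "Product_Class": "01", "Warehouse": "Boston", "Quantity_Sold": 120},
    {"Customer_Number": "104", "Product_Class": "02", "Warehouse": "Chicago", "Quantity_Sold": 90},
]


def passes_filter(row):
    # Mirrors: (Product_Class = "01") AND (Warehouse = "Chicago") AND (Quantity_Sold >= 76)
    return (
        row["Product_Class"] == "01"
        and row["Warehouse"] == "Chicago"
        and row["Quantity_Sold"] >= 76
    )


matches = [row for row in joined if passes_filter(row)]
print([row["Customer_Number"] for row in matches])  # ['101']: 76 passes, 75 does not
```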
1. Unlike the many-to-one join, the many-to-many join matches and joins all duplicate occurrences ("many")
of the matching secondary key fields to the matching duplicate ("many") primary key fields. Unmatched
records are not included in the output table.
2. The only many-to-many join type in Analytics is Matched primary and secondary (all secondary
matches).
Quick Sort
Recall from Basics of Analytics (ACL 101) that Quick Sort is an easy way to visually inspect the highest or lowest
records. It temporarily reorders records in the view by sorting records on a single field.
To apply Quick Sort, click on the column heading of the field to be ordered to highlight it, then right-click and
select Quick Sort Ascending or Quick Sort Descending.
To turn Quick Sort off, right-click on the sorted data and select Quick Sort Off.
• The SORT command sorts records into ascending or descending sequential order of one or more
fields. Unlike Quick Sort, which temporarily reorders records in the existing table, the SORT
command permanently reorders records by producing a new, sorted table.
• If you sort on a single key field, records within each sorted group retain their original sort order
relative to one another.
• If you sort on multiple key fields, the order in which you select the fields determines the sort order
priority. For multiple key fields, the records are sorted by the first field you select. Then, if there
are multiple occurrences of a value in the first field, the records within the group are sorted by the
second field you select, and so on.
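Python's built-in sorted() works in a similar way and illustrates both rules. It is a stable sort, so ties keep their original relative order, and a tuple key applies the fields in priority order. It also returns a new list and leaves the original untouched, much as SORT writes a new table. This is an analogy, not Analytics' sort routine. The records are hypothetical and borrow the CustNum, PmtDate, and PaidAmt field names used later in this section.

```python
# Hypothetical payment records (CustNum, PmtDate, PaidAmt) in physical order.
# Dates are ISO strings, so sorting them as text also sorts them chronologically.
payments = [
    ("C2", "2024-03-01", 50.00),
    ("C1", "2024-02-15", 20.00),
    ("C2", "2024-01-10", 75.00),
    ("C1", "2024-02-15", 35.00),
]

# Single key field: ties keep their original relative order (stable sort)
by_customer = sorted(payments, key=lambda r: r[0])

# Multiple key fields: CustNum first; within each customer, PmtDate decides
by_customer_then_date = sorted(payments, key=lambda r: (r[0], r[1]))

for row in by_customer_then_date:
    print(row)
# ('C1', '2024-02-15', 20.0)
# ('C1', '2024-02-15', 35.0)
# ('C2', '2024-01-10', 75.0)
# ('C2', '2024-03-01', 50.0)
```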
The SORT command is found in the Analytics menu under Data > Sort.
When using SORT, you'll need to determine if you want to include the entire data record or only specified
fields in your table output. There are important implications associated with each option, described in the
table below.
The SORT command creates a new, reordered Analytics table that includes either the records or specific
fields you specified when running the command.
The image below shows an example output table when sorting data on customer number and payment date
fields in ascending order, and selecting the CustNum, PmtDate, and PaidAmt fields as the table output
fields.
The only output option for the SORT command is a new Analytics table, which is most commonly saved to
the Analytics project folder. However, it is possible to save the output table in a location other than the
project folder by specifying an absolute file path (e.g., C:\Results\Output.fil) or a relative file path (e.g.,
Results\Output.fil). Regardless of where you save the output table, a .FIL file is created and added to the
open project.
Your task
Sort the inventory records by Product Number and create a new inventory table that the audit team can use for
their inventory analysis.
2. If you sort on multiple key fields, the order in which you select the fields determines the sort order priority.
Indexing using the INDEX command in Analytics creates a separate index (.INX) file that allows access to
the records in an Analytics table in a sequential order rather than the physical order. An index file is
approximately 5% the size of the source file, which may make indexing a better option for sorting records
in some scenarios.
Indexing does not physically reorder data in tables. However, when an index is active, the data in the view
is sorted in the order specified by the index, and analytical operations process the data based on this order.
If a table has more than one view, all views are subject to an active index.
When an index is active, the word "Indexed" prefaces the record count in the status bar. When the index is
inactive, the records in a view revert to their original physical order. Upon opening an Analytics table, any
existing indexes are inactive by default.
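Conceptually, an index is a list of record positions ordered by the key field, and the view reads records through that list while the index is active. The Python sketch below models that idea with hypothetical data. It makes no claim about the actual .INX file format.

```python
records = [  # hypothetical table; its physical order is never changed
    {"Product_No": "P30", "Qty": 5},
    {"Product_No": "P10", "Qty": 8},
    {"Product_No": "P20", "Qty": 2},
]

# The "index" stores only record positions, ordered by the key field
index = sorted(range(len(records)), key=lambda i: records[i]["Product_No"])


def view(table, active_index=None):
    """Yield records in index order when an index is active, else in physical order."""
    order = active_index if active_index is not None else range(len(table))
    for i in order:
        yield table[i]


print([r["Product_No"] for r in view(records, index)])  # ['P10', 'P20', 'P30']
print([r["Product_No"] for r in view(records)])         # ['P30', 'P10', 'P20']
```

Storing positions instead of copies of the records is why an index can be much smaller than a sorted table.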
As mentioned previously, the INDEX command creates a separate index (.INX) file. This means that there is no
new output table created and no data to send to Screen when working with the INDEX command.
2. Indexing does not change the physical order of data, whereas using the SORT command does. The
resulting indexed file size is smaller and requires less disk space than using the SORT command.
3. Indexing uses fewer resources and is quicker to search character fields, so using INDEX can be better than
using SORT in some cases. In instances where subsequent data analysis is required after sorting, using the
SORT command may optimize your data analysis.
1. Quick Sort is best for temporarily sorting by one field and is useful for quick visual inspections of data.
2. The SORT command physically reorders data based on single or multiple fields and outputs results to a
new Analytics table.
4. The INDEX command orders records by one or more fields using a small index (.INX) file to access records,
allowing the data in the view to be reordered without creating a new table or physically reordering the
records.
1. Grouping data using the CLASSIFY, SUMMARIZE, CROSSTAB, STRATIFY, and AGE
commands. Use CLASSIFY and SUMMARIZE to group multiple occurrences of the same
character or numeric value in one or multiple fields, respectively. CROSSTAB is used to group
multiple occurrences of two or more character values, with the results displayed in a matrix. To
group numeric data into equal or custom ranges, use STRATIFY. Use AGE to group datetime
data into aging periods based on a cutoff date or the current system date.
2. Combining data using the APPEND, RELATE, and JOIN commands. Since Analytics can
only analyze one table at a time, you may need to combine tables to proceed with your analysis.
Tables can be combined vertically using APPEND by adding one table to the bottom of another
table. You can create a virtual link between tables using RELATE, which makes the fields in one
table available in another. You can combine two tables horizontally using JOIN to create a new
third table. There are six different join types, which can drastically change the output of the join
operation.
3. Reordering data using the SORT and INDEX commands. The SORT command permanently
reorders data using one or more fields, and outputs the results to a new table that can be used for
further analysis. Using INDEX, you can create multiple index files (.INX) that can be applied to
the same table to support different analyses. Activating an index reorders the data without
creating a new table or physically reordering records. To temporarily reorder records for a quick
visual inspection of the data, use Quick Sort.