Combining Data (1)
Coarse granularity: "coarser" means a lower level of detail in the data. It refers to
data that has been more heavily aggregated or summarized.
Example: a dataset that shows only the total daily sales for each store location. This data
has been aggregated from the individual transactions and has a lower level of detail; it is
therefore of coarser granularity.
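To make the idea concrete, the minimal Python sketch below rolls transaction-level rows up to one total per store per day. The store names, dates, and amounts are invented for illustration, and `daily_sales_by_store` is a hypothetical helper, not a Tableau function.

```python
from collections import defaultdict

# Hypothetical transaction-level data (fine granularity): one row per sale.
transactions = [
    {"store": "North", "date": "2023-01-01", "amount": 20.0},
    {"store": "North", "date": "2023-01-01", "amount": 35.0},
    {"store": "South", "date": "2023-01-01", "amount": 15.0},
    {"store": "North", "date": "2023-01-02", "amount": 10.0},
]


def daily_sales_by_store(rows):
    """Aggregate transactions to one total per (store, date) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["store"], row["date"])] += row["amount"]
    return dict(totals)


# Coarser granularity: four transactions become three daily totals.
daily_totals = daily_sales_by_store(transactions)
for (store, date), total in sorted(daily_totals.items()):
    print(store, date, total)
```

The individual transactions cannot be recovered from `daily_totals`, which is what "lower level of detail" means in practice.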
Relationships created in the logical layer
represent different types of cardinality:
• One-to-One Relationships: These occur when a single record in one table corresponds to at most
one record in another table. The logical layer can handle this, but traditional joins in the physical
layer can also suit this scenario, especially when the data is at the same level of detail.
• One-to-Many Relationships: This is when one record in one table can be related to multiple
records in another table. A common example is the relationship between customers and their
orders (one customer can have many orders). The logical layer is well suited to managing these
kinds of relationships.
• Many-to-Many Relationships: These are situations where multiple records in one table can be
related to multiple records in another table.
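As a rough illustration of the three cardinalities, the sketch below classifies the relationship between two key columns by checking whether each key value repeats. The function name `cardinality` and the sample keys are hypothetical; Tableau determines cardinality from its own relationship settings, not from this code.

```python
from collections import Counter


def cardinality(left_keys, right_keys):
    """Classify the relationship between two key columns.

    A side counts as "one" when no key value appears more than once on it.
    """
    left_unique = max(Counter(left_keys).values(), default=0) <= 1
    right_unique = max(Counter(right_keys).values(), default=0) <= 1
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "one-to-many"
    if right_unique:
        return "many-to-one"
    return "many-to-many"


# Hypothetical keys: customer IDs in a customers table and in an orders table.
customer_ids = [1, 2, 3]
order_customer_ids = [1, 1, 2]  # customer 1 placed two orders
print(cardinality(customer_ids, order_customer_ids))
```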
Joins, Unions, and Relationships
• Joins, unions, and relationships are different ways of
combining data from one or more sources to facilitate analysis.
Joins
• Joins combine the fields (columns) of two tables.
• You need joins to combine data from multiple tables by adding columns. This is
typically done when the tables share one or more common fields (keys).
For example, you might join a table of sales orders with a table of customer details
using a customer ID to bring the customer's address information into your sales
analysis.
• Tableau supports different types of joins, such as inner, left, right, and full outer joins,
each determining which rows are included in the combined dataset based on the
matching keys.
• Joins are particularly useful for data with one-to-one or one-to-many
relationships. However, if the data is at different levels of detail, joins in the
physical layer can sometimes lead to data duplication.
Types of Joins:
• Inner Join: This is the default join type and retains only the data where there is a
match in the specified join clause across both data sources. The Venn diagram for
an inner join shows only the overlapping section coloured.
• Left Join: This join retains all the data from the left-hand side data source and
brings in any matching records from the right-hand side data source. If there is no
match in the right-hand data source, the columns from that source will have null
values. The Venn diagram shows the left circle and the overlapping section
coloured.
• Right Join: This is the opposite of a left join. It retains all the data from the right-
hand side data source and brings in any matching records from the left-hand side
data source. Null values will appear for columns from the left-hand side where
there is no match. The Venn diagram shows the right circle and the overlapping
section coloured.
• Full Outer Join: This join retains all data from both data sources, combining
records where the join clause is met and filling in nulls where there is no match.
The Venn diagram shows both circles fully coloured.
Types of Joins
Inner Join
The output table shows only the matching rows from the left and right tables. Any
non-matching row is left out of the output table.
Left table: CustomerNames     Right table: CustomerAges
ID   Name                     ID   Age
1    Ali                      1    20
2    Hassan                   3    35
Output table
ID   Name     ID   Age
1    Ali      1    20
Left Join
The output table shows all rows from the left table but only the matching rows from
the right table.
Left table: CustomerNames     Right table: CustomerAges
ID   Name                     ID   Age
1    Ali                      1    20
2    Hassan                   3    35
Output table
ID   Name     ID     Age
1    Ali      1      20
2    Hassan   NULL   NULL
Right Join
The output table shows only the matching rows from the left table but all rows
from the right table.
Left table: CustomerNames     Right table: CustomerAges
ID   Name                     ID   Age
1    Ali                      1    20
2    Hassan                   3    35
Output table
ID     Name   ID   Age
1      Ali    1    20
NULL   NULL   3    35
Full Outer Join
The output table shows all the rows from the left table and all the rows from the right
table.
Left table: CustomerNames     Right table: CustomerAges
ID   Name                     ID   Age
1    Ali                      1    20
2    Hassan                   3    35
Output table
ID     Name     ID     Age
1      Ali      1      20
2      Hassan   NULL   NULL
NULL   NULL     3      35
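The four outputs above can be reproduced with a short Python sketch. It uses the CustomerNames and CustomerAges rows from the examples; the `join` helper is a hypothetical illustration of join semantics, with `None` standing in for NULL, and is not how Tableau executes joins.

```python
# The two example tables, as lists of dictionaries.
customer_names = [{"ID": 1, "Name": "Ali"}, {"ID": 2, "Name": "Hassan"}]
customer_ages = [{"ID": 1, "Age": 20}, {"ID": 3, "Age": 35}]


def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`. `how` is inner, left, right, or full.

    Each output row is a tuple: the left columns followed by the right columns.
    """
    left_cols = list(left[0])
    right_cols = list(right[0])
    rows = []
    matched_right = set()
    for left_row in left:
        matches = [i for i, r in enumerate(right) if r[key] == left_row[key]]
        for i in matches:
            matched_right.add(i)
            rows.append(
                tuple(left_row[c] for c in left_cols)
                + tuple(right[i][c] for c in right_cols)
            )
        # Left and full joins keep unmatched left rows, padded with NULLs.
        if not matches and how in ("left", "full"):
            rows.append(
                tuple(left_row[c] for c in left_cols) + (None,) * len(right_cols)
            )
    # Right and full joins keep unmatched right rows, padded with NULLs.
    if how in ("right", "full"):
        for i, right_row in enumerate(right):
            if i not in matched_right:
                rows.append(
                    (None,) * len(left_cols)
                    + tuple(right_row[c] for c in right_cols)
                )
    return rows


inner = join(customer_names, customer_ages, "ID", how="inner")
left_join = join(customer_names, customer_ages, "ID", how="left")
right_join = join(customer_names, customer_ages, "ID", how="right")
full = join(customer_names, customer_ages, "ID", how="full")

for name, result in [("inner", inner), ("left", left_join),
                     ("right", right_join), ("full", full)]:
    print(name, result)
```

Only the handling of unmatched rows differs between the four join types; the matched row (ID 1) appears in every result.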
Creating Joins in Tableau Desktop
1. Open the Data Source tab.
2. Ensure the desired data sources are added. Note that Tableau Data sources (.tds or .tdsx) cannot be used in
joins.
3. Drag the main table onto the canvas.
4. Double-click the dragged table to enter the join/union interface (physical layer).
5. Drag the table you want to join with onto the canvas next to the first table. A connecting line with a Venn
diagram will appear.
6. Click the Venn diagram to open the join configuration window.
7. Select the desired join type (Inner, Left, Right, Full Outer) by clicking on the corresponding Venn diagram icon.
8. Define the join clause by selecting the fields that should match between the two data sources. You can select
common fields with the same name, or choose different fields. You can add multiple join clauses using the "+"
icon.
9. Ensure that the data types of the fields used in the join clause are the same.
10. You can also create complex join conditions using calculation logic, for example, joining on date ranges or
concatenating fields.
Considerations for Joining Data:
• Granularity: Be mindful of the level of detail in your data sources. Joining data
at different levels of aggregation can lead to duplication of records and
inaccurate aggregations. Consider aggregating data before joining if necessary.
• Primary and Foreign Keys: Joins often rely on primary keys (unique identifiers
for records in a table) and foreign keys (fields in one table that reference the
primary key in another). Joining on non-unique keys can lead to erroneous
results.
• Join Culling: Tableau can optimise queries by using join culling, where it only
includes joined tables that are specifically referenced by fields in the view,
assuming referential integrity.
• Spatial Joins: These are used specifically when your data contains spatial
information to join based on spatial relationships.
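The granularity point is easiest to see with numbers. The sketch below joins a store-level target table to transaction-level sales and shows how the target total is inflated, then aggregates before joining as one possible fix. All table contents and variable names are invented for illustration.

```python
# Hypothetical data at two levels of detail.
# Fine granularity: one row per transaction.
sales = [
    {"store": "North", "amount": 20},
    {"store": "North", "amount": 35},
    {"store": "South", "amount": 15},
]
# Coarse granularity: one row per store.
targets = [
    {"store": "North", "target": 100},
    {"store": "South", "target": 50},
]

# Inner join on store: each target is repeated once per matching transaction.
joined = [
    {**s, "target": t["target"]}
    for s in sales
    for t in targets
    if s["store"] == t["store"]
]

true_target_total = sum(t["target"] for t in targets)
inflated_target_total = sum(row["target"] for row in joined)
print(true_target_total, inflated_target_total)  # North's target is counted twice

# One possible fix: aggregate sales to store level before joining.
sales_by_store = {}
for s in sales:
    sales_by_store[s["store"]] = sales_by_store.get(s["store"], 0) + s["amount"]

joined_after_aggregation = [
    {"store": t["store"],
     "amount": sales_by_store.get(t["store"], 0),
     "target": t["target"]}
    for t in targets
]
```

After aggregation both tables have one row per store, so summing `target` over `joined_after_aggregation` gives the true total again.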
Join Culling
• Join culling is a performance optimisation technique used by Tableau
when querying databases with joins. It works by generating SQL
queries that only include the tables necessary to retrieve the data
required for the current view.
• How it works: When you build a visualisation in Tableau using data
from joined tables, Tableau analyses which fields from which tables
are actually needed to render the view. If a joined table does not
contain any of the fields required for the visualisation, Tableau's query
engine will exclude (or "cull") that table from the generated SQL
query.
• Purpose: The primary goal of join culling is to improve query
performance by reducing the complexity of the SQL queries sent to
the database and the amount of data that needs to be processed and
transferred. By only querying necessary tables, Tableau can execute
queries faster and more efficiently.
• Conditions for effectiveness: Join culling assumes that the tables in
the database have referential integrity. This means that if you join a
fact table to a dimension table on a common key, every key value in the
fact table has a matching row in the dimension table. In such cases, if a
query only needs data from the fact table (or a subset of joined
dimension tables), the other joined tables can be left out of the query
without changing the result.
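The idea behind join culling can be sketched as a small query builder that adds a join only when the view uses a field from that table. This is a conceptual illustration, not Tableau's query engine: the table names, join conditions, and the `build_query` function are all hypothetical.

```python
# Hypothetical star schema: a fact table and the dimension tables joined to it.
FACT_TABLE = "orders"
DIMENSION_JOINS = {
    "customers": "orders.customer_id = customers.id",
    "products": "orders.product_id = products.id",
}


def build_query(fields, assume_referential_integrity=True):
    """Build a SELECT that joins only the tables the view's fields need.

    `fields` are "table.column" strings. If referential integrity is not
    assumed, every join is kept, because dropping an inner join could
    change which fact rows are returned.
    """
    needed = {field.split(".")[0] for field in fields}
    sql = f"SELECT {', '.join(fields)} FROM {FACT_TABLE}"
    for table, condition in DIMENSION_JOINS.items():
        if table in needed or not assume_referential_integrity:
            sql += f" INNER JOIN {table} ON {condition}"
    return sql


# The view uses only a fact-table field: both dimension joins are culled.
print(build_query(["orders.amount"]))
# The view also uses a customer field: only the customers join is kept.
print(build_query(["orders.amount", "customers.name"]))
# Without the referential integrity assumption, nothing is culled.
print(build_query(["orders.amount"], assume_referential_integrity=False))
```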
Spatial Joins
• Spatial joins provide a powerful way to integrate and analyse data
based on geographic relationships, letting you go beyond
traditional attribute-based joins and use the spatial context of
your data.
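One common spatial relationship is "this point lies inside that polygon". The sketch below joins hypothetical store locations to hypothetical rectangular regions using a ray-casting point-in-polygon test. It only illustrates the concept on flat coordinates; Tableau uses its own spatial data types and join operator, which this code does not reproduce.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical regions (polygons) and store locations (points).
regions = {
    "West": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "East": [(5, 0), (10, 0), (10, 5), (5, 5)],
}
stores = [("Store A", 2, 3), ("Store B", 7, 1), ("Store C", 12, 2)]


def spatial_join(points, polygons):
    """Inner spatial join: pair each point with every region that contains it."""
    return [
        (name, region)
        for name, x, y in points
        for region, polygon in polygons.items()
        if point_in_polygon(x, y, polygon)
    ]


matches = spatial_join(stores, regions)
print(matches)
```

Store C falls outside both regions, so an inner spatial join drops it, just as an inner attribute join drops rows without a matching key.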
Unions
We use unions to combine data by appending rows from two or more
tables or files that have the same structure and set of fields. The tables
should have the same number of columns and the same data types. Unlike
joins, unions do not need a key to combine data.
For example, if you have sales data for different years stored in
separate Excel sheets with identical columns, you would use a union to
stack the sheets on top of each other, creating a single table with all the
historical sales records. Unions are essential for analysing data that is
split across multiple sources but represents the same type of
information.
How Union Works
ID Date
1 2022
2 2022
3 2023
4 2023
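The table above reads as the result of stacking a 2022 table on top of a 2023 table. Splitting it that way is an assumption, since only the combined rows are shown. Under that assumption, a union can be sketched in Python as follows; the `union` helper and the variable names are hypothetical.

```python
# Hypothetical yearly tables with identical columns (ID, Date).
sales_2022 = [{"ID": 1, "Date": 2022}, {"ID": 2, "Date": 2022}]
sales_2023 = [{"ID": 3, "Date": 2023}, {"ID": 4, "Date": 2023}]


def union(*tables):
    """Append the rows of tables that share the same columns, in order."""
    columns = list(tables[0][0])
    combined = []
    for table in tables:
        for row in table:
            if list(row) != columns:
                raise ValueError("all tables must have the same columns")
            combined.append(row)
    return combined


all_sales = union(sales_2022, sales_2023)
for row in all_sales:
    print(row["ID"], row["Date"])
```

No key is compared at any point: rows are simply appended, so the number of columns stays the same and the number of rows is the sum of the inputs.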