Tuesday, April 7, 2015

Tuesdays with Tableau: Tips and Tricks for Combining Data through Blending

By Gail Kluepfel
 

Last week I covered how Tableau enables you to combine data from different sources directly from your desktop interface. Tableau calls the technique data blending, in contrast with data joining, used to combine data from the same source when establishing the data connection. Today, I want to share some tips and tricks that aren’t obvious when blending data for the first time.

To create the blend in Tableau, you first create data connections for two or more sources. Once again, I’ll use the Orders and Returns tables from the Tableau Superstore data sample as examples of two different data sources. When you set a data source as Primary, you ensure that all rows from that source will be included in your view along with any rows in the Secondary source(s) that match on a common field; rows that don’t match in the Primary source won’t appear in the results.

There are two ways to set a data source as Primary – one is automatic, and the other is manual.

Tableau relies on subtle visual cues to indicate when a source is automatically set as Primary. Because these indicators aren’t immediately obvious in the UI, let me point them out here; not knowing which source is which could lead to unexpected results.

The Automatic Way

Here’s what you’ll see in the side bar of the worksheet view listed under Data after connecting to one or more data sources:
Two data sources (Orders and Returns)
As you can see, both icons are white. With a data source highlighted, click on a data element in the Dimensions or Measures windows, and the icon for that data source will show a checkmark in blue. Whichever data element you click and drag into your view first automatically becomes the Primary source.
Orders set as Primary
When you click a different data source, and select another data element, that source automatically becomes Secondary, and is indicated by a checkmark in orange:
Returns set as Secondary
Notice that there is also an orange link icon to the right of Order ID. Because this column has the same name as a column in the Primary data set, Tableau automatically creates a relationship between them, and “blends” the data sets through a left join from the primary to secondary table on the linked columns.

The Manual Way

The clearest way to see how Tableau is blending your data is to open the Relationships dialog box. In the top menu, select Data > Edit Relationships.
Relationships Dialog Box
(click image for full size)
Change which source is Primary or Secondary by selecting it from the drop-down menu. In this view, you can also add or remove columns to join the sources on, if the automatically-assigned ones aren’t sufficient.

Make Connections Active

After you have blending the sources using Relationships, be check that all of the columns you selected to join on are active in the data window for the Secondary data source. If the link icon is grey, then the join is inactive. Click on it to make it active.
Inactive Link and Active Link
The link icon will turn orange and a checkmark appears next to the Secondary source. Seeing “red” in this case is a good thing – it represents an active blend.

Adding More Data Sources

To add a third data source to the blend, use the same techniques. Here I’ve blended some Target data with the Orders data (Primary), using Month-Year of Order Date and Product Name to join the two sources, as seen in the Data > Edit Relationships box.
Relationships Dialog Box
(click image for full size)
Once blended, I can drag and drop data points from all three sources into my view, and even create calculated fields that use measures from one, two, or all three data sources. I can quickly see which products have had returned orders, the number of returns, and the sales amount for each, and whether or not there were targets associated with the products.
(click image for full size)
I’m using some dimensions from the Order table, some from the Returns table, and measures from all three. As long as Primary and Secondary sources share at least one value in at least one column, you’ll be able to use dimensions or measures from any of the sources.

When deciding which source to make Primary, first decide from which data set you need all the records in your results, even if there isn’t a match in another data set. While Tableau’s data blending capabilities may not offer the same flexibility as custom joins in SQL, it certainly meets the needs for business users who want to pull together data sources quickly and easily.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.