I think we can all agree, Tableau looks great in a demo with the sample data stored in Excel. But the true test of a self-service analytics tool is working with real data that may reside in more than a single Excel workbook. Thankfully, Tableau has two different methods of combining data to give the user flexibility: data joining and data blending. I want to address a few potential misunderstandings and highlight a few caveats to keep in mind when using either method.
Data Joining
When you want to work with multiple tables or views within a source (e.g., tables from the same database, sheets from an Excel workbook, text files within the same directory), Tableau works well when you join them. In most cases, performance will be much better than if you use blending.Using joins to combine data is easy to do and performs well in Tableau when the data resides in the same source.
In Tableau, data joining is easy because you don’t need to know SQL. Simply drag and drop to integrate multiple tables and specify the columns to join on – choose the Data > Connect to Data menu option to get started. After connecting to the data source, drop a table onto the blank surface, followed by the table to be joined. Click on the set of interlocking circles which appears between the tables to open the Join window, where you select whether you want to use an inner, left, right, or full outer join. Then add the columns you want to join on.
Learn about each join type in this excellent Tableau support resource. (click image to view full size) |
Data Blending
(Note: I won’t cover the basics of blending here; Tableau provides helpful resources to get you started.)Two potential limitations: First, because data blending happens at time of query, after you’ve made data connections, performance can be slow, but in many cases this is still the best option. Second, you can’t blend two cubes.
Fortunately, you can blend a single cube with other data sources, and there are many other data sources that you can blend. To start, you can combine data from:
- two different databases (same or different servers),
- a database and an Excel file,
- a cube and a database or Excel file, or
- multiple files.
Blending in Tableau requires you to designate one data source as Primary, the others as Secondary. You aren’t limited to two data sources – you can blend several – but only one can be Primary.
So what does the Primary designation mean? To those familiar with SQL queries, the primary source is like the “left” or first table in a left outer join. After blending, the results will include all rows in the Primary table but only those rows in the Secondary that match on at least one column you specify. If the Secondary source doesn’t have a match in the Primary source, it won’t be included in the results.
Say you want to blend sales amounts from orders in your CRM or ERP system with returns against each order stored in another system. Consider these two tables coming from different data sources:
Blend the Orders table with the Returns table, and the results will contain all orders for Xerox products but only include records from the Returns table with a matching Order ID (boxed in red), the only column these two tables have in common. This is expected “left join” behavior.
Tableau’s options for combining data are easy to use.
Joining and blending enable business users to query different data sources where they reside, rather than waiting for IT to replicate and centralize data through traditional development methods. While developers may not see Tableau as a robust dedicated integration tool like Microsoft’s SQL Server Integration Services or Alteryx, and technical analysts who use Excel’s Power Query may find Tableau less flexible, it does provide the easy-to-use graphical interfaces and high-performing capabilities that enable many business users to get the data they need.
NEXT WEEK: Tips and tricks for working with Primary and Secondary data sources.