Tuesday, April 28, 2015

Tuesdays with Tableau: Is your data ready?

By Gail Kluepfel

One reason today’s business intelligence applications are considered “self-service” is that business users can connect to a wide range of data sources with a few clicks – from Hadoop or SQL server, to text files, to the work horse of BI, Excel spreadsheets. Great! You no longer need a Computer Science degree to join together disparate data sources! However, regardless of your data source, self-service BI tools still assume that the data is in a consistent structure, and it is easy to spend a lot of time making it so.

Most self-service BI tools want data in a table like this:
Clean, neat data is always preferred.
(click on image to view full size)
What do you do if your data source looks more like this?
Data is not always formatted the way you need.
(click on image to view full size)
While the data source here is Excel, the structure is not a table with consistent data types and level of detail on every row. Some columns are only populated with data on certain rows, and some rows contain sub-totals. Within columns, values for customer names are combined with order numbers. 

One of my client’s only source of data was in this format, which wasn’t surprising since he was a top executive at a small company, and his outsourced accounting firm only provided him with nicely-formatted summary data each month. He wanted to use Tableau to analyze and trend data for the last several years, and was about to give up because he wanted to avoid manual cleanup of 30 or 40 individual spreadsheets from prior years. 

Thankfully, Tableau’s latest release (version 9.0) includes several new ways to automate your data preparation. For example, when you connect to an Excel spreadsheet that has multiple header rows, the Tableau Data Interpreter will detect the location and structure of the data and ensure it’s ready for analysis in the Tableau workspace. The new features also enable you to split data elements that are combined in a single column, and to pivot from multiple to single columns, or vice versa. For a complete list of data prep capabilities, check out the 9.0 features.

Still, there are cases where Tableau can’t help, such as auto-populating rows with data from a previous row. In this example, the column header “Customer Name” needs to be moved one column to the right, and the name of the customer needs to be populated down all the rows containing her Order IDs. 
Sometimes rows need to be populated, and Alteryx can help.
(click on image to view full size)
For this and even more robust cleanup, there are some great self-service data preparation tools that enable business users to create automated workflows to handle even the most gnarly data challenges. Alteryx in particular has an intuitive drag-and-drop interface. I recently downloaded the trial and, within an hour, set up several steps to crunch multiple Excel workbooks. Here is what the workflow looks like in Alteryx: 
Alteryx makes it easy to clean your data.
(click on image to view full size)
Each icon is a tool that you drag and drop onto the workflow surface and then click to connect. Once set up, you can schedule the flow to run at regular intervals, whereas previously you might have had to manually copy and paste data multiple ways and multiple times, or create macros and manually run them. With these and more data cleanup features baked into BI applications or add-on tools, self-service has never been easier. Check it out for yourself, by downloading the Alteryx trial. If you need a little guidance getting started, contact Marquis Leadership, and we’ll set you up for automated data cleanup success!

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.