Friday 15 May 2015

Creating a Word Cloud with Tableau


Here we will look at how to create a word cloud step by step with Tableau. Before getting into the how-to part, one should understand what a word cloud is and when it is typically used.

The Wikipedia definition states that a word cloud (a.k.a. tag cloud) is “a visual representation for text data, typically used to depict keyword metadata (tags) on websites, or to visualize free form text.” One can refer to that article (and various others on the Internet) for more details about word clouds.
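The idea behind this definition can be sketched in a few lines of Python: count how often each word occurs, then map the counts to font sizes. This is only a minimal illustration and not how Tableau or any particular tool sizes words; the helper name `font_sizes`, the linear scaling and the 10 to 48 point range are assumptions made for the example.

```python
from collections import Counter
import re


def font_sizes(text, min_pt=10, max_pt=48):
    """Map each word in `text` to a font size based on its frequency.

    Hypothetical helper: the linear scaling between min_pt and max_pt is an
    assumption for illustration; real tools may scale differently.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        word: min_pt + (count - lo) * (max_pt - min_pt) / span
        for word, count in counts.items()
    }


# Invented sample text: "123456" occurs most often, so it gets the largest size.
sizes = font_sizes("password 123456 password 123456 123456 qwerty")
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {size:.0f}pt")
```

The most frequent word ends up at the top of the size range and the least frequent at the bottom, which is exactly the visual cue a word cloud relies on.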

Example: The image below shows a sample word cloud of the 100 most used passwords. One can easily see from the sizes that “123456” is the most used password, followed by “password”, then “12345678”, and so on.





Source: metro.co.uk

This article on bbc.com analyses Mr. Narendra Modi’s speeches as a PM candidate and as PM. The image below, sourced from the same article, depicts Mr. Modi’s words as Prime Minister.





Source: bbc.com


Data

The data has been sourced from howstat.com and formatted for Tableau’s consumption. Preparing the data is the first, most important and often the most time-consuming step before data visualization and exploration can happen. We have batting data for One Day International (ODI) matches played between 1971 and 2011, with close to 60,000 data points. The table below gives a quick overview of the important dimensions and measures present in the dataset.
Dimensions          Measures
Country             Runs
Player name         Score Rate (runs per 100 balls faced)
Opponent country
Ground
Match Date


Data Exploration & Visualization

Let us begin with a question: who has scored 1000 or more runs against India?
Let us first conceptualize what we are trying to visualize and lay out a series of steps to achieve it.

We need to create a word cloud of the Player names, from various Countries, who have scored 1000 or more Runs versus India.

Note: Player, Country, Runs and Versus correspond to dimensions or measures we already have in our data.
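Before building this in Tableau, the same logic can be sketched in plain Python. The field names (`player`, `country`, `versus`, `runs`) and the sample rows are made up for illustration and are not taken from the actual dataset; the sketch only shows the order of operations: keep the rows where the opponent is India, sum the runs per player, then keep the players whose total is at least 1000.

```python
from collections import defaultdict

# Hypothetical rows; player names and run values are invented.
sample_rows = [
    {"player": "Player A", "country": "Sri Lanka", "versus": "India", "runs": 700},
    {"player": "Player A", "country": "Sri Lanka", "versus": "India", "runs": 450},
    {"player": "Player A", "country": "Sri Lanka", "versus": "Pakistan", "runs": 900},
    {"player": "Player B", "country": "Australia", "versus": "India", "runs": 600},
    {"player": "Player C", "country": "Pakistan", "versus": "India", "runs": 1000},
]


def heavy_scorers(rows, opponent="India", min_runs=1000):
    """Return {(player, country): total_runs} for totals >= min_runs."""
    totals = defaultdict(int)
    for row in rows:
        if row["versus"] == opponent:  # row-level filter on the opponent
            totals[(row["player"], row["country"])] += row["runs"]  # sum of runs
    # The threshold applies to the summed runs, not to individual rows.
    return {key: total for key, total in totals.items() if total >= min_runs}


result = heavy_scorers(sample_rows)
print(result)
```

Note that the opponent filter is applied to individual rows, while the 1000-run threshold is applied after the runs have been summed per player. The steps below follow the same order.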

Step 1: Connect to Data
[Image: data collection for creating a word cloud in Tableau]

Step 2: Go to the Worksheet

[Image: uploading data in Tableau]

Step 3: Set up a filter. In our case, the filter is Versus = India.

[Image: applying filters]

Step 4: Drag Player onto Label

[Image: using labels in Tableau]

Step 5: Drag Runs onto Size (Sum is chosen as the aggregation method by default)

[Image: setting up the parameters to create a word cloud in Tableau]

Step 6: Put a filter on Runs with the criterion "at least 1000"

[Image: filters for creating a word cloud]

Step 7: Choose Text as the Marks type instead of Automatic. This is the key to creating a word cloud in any example that you build.

[Image: selecting the appropriate marks]

Step 8: Drag Country onto Color.

[Image: using colors for the word cloud]

The word cloud is ready. One can observe that Sanath Jayasuriya has scored the most runs against India, followed by Inzamam and Ricky Ponting. In general, Sri Lankan, Australian and Pakistani batsmen have scored heavily against India. The reason is that these four countries (India included) have played the most ODI matches and have played very frequently against each other.
Surprisingly, none of the England batsmen feature in the visualization, while three Zimbabwean batsmen appear in the list.

[Image: final word cloud in Tableau]

Here is the count of matches played by these countries against India.

[Image: match stats]
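A count like this can be derived from batting data by counting distinct matches per country. The sketch below assumes, purely for illustration, that a match can be identified by the combination of country, match date and ground, and that there is one batting row per player per match; the field names and all values are invented.

```python
from collections import Counter

# Hypothetical batting rows, one per player per match; all values invented.
batting_rows = [
    {"country": "Country X", "versus": "India", "match_date": "2000-01-01", "ground": "Ground 1"},
    {"country": "Country X", "versus": "India", "match_date": "2000-01-01", "ground": "Ground 1"},
    {"country": "Country X", "versus": "India", "match_date": "2000-02-01", "ground": "Ground 2"},
    {"country": "Country Y", "versus": "India", "match_date": "2000-03-01", "ground": "Ground 1"},
    {"country": "Country Y", "versus": "Country X", "match_date": "2000-04-01", "ground": "Ground 3"},
]


def matches_against(rows, opponent="India"):
    """Count distinct matches each country played against `opponent`."""
    # A set removes the duplicate rows contributed by several players
    # batting in the same match.
    matches = {
        (row["country"], row["match_date"], row["ground"])
        for row in rows
        if row["versus"] == opponent
    }
    return Counter(country for country, _date, _ground in matches)


match_counts = matches_against(batting_rows)
print(match_counts.most_common())
```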

Using a word cloud for the above analysis is certainly not the right choice; a tree map or bar chart is a better fit, since one would still need to know how many runs were scored, or how many matches were played, by those players against India. The takeaway from this blog is how to create a word cloud with Tableau. The best scenario for using a word cloud is analysing textual data and the frequency with which words occur. That said, one should be cautious, as a word cloud emphasizes the frequency of words, not necessarily their importance. In addition, word clouds do not provide the context in which the words are used, so they are best treated as a way to do some quick exploratory analysis of text.
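To illustrate why a bar chart answers the "how many runs" question more directly, here is a minimal text-only sketch that ranks totals and prints the exact value next to a proportional bar. The helper name `text_bar_chart` and the totals are invented placeholders, not figures from the dataset.

```python
def text_bar_chart(totals, width=40):
    """Return the lines of a horizontal bar chart, largest value first."""
    if not totals:
        return []
    peak = max(totals.values()) or 1  # guard against an all-zero input
    label_width = max(len(name) for name in totals)
    lines = []
    for name, value in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(value / peak * width)  # bar length relative to the peak
        lines.append(f"{name.ljust(label_width)} {bar} {value}")
    return lines


# Invented totals for illustration only.
totals = {"Player C": 1000, "Player A": 2000, "Player B": 1500}
chart = text_bar_chart(totals)
print("\n".join(chart))
```

Unlike the word cloud, each line shows both the ranking and the actual number, so the reader does not have to judge values from relative text size.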

Referring from: edupristine.com
