Getting Creative with Big Data and Google Apps
Big data isn’t just for those who can wrangle Hadoop or wrestle with R. Service interfaces (APIs) make it easy for even the relative novice explorer some creative solutions. Working for an innovation centre I often find the need to quickly glue together service APIs to gain some insight from the data flowing around our sector. More often than not I turn to Google Sheets (Spreadsheets) and Google Apps Script to rapidly develop dashboards I can share with colleagues. It’s reassuring to see I’m not the only one who uses this approach. A recent post on the Google Apps Developer Blog highlighting how the UK Government’s Cabinet Office do Google Analytics reporting with Google Apps Script.
A great thing about Google Sheets/Apps Script is you are not limited to one API at a time. For example, lets again take Google Analytics data and focus on referral traffic. In Google Analytics you now have access to some social activity data, but this is limited to Social Data Hub partners, which – perhaps unsurprisingly – doesn’t include Twitter.
But all is not lost. Since August 2011 Twitter has started automatically wrapping links in its own shortening service t.co. So regardless of any other services you use (like Bitly) the end user will be hitting a t.co link. Let’s see how this works in practice. When I fire up my Google Analytics account and look at referral traffic I can see it’s dominated by t.co sources. Drilling down into the t.co data I can actually see how many visits each t.co link generated.
Another important fact is that each t.co link is unique even if it’s pointing at the same destination. Searching Twitter for say http://t.co/wEbXrPah allows me to trace it back to this tweet:
#HowTo show a #cool interactive visualization of event hashtags: bit.ly/rtoVa9
— TweetSmarter (@TweetSmarter) February 23, 2012
So we can say this tweet and it subsequent 5 retweets generated 42 visits to my blog. At this point you might be saying but this tweet above has a bit.ly link, there’s no reference to t.co. It may say bit.ly but underneath the hyperlink is t.co:
As the Cabinet Office guest post showed it’s easy to define a Google Analytics query. In this case I’m extracting just the t.co referral visits:
var params = { 'ids': 'ga:'+ids, // Profile id 'dimensions': 'ga:hostname,ga:pagePath,ga:pageTitle,ga:referralPath', // Comma separated list of dimensions. 'metrics': 'ga:newVisits,ga:visitors,ga:visits', // Comma separated list of metrics 'sort': '-ga:visits', // Sort by visits descending. 'segment': 'gaid::-8', // Id of refferal segment 'filters': 'ga:source==t.co', // Display only t.co sources. 'start-index': '1', 'start-date': startDate, // Start-date (format yyyy-MM-dd). 'end-date': endDate, // End-date (format yyyy-MM-dd). 'max-results': maxResult // Rumber of results };
Which in my case I pull into a sheet as a custom formula:
Next I write another custom formula which takes values from the t.co Link column and searches for the person who first tweeted it and subsequent retweets.
Adding a sprinkling of other spreadsheet formula like SPARKLINE and IMAGE and throwing in a Motion Chart we get:
Now at a glance I can see which individuals are generating the most click-throughs, the comparative size of the number of followers they have, new versus return visits and more. It’s important to remember we are extracting visitor counts for all links tracked by Google Analytics regardless of any other service that was used to shorten them. Suddenly Google Analytics data has got very personal.
If you’d like to try yourself or see the code behind this here’s a copy of the Google Analytics + Twitter Referral Spreadsheet.
Bio: Martin Hawksey is an advisor at the Centre for Educational Technology and Interoperability Standards (CETIS), a national advisory and innovation centre supporting the UK Higher and Post-16 Education sectors on educational technology and standards. He regularly blogs at MASHe and can be found on Twitter @mhawksey.