The Big Data Scientist Hunt
According to a survey of senior executives by Big Data consultancy NewVantage, Big Data is “top of mind for leading industry executives,” but these same executives struggle to find the right people to analyse their data. In fact, while 70 per cent of the organisations surveyed plan to hire data scientists, 100 per cent of them said they find it at least “somewhat challenging” to hire competent data scientists (see image).
Given the difficulty of finding qualified people to analyse data, it’s perhaps not surprising that only 0.5 per cent of enterprise data gets analysed, according to IDC. Ironically, more data might get analysed if enterprises stopped searching outside their organisations for talent.
That’s right: most organisations already have people who can crunch their data, as Gartner analyst Svetlana Sicular posits:
“Organisations already have people who know their own data better than mystical data scientists….Learning Hadoop is easier than learning the company’s business.”
So enterprises should focus on training employees to use tools like Hadoop, not on wasting cycles and recruiting fees scouring the planet for mythical data scientists.
Big Data superstar Nate Silver argues much the same thing in his book, The Signal and the Noise. While many enterprises think their strategies will be made clear by amassing and crunching increasingly large amounts of data, Silver suggests the opposite may be true: the more data one analyses, the easier it becomes to mistake correlation for causation. Put another way, it becomes harder to find the signal in all the noise of Big Data.
Big Data isn’t a matter of asking bigger questions, in other words, but of asking better questions. As Silver notes, “Data is not a substitute for judgments you have to make.” Generally speaking, those “better questions” and “judgments” are going to come from people within the business who can query the data to find meaningful correlations.
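Silver’s point about noise can be illustrated with a small simulation. The sketch below is not taken from his book; it is a minimal illustration, assuming purely random, unrelated “metrics” (the function names, the 30 observations per metric and the 0.5 threshold are all arbitrary choices made for this example). It shows the related problem of spurious correlation: the more metrics one compares, the more pairs appear correlated by chance alone.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(num_metrics, num_observations=30, threshold=0.5, seed=0):
    """Count pairs of independent random metrics whose |r| exceeds threshold.

    Every metric is pure noise, so any pair that clears the threshold
    is a coincidence rather than a real relationship.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    metrics = [
        [rng.gauss(0, 1) for _ in range(num_observations)]
        for _ in range(num_metrics)
    ]
    return sum(
        1
        for a, b in itertools.combinations(metrics, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    for num_metrics in (10, 50, 200):
        pairs = num_metrics * (num_metrics - 1) // 2
        hits = count_spurious_pairs(num_metrics)
        print(f"{num_metrics} metrics, {pairs} pairs: {hits} look correlated")
```

Because the number of pairs grows roughly with the square of the number of metrics, the count of chance “findings” climbs quickly. Someone who knows the business is better placed to tell which of those correlations could plausibly mean anything.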
With the right people in place, it becomes easier to see why Big Data means Big Money.
For example, Gartner forecasts that Big Data will drive $34 billion in IT spending in 2013. Some companies, like Sears, clearly “get” Big Data and are putting it to work. The same is true of Mappy Health, which ingests large amounts of Twitter data into MongoDB to divine disease outbreaks.
But for the unwashed masses of enterprise IT, it sounds like Big Data is an aspiration, not a reality.
Still, it’s an aspiration that has hard dollars chasing it, as Dice.com jobs data shows. Of the top-10 job skills in demand on Indeed.com’s job boards, two (MongoDB and Hadoop) are Big Data-related.
Over time, however, I suspect this data scientist arms race will be absorbed by two other trends:
1. Big Data technology being embedded into applications.
2. Enterprises training existing employees on Big Data technologies rather than hiring data scientists.
On the first trend, Cloudera CEO Mike Olson perhaps said it best when he argued that the value of Big Data technology like Hadoop will increasingly be delivered through applications. Enterprises won’t need data scientists as their applications will process and analyse the data for them. Yes, someone will still need to know which questions to ask of the data, but the hard-core science of it should be rendered simpler by applications.
The second trend has already been covered above: enterprises that are smart about Big Data will invest in their employees, teaching them Big Data technologies. It’s easier for such employees to understand the company’s data than it is for an outsider who understands data generically but has no context for a company’s particular data set.
All of which should provide some comfort to those organisations that have been struggling to find data scientists to analyse their data. It may turn out that the “mythical data scientist” already works for the company, and is ready to put the company’s data to use.