Becoming A Data Scientist: What A Data Scientist ISN’T


If you’re reading this, you probably already have an inkling of what a data scientist is. But have you ever considered what a data scientist isn’t? According to Vincent Granville, author of Developing Analytic Talent: Becoming a Data Scientist, data scientists are:

  • Not statisticians
  • Not data analysts
  • Not computer scientists
  • Not software engineers
  • Not business analysts

Data scientists do have some knowledge in each of these areas, but they also have knowledge that falls outside all of them.

NEITHER STATISTICIANS NOR DATA ANALYSTS: One reason the gap between statisticians and data scientists has grown over the last 15 years is that academic statisticians, who publish theoretical articles (sometimes not based on data analysis) and train statisticians, are… not statisticians anymore. Also, many statisticians think that data science is about analyzing data. But it is so much more than that! Over time, as statisticians catch up with big data and modern applications, the gap between data science and statistics will shrink.

NOT COMPUTER SCIENTISTS: First, data scientists are not computer scientists because they don’t need all of the theoretical knowledge computer scientists have. Second, they need a much better understanding of random processes, experimental design, and sampling, areas in which statisticians are typically the experts. BUT data scientists DO need to be familiar with computational complexity, algorithm design, distributed architecture, and programming (R, SQL, NoSQL, Python, Java, and C++).

NOT SOFTWARE ENGINEERS: Data scientists do need to be domain experts in one or two applied domains.

NOT BUSINESS ANALYSTS: Data scientists don’t need to be MBAs, necessarily, but they do need to have success stories to share (with metrics used to quantify success), have strong business acumen, and be able to assess the ROI that data science can bring to their clients or their boss.

Data scientists do need to be good communicators so they can understand, and many times guess, what problems their client, boss, or executive management is trying to solve. Translating high-level English into simple, efficient, scalable, replicable, robust, flexible, platform-independent solutions is critical.

Learn more about what a data scientist is and isn’t by accessing a complimentary chapter from Developing Analytic Talent: Becoming a Data Scientist



10 Responses

  1. xebiaindia123

    Experts in big data management services have explained that the number of applications in the 2nd and 3rd generations of enterprise search evolution was almost the same. This was the period of the late 90s and early 2000s. In this generation, people started talking about big data and multi-node (cluster) setups. Until then, no solution was available that could scale to multiple nodes.
