EDA Toolkit for Data Scientists

Traditionally, data scientists have relied on tools such as SAS or SPSS to perform exploratory data analysis (EDA), which helps identify the attributes in a dataset that are relevant for deriving insights and solving business problems. A common challenge with these traditional tools, however, is that they typically do not scale well to large datasets, and for Big Data companies, working with terabyte-scale datasets is the norm. In this presentation, the audience will learn from a case study how the open source HPCC Systems big data platform was used to manage large datasets, and how the EDA toolkit makes it possible to perform statistical analysis to identify relevant attributes. This new technology also lets data scientists use the EDA capability without writing code, through a drag-and-drop interface that integrates directly with the HPCC Systems backend. This innovative tool provides significant productivity gains and time savings in data manipulation.

4:30 – 5:15pm – Welcome/Networking
5:15 – 6:30pm – Presentation
6:30 – 6:45pm – Q&A/Open Discussion
6:45 – 7:30pm – Mingle/Adjourn

Atlanta