Today’s data scientists and data analysts work across a diverse array of domains but largely perform a common set of tasks: data cleansing and transformation, data mining, statistical modeling, and machine learning. Many of these tasks require tools and languages well suited to handling medium to large volumes of data. As a result, organizations are frequently turning to Hadoop-based solutions, which provide scalability, flexibility, and cost efficiency via a generalized open source platform of tools. It’s no surprise that Hadoop is expected to be at the core of more than half of all analytics software within the next two years.
However, many analysts struggle with the learning path to mastering Hadoop tools and integrating them to execute and operationalize a data analysis workflow. Jenny Kim introduces Hue, an open source, web-based UI that simplifies this process by unifying Hadoop’s core technologies and its most commonly used tools into a single user interface. You’ll learn how to install and configure Hue to work with a Hadoop installation, load data into HDFS using Hue’s file browser, and create Hive tables directly from HDFS files using Hue’s metastore create table wizard. Jenny also demonstrates how to use Hue’s Hive Editor to query data, save results to HDFS, and share queries with other users for collaboration. She then walks you through performing procedural data mining with the scripting language Pig Latin, monitoring running and completed Hadoop jobs in Hue’s job browser, and orchestrating a scheduled workflow with Oozie.
About Jenny Kim
Jenny Kim is an experienced big data engineer who works in both commercial software efforts and academia. Jenny is currently working with the Hue team at Cloudera to help build intuitive interfaces for analyzing big data with Hadoop. She has significant experience working with large-scale data, machine learning, and Hadoop implementations in production and research environments. In a previous project, Jenny, along with Benjamin Bengfort, built a large-scale recommender system that used a web crawler to gather ontological information about apparel products and generated recommendations from transaction data.