Hadoop for Java Developers

 


This class is designed for the Java developer. It uses both Hortonworks and a best-selling book on Hadoop.

This course will present the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components. The student will use Hadoop and Hortonworks to support a variety of data analysis tasks, as well as to complete numerous big-data management activities.

This class will further examine related technologies such as Hive, Pig, and Apache Accumulo.

The student will also examine how Hadoop operates in the cloud computing environment. Students will explore real-world examples involving technologies such as Amazon Web Services, as well as various Hadoop-related case studies.

Duration

4 Days.

Overview

The student will learn how to use Apache Hadoop and the Hortonworks interface to create and manage MapReduce programs. Students will begin with a quick overview of installing Hadoop and setting it up in a cluster, and then proceed to writing data-consuming and analytic programs.

Training Outline

This training is 50% lecture and 50% lab exercises.

Topics

    What is Hadoop?

    Starting Hadoop

    Components of Hadoop

    Writing basic MapReduce programs

    Advanced MapReduce

    Programming Practices

    Cookbook

    Managing Hadoop

    Running Hadoop in the cloud

    Programming with Pig

    Hadoop Related Technologies

    Case studies

Prerequisites

Programming experience; familiarity with Java and Linux

Schedule

4 days

Virtual or On-Site

Course Outline

What is Hadoop?

·         Understanding distributed systems and Hadoop

·         Comparing SQL databases and Hadoop

·         Understanding MapReduce

·         Counting words with Hadoop—running your first program

·         History of Hadoop
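
As a rough illustration of the word-counting topic above, the following sketch walks through the map, shuffle, and reduce phases in plain Java. It deliberately uses no Hadoop API, so it is not how a real Hadoop job is written; the class name WordCountSketch and its method names are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; it does not use any Hadoop API.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Runs the three phases in sequence on a small in-memory input.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}
        System.out.println(countWords(List.of("the quick brown fox", "the lazy dog")));
    }
}
```

In a real cluster the map and reduce steps run on different machines and the framework performs the shuffle; here all three run in one process purely to show the data flow.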

Starting Hadoop

·         The building blocks of Hadoop

·         Setting up SSH for a Hadoop cluster

·         Running Hadoop

·         Web-based cluster UI

Components of Hadoop

·         Working with files in HDFS

·         Anatomy of a MapReduce program

·         Reading and writing

Writing basic MapReduce programs

·         Constructing the basic template of a MapReduce program

·         Counting things

·         Adapting for Hadoop’s API changes

·         Streaming in Hadoop

·         Improving performance with combiners
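
To give a feel for why combiners can improve performance, the sketch below pre-sums word counts on the map side before anything would be sent across the network. It is plain Java with no Hadoop classes, and the names CombinerSketch, mapSplit, and combine are hypothetical. Pre-summing is safe here because addition is associative and commutative, so the reducer gets the same totals either way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; it does not use any Hadoop API.
class CombinerSketch {

    // Mapper output for one input split: one (word, 1) record per word.
    static List<Map.Entry<String, Integer>> mapSplit(String text) {
        List<Map.Entry<String, Integer>> records = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                records.add(Map.entry(word, 1));
            }
        }
        return records;
    }

    // Combiner: sums the records for each word locally, so at most
    // one record per distinct word leaves this mapper.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> record : records) {
            partial.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return partial;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = mapSplit("to be or not to be");
        Map<String, Integer> combined = combine(records);
        // prints: 6 records before combining, 4 after: {be=2, not=1, or=1, to=2}
        System.out.println(records.size() + " records before combining, "
                + combined.size() + " after: " + combined);
    }
}
```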

Advanced MapReduce

·         Chaining MapReduce jobs

·         Joining data from different sources

·         Creating a Bloom filter
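
For readers new to Bloom filters, here is a minimal sketch in plain Java, built on java.util.BitSet. The class name BloomFilterSketch, the double-hashing scheme based on String.hashCode(), and the sizes used in main are all assumptions made for illustration and are not taken from the course material. The union method hints at how partial filters built by separate tasks could be merged, assuming they share the same size and hash count.

```java
import java.util.BitSet;

// Hypothetical illustration class; it does not use any Hadoop API.
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int position(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    // Sets every bit position derived for the item.
    void add(String item) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(item, i));
        }
    }

    // false means the item was definitely never added;
    // true means it was possibly added (false positives can occur).
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(item, i))) {
                return false;
            }
        }
        return true;
    }

    // Merges another filter into this one by OR-ing the bit sets.
    void union(BloomFilterSketch other) {
        if (other.size != size || other.hashCount != hashCount) {
            throw new IllegalArgumentException("filters must share size and hashCount");
        }
        bits.or(other.bits);
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("hadoop");
        System.out.println(filter.mightContain("hadoop")); // prints true
    }
}
```

The filter never reports an added item as absent, which is what makes it useful for cheaply discarding records that cannot match before a more expensive join.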

Programming Practices

·         Developing MapReduce programs

·         Monitoring and debugging on a production cluster

·         Tuning for performance

Cookbook

·         Passing job-specific parameters to your tasks

·         Probing for task-specific information

·         Partitioning into multiple output files

·         Inputting from and outputting to a database

·         Keeping all output in sorted order

Managing Hadoop

·         Setting up parameter values for practical use

·         Checking the system’s health

·         Setting permissions

·         Managing quotas

·         Enabling trash

·         Removing DataNodes

·         Adding DataNodes

·         Managing NameNode and Secondary NameNode

·         Recovering from a failed NameNode

·         Designing network layout and rack awareness

·         Scheduling jobs from multiple users

Running Hadoop in the cloud

·         Introducing Amazon Web Services

·         Setting up AWS

·         Setting up Hadoop on EC2

·         Running MapReduce programs on EC2

·         Cleaning up and shutting down your EC2 instances

·         Amazon Elastic MapReduce and other AWS services

Programming with Pig

·         Installing Pig

·         Running Pig

·         Learning Pig Latin through Grunt

·         Speaking Pig Latin

·         Working with user-defined functions

·         Working with scripts

·         Seeing Pig in action—example of computing similar patents

Hadoop Related Technologies

·         Hive

·         Apache Accumulo

·         Hadoop-related Tools & Tech

Amazon Web Services (AWS)

·         Elastic MapReduce (EMR)

·         Elastic Compute Cloud (EC2)

·         Simple Storage Service (S3)

·         Flexible Payments Service (FPS)

·         Simple Email Service (SES)

·         SimpleDB

·         and others …

Case Studies

·         Amazon Web Services (AWS) & Yelp!

·         Converting 11 million image documents from the New York Times archive

·         Mining data at China Mobile

·         Recommending the best websites at StumbleUpon

·         Building analytics for enterprise search—IBM’s Project ES2

 

To hire a proven AMS Hadoop for Java Developer subject matter expert who also teaches this class, call 800-798-3901 today.
