This class is designed for Java developers. It uses both Hortonworks and a best-selling book on Hadoop.
This course will present the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components. The student will use Hadoop and Hortonworks to support a variety of data analysis tasks, as well as to complete numerous big-data management activities.
This class will further examine related technologies such as Hive, Pig, and Apache Accumulo.
The student will also examine how Hadoop operates in a cloud computing environment. Students will explore real-world examples involving technologies such as Amazon Web Services, as well as various Hadoop-related case studies.
Duration
4 Days.
Overview
Students will learn how to use Apache Hadoop and the Hortonworks interface to create and manage MapReduce programs. They will begin with a quick overview of installing Hadoop and setting it up in a cluster, then proceed to writing data-consuming and analytic programs.
Training Outline
This training is 50% lecture and 50% lab exercises.
Topics
What is Hadoop?
Starting Hadoop
Components of Hadoop
Writing basic MapReduce programs
Advanced MapReduce
Programming Practices
Cookbook
Managing Hadoop
Running Hadoop in the cloud
Programming with Pig
Hadoop Related Technologies
Case studies
Prerequisites
Programming experience; familiarity with Java and Linux
Schedule
4 days
Virtual or On-Site
Course Outline
What is Hadoop?
· Understanding distributed systems and Hadoop
· Comparing SQL databases and Hadoop
· Understanding MapReduce
· Counting words with Hadoop—running your first program
· History of Hadoop
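As a rough preview of the word-counting exercise, the sketch below imitates the map, shuffle, and reduce phases in plain Java, entirely in memory. It does not use the Hadoop API; the class name WordCountSketch and its methods are hypothetical names chosen for illustration, and a real Hadoop job distributes these same phases across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the MapReduce word-count idea (not Hadoop code).
class WordCountSketch {

    // Map phase: emit one (word, 1) pair for every token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all values that were grouped under one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    static Map<String, Integer> countWords(List<String> lines) {
        // Shuffle phase: group the emitted values by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        // Apply the reducer once per distinct key.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {hadoop=1, hello=2, mapreduce=1}
        System.out.println(countWords(List.of("Hello Hadoop", "hello MapReduce")));
    }
}
```

The three phases are kept as separate steps so that each can be compared with its counterpart when the real Hadoop program is written in class.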
Starting Hadoop
· The building blocks of Hadoop
· Setting up SSH for a Hadoop cluster
· Running Hadoop
· Web-based cluster UI
Components of Hadoop
· Working with files in HDFS
· Anatomy of a MapReduce program
· Reading and writing
Writing basic MapReduce programs
· Constructing the basic template of a MapReduce program
· Counting things
· Adapting for Hadoop’s API changes
· Streaming in Hadoop
· Improving performance with combiners
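The idea behind a combiner can be sketched without a cluster: each mapper pre-aggregates its own output so that fewer records travel to the reducer. The plain-Java example below is an in-memory illustration only, not Hadoop API code; CombinerSketch, combine, and reduce are hypothetical names, and the two input splits are made-up sample data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of local pre-aggregation (not Hadoop code).
class CombinerSketch {

    // Combiner: sum the counts for each word within a single mapper's split.
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> local = new HashMap<>();
        for (String word : words) {
            local.merge(word, 1, Integer::sum);
        }
        return local;
    }

    // Reducer: merge the partial counts produced by every mapper.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical input splits, one per mapper.
        List<List<String>> splits = List.of(
                List.of("a", "a", "b"),
                List.of("a", "b", "b", "b"));

        int rawRecords = 0;       // records shuffled without a combiner
        int combinedRecords = 0;  // records shuffled with a combiner
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (List<String> split : splits) {
            Map<String, Integer> local = combine(split);
            rawRecords += split.size();
            combinedRecords += local.size();
            partials.add(local);
        }

        System.out.println("without combiner: " + rawRecords);    // 7
        System.out.println("with combiner: " + combinedRecords);  // 4
        System.out.println(reduce(partials));                     // {a=3, b=4}
    }
}
```

This works because addition is associative and commutative, so summing partial sums gives the same total as summing the raw counts.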
Advanced MapReduce
· Chaining MapReduce jobs
· Joining data from different sources
· Creating a Bloom filter
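For readers new to Bloom filters, here is a minimal plain-Java sketch of the data structure itself, independent of Hadoop. The class name BloomFilterSketch, the bit-array size, and the double-hashing scheme built on String.hashCode() are all assumptions made for illustration; a production filter would use stronger hash functions and a size derived from the expected number of items.

```java
import java.util.BitSet;

// Minimal Bloom filter: may report false positives, never false negatives.
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    BloomFilterSketch(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // Double hashing: derive the i-th bit index from two base hash values.
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    // Set every bit position the item hashes to.
    void add(String item) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(item, i));
        }
    }

    // True if every bit for the item is set; the item is then possibly present.
    boolean mightContain(String item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("hadoop");
        System.out.println(filter.mightContain("hadoop")); // true
    }
}
```

Because the filter is compact and can only err toward "possibly present", it is often used to discard records that cannot match before a more expensive join.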
Programming Practices
· Developing MapReduce programs
· Monitoring and debugging on a production cluster
· Tuning for performance
Cookbook
· Passing job-specific parameters to your tasks
· Probing for task-specific information
· Partitioning into multiple output files
· Inputting from and outputting to a database
· Keeping all output in sorted order
Managing Hadoop
· Setting up parameter values for practical use
· Checking the system’s health
· Setting permissions
· Managing quotas
· Enabling trash
· Removing DataNodes
· Adding DataNodes
· Managing NameNode and Secondary NameNode
· Recovering from a failed NameNode
· Designing network layout and rack awareness
· Scheduling jobs from multiple users
Running Hadoop in the cloud
· Introducing Amazon Web Services
· Setting up AWS
· Setting up Hadoop on EC2
· Running MapReduce programs on EC2
· Cleaning up and shutting down your EC2 instances
· Amazon Elastic MapReduce and other AWS services
Programming with Pig
· Installing Pig
· Running Pig
· Learning Pig Latin through Grunt
· Speaking Pig Latin
· Working with user-defined functions
· Working with scripts
· Seeing Pig in action—example of computing similar patents
Hadoop Related Technologies
· Hive
· Apache Accumulo
· Hadoop-related Tools & Tech
Amazon Web Services (AWS)
· Elastic MapReduce (EMR)
· Elastic Compute Cloud (EC2)
· Simple Storage Service (S3)
· Flexible Payments Service (FPS)
· Simple Email Service (SES)
· SimpleDB
· and others …
Case Studies
· Amazon Web Services (AWS) & Yelp!
· Converting 11 million image documents from the New York Times archive
· Mining data at China Mobile
· Recommending the best websites at StumbleUpon
· Building analytics for enterprise search—IBM’s Project ES2
To hire a proven AMS Hadoop for Java Developer subject matter expert who also teaches this class, call 800-798-3901 today.