Apache Hadoop Masterclass
by Tim Seears
Description
This one-day course is designed to help both IT professionals and decision-makers understand the concepts and benefits of Apache Hadoop and how it can help them meet business goals.
You will gain a solid understanding of the Hadoop technology stack, including MapReduce, HDFS, Hive, Pig and HBase, along with an initial introduction to Mahout and other common utilities.
What you will learn
- The essential components of a Hadoop-based Data Management solution
- Pros and cons of implementing Hadoop
- How Hadoop fits into your existing environment and architecture
- The differences between various Hadoop distributions
Main Topics
- Why Hadoop?
o History & background
o Real-world use cases and case studies
- The Hadoop Platform
o Introduction to MapReduce and the Hadoop Distributed File System (HDFS)
o Data warehousing with Hive
o Parallel processing with Pig
o Data mining with Mahout
o Data storage with HBase
o Common utilities - Sqoop, Flume, Hue, Scribe, ZooKeeper, HCatalog
o Hadoop distributions - Apache Software Foundation, Cloudera, Hortonworks, MapR, IBM
- The future of Hadoop
o YARN - next-generation MapReduce
o Other programming paradigms on Hadoop