Understanding Big Data and the Hadoop Technologies

by Daniel Eklund

Description

Hadoop is the technology most associated with the Big Data movement of recent years. It is a technology born out of Google and the Open Source community that has revolutionized organizations' perceptions of what can be done with their data, from size to variety. As a result of its explosive growth, hand in hand with the plunging costs of commodity servers and the growth of Cloud infrastructures, the Hadoop brand of technologies is now table stakes to understand and implement within most medium to large organizations.
This seminar seeks to demystify Hadoop and its associated technologies: MapReduce, HDFS, Hive, Pig, HBase, and YARN, through a technical and honest evaluation.
Nurtured by the Open Source community and implemented in Java, Hadoop still has a strong programmer-oriented heritage, despite its promise within the data field. As such, a good understanding of Hadoop comes from three different vectors: data, programming, and engineering. This seminar will examine all three: from the MapReduce computational model, to the Shuffle-Sort paradigm as first implemented, to how Java (and other languages) may be used to implement parallelized data-processing jobs, to why some data access patterns demand certain technologies.
Hadoop is still an early-stage technology, and understanding how to implement Hadoop will require a strong commitment, both organizationally and technically. This seminar hopes to make this commitment easier to realize.
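
As a rough illustration of the MapReduce computational model and the Shuffle-Sort step mentioned above, the following minimal sketch counts words entirely in memory using plain Java. It does not use the Hadoop API and is not taken from the seminar material; the class name WordCountSketch and the methods map, shuffleSort, reduce, and wordCount are hypothetical names chosen for this example. A real Hadoop job would distribute these phases across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of map -> shuffle-sort -> reduce.
// It does NOT use Hadoop classes; it only mirrors the shape of the model.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle-sort phase: group all emitted values by key, keys kept in sorted order.
    static TreeMap<String, List<Integer>> shuffleSort(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the list of values for one key into a single total.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence over a list of input lines.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffleSort(emitted).forEach((key, values) -> result.put(key, reduce(key, values)));
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, data=2, hadoop=1}
        System.out.println(wordCount(List.of("big data big", "data hadoop")));
    }
}
```

The point of the sketch is the separation of concerns: the map and reduce steps see only one line or one key at a time, which is what allows a framework to run them in parallel, while the grouping step in the middle is the part the framework handles on the programmer's behalf.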

What you will learn

Why Big Data has become important for businesses of all sizes
What Hadoop, MapReduce, and HBase are
How to implement low-level MapReduce code in Java
How Hadoop is implemented: from JobTrackers to TaskTrackers, to YARN
How to write Hive queries, Pig scripts, and UDFs
How NoSQL technologies interact with Hadoop

Main Topics

Hadoop and MapReduce

The precepts of Big Data
The Hadoop Computational Model
Java implementation of MapReduce
The Implementation Model

Hive, Pig

Hive
Pig
Relational Theory Digression
Other relational technologies

The other Big Data technologies

The Gigantic World of NoSQL
YARN and the Fast Query Technologies
A Standard Architecture