Understanding Big Data and the Hadoop Technologies


by Daniel Eklund


Hadoop is the technology most associated with the Big Data movement of recent years. Born out of Google and the Open Source community, it has revolutionized organizations' perceptions of what can be done with their data, from size to variety. As a result of its explosive growth, hand in hand with the plunging cost of commodity servers and the growth of Cloud infrastructures, understanding and implementing the Hadoop family of technologies is now table stakes for most medium-to-large organizations.
This seminar seeks to demystify Hadoop and its associated technologies (MapReduce, HDFS, Hive, Pig, HBase, and YARN) through a technical and honest evaluation.
Nurtured by the Open Source community and implemented in Java, Hadoop still has a strong programmer-oriented heritage, despite its promise within the data field. As such, a good understanding of Hadoop comes from three different vectors: data, programming, and engineering. This seminar will examine all three: from the MapReduce computational model, to the Shuffle-Sort paradigm as first implemented, to how Java (and other languages) may be used to implement parallelized data-processing jobs, to why some data access patterns demand certain technologies.
Hadoop is still an early-stage technology, and implementing it will require a strong commitment, both organizationally and technically. This seminar aims to make that commitment easier to realize.

What you will learn

  • Why Big Data has become important for businesses of all sizes
  • What Hadoop, MapReduce, and HBase are
  • How to implement low-level MapReduce code in Java
  • How Hadoop is implemented: from JobTrackers and TaskTrackers to YARN
  • How to write Hive queries, Pig scripts, and UDFs
  • How NoSQL technologies interact with Hadoop

Main Topics

Hadoop and MapReduce

  • The precepts of Big Data
  • The Hadoop Computational Model
  • Java implementation of MapReduce
  • The Implementation Model

Hive and Pig

  • Hive
  • Pig
  • Relational Theory Digression
  • Other relational technologies

The other Big Data technologies

  • The Gigantic World of NoSQL
  • YARN and the Fast Query Technologies
  • A Standard Architecture