Course Outline
Section 1: Introduction to Hadoop
- History and core concepts of Hadoop
- Ecosystem overview
- Distributions
- High-level architecture
- Common myths about Hadoop
- Challenges associated with Hadoop
- Hardware and software requirements
- Lab: Initial exploration of Hadoop
Section 2: HDFS
- Design and architecture
- Core concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons: NameNode, Secondary NameNode, DataNode
- Communication protocols and heartbeats
- Data integrity mechanisms
- Read and write paths
- NameNode High Availability (HA) and Federation
- Labs: Interacting with HDFS
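The core concepts above can be illustrated with a back-of-the-envelope calculation. The sketch below is standalone Java with no Hadoop dependency; it assumes the common defaults of a 128 MB block size and a replication factor of 3, which are configurable in a real cluster.

```java
// Standalone sketch of HDFS block and replication math (no Hadoop dependency).
// Assumes common defaults: 128 MB block size, replication factor 3.
public class HdfsBlockMath {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB

    // Number of HDFS blocks a file of the given size occupies (ceiling division).
    static long numBlocks(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Raw storage consumed across the cluster for a given replication factor.
    static long rawBytes(long fileSizeBytes, int replication) {
        return fileSizeBytes * replication;
    }

    public static void main(String[] args) {
        long file = 300L * 1024 * 1024;           // a 300 MB file
        System.out.println(numBlocks(file));      // 3 blocks (128 + 128 + 44 MB)
        System.out.println(rawBytes(file, 3));    // bytes equal to 900 MB of raw storage
    }
}
```

This arithmetic is why HDFS favors a small number of large files over many small ones: every file, however tiny, costs at least one block's worth of NameNode metadata.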
Section 3: MapReduce
- Concepts and architecture
- Daemons (MRv1): JobTracker and TaskTracker
- Execution phases: driver, mapper, shuffle/sort, reducer
- MapReduce Version 1 and Version 2 (YARN)
- Internal workings of MapReduce
- Introduction to Java-based MapReduce programming
- Labs: Executing a sample MapReduce program
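The execution phases listed above can be seen in miniature with the classic word-count example. The class below is a standalone simulation, not Hadoop API code: the driver, map, and reduce steps are plain methods, and the shuffle/sort that Hadoop performs between tasks is stood in for by a TreeMap.

```java
import java.util.*;

// Standalone simulation of the MapReduce phases for word count.
// No Hadoop dependency: the framework's shuffle/sort step is done
// here with a TreeMap, mirroring what Hadoop does between tasks.
public class WordCountSim {
    // Map phase: emit (word, 1) for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Reduce phase: sum all counts emitted for one key.
    static int reduce(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // Driver: run map over the input, shuffle/sort by key, then reduce.
    static SortedMap<String, Integer> run(List<String> lines) {
        SortedMap<String, List<Integer>> shuffled = new TreeMap<>();
        for (String line : lines)
            for (Map.Entry<String, Integer> kv : map(line))
                shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
        SortedMap<String, Integer> result = new TreeMap<>();
        shuffled.forEach((key, values) -> result.put(key, reduce(values)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("to be or not to be")));
        // {be=2, not=1, or=1, to=2}
    }
}
```

In real Hadoop code the same logic is split across a `Mapper` subclass, a `Reducer` subclass, and a driver that configures the `Job`; the lab walks through that version.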
Section 4: Pig
- Pig versus Java MapReduce
- Pig job flow
- Pig Latin language
- ETL processes with Pig
- Transformations and Joins
- User-Defined Functions (UDFs)
- Labs: Writing Pig scripts for data analysis
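To see what "Pig versus Java MapReduce" means in practice, the sketch below reproduces the result of a typical Pig Latin GROUP/COUNT script using plain Java collections, so it runs without a cluster. The script in the comment and its field names (`user`, `url`) are illustrative, not from a specific lab.

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java analogue of a typical Pig Latin transformation, runnable
// without a cluster. The equivalent Pig script (field names are
// illustrative) would be roughly:
//
//   logs    = LOAD 'access.log' AS (user:chararray, url:chararray);
//   grouped = GROUP logs BY user;
//   counts  = FOREACH grouped GENERATE group, COUNT(logs);
//
public class PigGroupBySim {
    // GROUP ... BY user, then COUNT the rows in each group.
    static Map<String, Long> hitsPerUser(List<String[]> logs) {
        return logs.stream()
                   .collect(Collectors.groupingBy(row -> row[0],
                                                  Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String[]> logs = List.of(
            new String[]{"alice", "/home"},
            new String[]{"bob",   "/search"},
            new String[]{"alice", "/cart"});
        System.out.println(hitsPerUser(logs));
    }
}
```

The point of the comparison: the three-line Pig script compiles down to the same map/shuffle/reduce machinery that would take dozens of lines of hand-written Java MapReduce.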
Section 5: Hive
- Architecture and design
- Data types
- SQL support within Hive
- Creating Hive tables and performing queries
- Partitions
- Joins
- Text processing
- Labs: Various exercises on data processing using Hive
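Partitioning is worth a concrete picture: Hive stores a partitioned table as one directory per partition value, and a WHERE clause on the partition column lets it skip whole directories. The sketch below simulates that pruning with plain Java; the table and column names in the HiveQL comment are illustrative.

```java
import java.util.*;
import java.util.stream.*;

// Illustrates Hive partition pruning without a Hive installation.
// A table partitioned by date is stored as one directory per
// partition value (e.g. dt=2024-01-01); a filter on the partition
// column lets Hive skip whole directories. The HiveQL counterpart
// (illustrative names) would be roughly:
//
//   CREATE TABLE logs (msg STRING) PARTITIONED BY (dt STRING);
//   SELECT * FROM logs WHERE dt = '2024-01-02';  -- scans one partition
//
public class HivePruningSim {
    // Return only the partition directories a query filtering on
    // dt = wanted would actually read.
    static List<String> prune(List<String> partitionDirs, String wanted) {
        return partitionDirs.stream()
                            .filter(dir -> dir.equals("dt=" + wanted))
                            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("dt=2024-01-01", "dt=2024-01-02", "dt=2024-01-03");
        System.out.println(prune(dirs, "2024-01-02")); // [dt=2024-01-02]
    }
}
```

Choosing a partition column with sensible cardinality is the main design decision: too many distinct values produces many tiny files, too few gives no pruning benefit.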
Section 6: HBase
- Concepts and architecture
- HBase compared to RDBMS and Cassandra
- HBase Java API
- Time series data management on HBase
- Schema design
- Labs: Interacting with HBase via the shell; programming with the HBase Java API; schema design exercise
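Because HBase stores rows in lexicographic key order, schema design for time-series data largely means row-key design. The standalone sketch below (no HBase dependency) shows two standard tricks: a salt prefix to spread sequential writes across regions, and a reversed timestamp so scans return the newest readings first. The key format and bucket count are illustrative choices, not an HBase API.

```java
// Standalone sketch of a common HBase row-key design for time-series
// data (no HBase dependency). A salt prefix spreads sequential writes
// across regions, and a reversed timestamp makes newest rows sort
// first, since HBase keeps rows in lexicographic key order.
public class RowKeyDesign {
    static final int SALT_BUCKETS = 8; // illustrative bucket count

    // Build a row key of the form "<salt>|<metric>|<reversedTimestamp>".
    static String rowKey(String metric, long timestampMillis) {
        int salt = Math.abs(metric.hashCode()) % SALT_BUCKETS;
        long reversed = Long.MAX_VALUE - timestampMillis; // newest sorts first
        // Zero-pad to 19 digits so lexicographic order matches numeric order.
        return String.format("%d|%s|%019d", salt, metric, reversed);
    }

    public static void main(String[] args) {
        String newer = rowKey("cpu.load", 2_000L);
        String older = rowKey("cpu.load", 1_000L);
        // Lexicographically, the newer reading sorts before the older one.
        System.out.println(newer.compareTo(older) < 0); // true
    }
}
```

The trade-off covered in the schema design exercise: salting fixes write hotspotting but turns a single time-range scan into one scan per salt bucket.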
Requirements
- Proficiency in the Java programming language (as most programming exercises are conducted in Java)
- Familiarity with the Linux environment (ability to navigate the Linux command line and edit files using vi or nano)
Lab environment
Zero installation required: students do not need to install Hadoop software on their own machines. A fully operational Hadoop cluster will be provided.
Participants will need the following tools:
- An SSH client (Linux and Mac systems come with built-in SSH clients; for Windows, PuTTY is recommended)
- A web browser to access the cluster (Firefox is recommended)
Duration: 28 hours
Testimonials (1)
Hands-on exercises. The class should have been 5 days, but the 3 days helped clear up a lot of questions I had from already working with NiFi.