Course Outline
Section 1: Data Management in HDFS
- Various Data Formats (JSON / Avro / Parquet)
- Compression Schemes
- Data Masking
- Labs: analyzing different data formats; enabling compression
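The compression topic above follows a stream-wrapping pattern: in Hadoop, a codec wraps the raw file stream, just as the JDK's GZIP streams do. A minimal plain-Java sketch of that pattern (no Hadoop dependency; the JSON-ish sample record is illustrative only):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressionDemo {
    // Compress by wrapping the destination stream, the same shape as
    // wrapping an HDFS output stream with a Hadoop codec.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        }
        return buf.toByteArray();
    }

    // Decompress by wrapping the source stream.
    static byte[] decompress(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gz.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // Repetitive text compresses well, like real log or JSON data.
        String record = "{\"user\":\"alice\",\"score\":42}".repeat(1000);
        byte[] packed = compress(record.getBytes(StandardCharsets.UTF_8));
        String restored = new String(decompress(packed), StandardCharsets.UTF_8);
        System.out.println("compressed " + record.length() + " bytes to " + packed.length);
        System.out.println("round-trip ok: " + restored.equals(record));
    }
}
```

Note that gzip, unlike some codecs covered in class, is not splittable, which matters when HDFS distributes a file across mappers.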
Section 2: Advanced Pig
- User-defined Functions
- Introduction to Pig Libraries (ElephantBird / Data-Fu)
- Loading Complex Structured Data using Pig
- Pig Tuning
- Labs: advanced Pig scripting; parsing complex data types
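A Pig user-defined function is ultimately a Java class whose `exec(Tuple)` method holds plain-Java transformation logic (in Pig's API the class would extend `org.apache.pig.EvalFunc`). A minimal sketch of the core logic such a UDF might wrap, shown without the Pig jar so it compiles standalone; the `MaskDigitsUdf` name and the masking rule are illustrative assumptions, not from the course:

```java
public class MaskDigitsUdf {
    // Replace every digit with '*', leaving other characters intact --
    // a typical masking transform for fields like phone numbers.
    // In a real Pig UDF this body would sit inside exec(Tuple).
    static String exec(String field) {
        if (field == null) return null;  // UDFs must tolerate null fields
        return field.replaceAll("\\d", "*");
    }

    public static void main(String[] args) {
        System.out.println(exec("call 555-0199"));  // call ***-****
    }
}
```

Once registered in a Pig script, such a function is applied per-field inside a `FOREACH ... GENERATE` statement.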
Section 3: Advanced Hive
- User-defined Functions
- Compressed Tables
- Hive Performance Tuning
- Labs: creating compressed tables; evaluating table formats and configuration
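The table-format evaluation in the labs above hinges on one idea: columnar formats such as ORC and Parquet store each column contiguously, so a Hive query touching one column reads only that column. A conceptual plain-Java sketch of the difference (no Hive dependency; the record layout is illustrative):

```java
import java.util.List;

public class ColumnarSketch {
    // Row-oriented layout: each record carries every column, so reading
    // one column still means deserializing whole rows.
    record Row(String name, int age, String city) {}

    // Column-oriented layout: one column stored contiguously; a query
    // like SELECT SUM(age) scans only this array.
    static int sumAgesColumnar(int[] ages) {
        int total = 0;
        for (int a : ages) total += a;  // only the 'age' column is touched
        return total;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("ann", 30, "Oslo"), new Row("bo", 25, "Bergen"));
        // In a columnar file this projection happened at write time.
        int[] ageColumn = rows.stream().mapToInt(Row::age).toArray();
        System.out.println(sumAgesColumnar(ageColumn));  // 55
    }
}
```

The same column-pruning effect, combined with per-column compression, is what the lab's format comparison measures.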
Section 4: Advanced HBase
- Advanced Schema Modeling
- Compression
- Bulk Data Ingest
- Wide-table / Tall-table comparison
- HBase and Pig
- HBase and Hive
- HBase Performance Tuning
- Labs: tuning HBase; accessing HBase data from Pig and Hive; using Phoenix for data modeling
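The wide-table / tall-table comparison above is largely a row-key design question: a tall table moves a dimension (here, a date) into the row key so each reading becomes its own row, which keeps one entity's history contiguous and scan-friendly. A minimal sketch of the two key layouts; the `user#date` composite format and `#` delimiter are illustrative assumptions, not prescribed by the course:

```java
public class RowKeyDesign {
    // Tall-table design: one row per (user, date) pair. Because HBase
    // stores rows sorted by key, a scan from "user42#" to "user42$"
    // reads one user's history without touching other users.
    static String tallKey(String userId, String yyyymmdd) {
        return userId + "#" + yyyymmdd;  // '#' delimiter is an illustrative choice
    }

    // Wide-table design: one row per user; the date would instead become
    // a column qualifier (e.g. family "d", qualifier "20240101"), making
    // the row grow wider over time.
    static String wideKey(String userId) {
        return userId;
    }

    public static void main(String[] args) {
        System.out.println(tallKey("user42", "20240101"));  // user42#20240101
        System.out.println(wideKey("user42"));              // user42
    }
}
```

Tall tables trade more rows for bounded row width and cheap range scans; wide tables keep one entity's data in a single row fetch. The lab's tuning exercises explore where each wins.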
Requirements
- Proficiency in the Java programming language (as the majority of programming exercises utilize Java)
- Comfort with the Linux environment (including the ability to navigate the Linux command line and edit files using vi or nano)
- A practical understanding of Hadoop
Lab environment
Zero Install: Participants do not need to install Hadoop software on their own machines! A functional Hadoop cluster will be provided for student use.
Students will require the following
21 Hours
Testimonials (1)
Hands on exercises. Class should have been 5 days, but the 3 days helped to clear up a lot of questions that I had from working with NiFi already