Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Brief overview of Python and Scala

Foundational Concepts (Theory):

  • Architecture
  • RDDs (Resilient Distributed Datasets)
  • Transformations and Actions
  • Stages, Tasks, and Dependencies

Mastering the Basics via Databricks (Hands-on Workshop):

  • Exercises using the RDD API
  • Basic action and transformation functions
  • PairRDD
  • Join operations
  • Caching strategies
  • Exercises using the DataFrame API
  • Spark SQL
  • DataFrame operations: select, filter, group, sort
  • UDFs (User-Defined Functions)
  • Exploring the Dataset API
  • Streaming

Understanding Deployment via AWS (Hands-on Workshop):

  • Core concepts of AWS Glue
  • Differences between AWS EMR and AWS Glue
  • Example jobs run in both environments
  • Advantages and disadvantages of each

Additional Topics:

  • Introduction to Apache Airflow orchestration

Requirements

  • Programming skills (preferably in Python or Scala)
  • Basic knowledge of SQL

Duration

21 Hours
