Spark Developer In Real World

The goal of the Spark Developer In Real World course is to take someone with zero experience in Spark and turn them into a confident Spark developer who can tackle complex, real-world production problems in today’s challenging Spark environments.

How do we achieve that?

This course is designed to address two key issues that we see both in developers who work with Spark day to day and in people who are new to Spark.

1. Lack of a deeper, behind-the-scenes understanding of concepts

Why is it important to understand what’s going on behind the scenes? Because that is when you uncover the pitfalls in the tools, and that is when you learn how to perform optimizations and how to troubleshoot issues when things go bad. When we go deep, we learn the tips and tricks that let us take advantage of everything the tool has to offer.

2. Most don’t have a 360-degree view of what Spark has to offer

Most aspiring and experienced Spark developers don’t realize the full potential of Spark. They are focused on one area of Spark, and most don’t have a good 360-degree view of what Spark can offer. Spark is not just a tool for in-memory computation. It offers a lot more: it has dedicated modules for topics like machine learning and streaming, and a project named Tungsten dedicated to optimizing Spark for years to come. Spark is aiming to be the go-to platform for data processing, learning and analytics. There is so much development and effort going into Spark, and it is one of the fastest-evolving tools in the big data ecosystem.

In this course, we go deep into the concepts. We don’t just say Spark is faster because it does in-memory computing; we explain the other reasons that enable Spark to be faster. We cover how projects in Spark like Spark SQL (Catalyst Optimizer) and Project Tungsten enable Spark to achieve efficiency. We show you how to read the logical and physical plans produced by Spark. We also talk about resource management, memory structures and more.

This course doesn’t stop with basic concepts like RDDs and DataFrames; we go beyond that. We go into the areas of Machine Learning and Spark Streaming, giving you a 360-degree view of what Spark has to offer and thereby enabling you to become a confident Spark developer.

When you have both an in-depth understanding and a 360-degree view of Spark, you will be capable of handling complex production problems and managing real-world Spark applications and clusters with confidence.



This course assumes that you have some basic understanding of Hadoop. If you don’t know Hadoop, don’t worry. We have a free course titled Hadoop Starter Kit that will help you understand all the basics, and you can enroll in it for free.


Almost all of the code and programs discussed in the course are written in Scala, because Scala is the default language for Spark. You will encounter a couple of Java programs as well. If you have a programming background, you should be able to follow the programs with no issues even if you don’t know any Scala. If you know Python, then it is almost like you know Scala already.

If you are new to Scala or functional programming, don’t try to master the programming language before you start the course. Start with the course, and when a piece of syntax looks unfamiliar, look up that specific thing and move on. If you want to pick up some Scala basics, look into functions, especially anonymous and higher-order functions.
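To give a feel for the two ideas just mentioned, here is a minimal sketch in plain Scala with no Spark involved. The names `HigherOrderSketch`, `applyTwice`, `addThree` and `scores` are made up for illustration and are not taken from the course material.

```scala
object HigherOrderSketch {
  // Higher-order function: it takes another function `f` as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Anonymous function (lambda) stored in a value of type Int => Int.
  val addThree: Int => Int = n => n + 3

  // Hypothetical sample data.
  val scores: List[Int] = List(3, 8, 5, 10)

  // `map` and `filter` are higher-order methods on collections;
  // the arguments passed to them are anonymous functions.
  val doubled: List[Int] = scores.map(s => s * 2)
  val large: List[Int] = scores.filter(_ > 4) // `_ > 4` is shorthand for `s => s > 4`

  def main(args: Array[String]): Unit = {
    println(doubled)                 // List(6, 16, 10, 20)
    println(large)                   // List(8, 5, 10)
    println(applyTwice(addThree, 1)) // 7
  }
}
```

Spark's RDD and DataFrame APIs are built around the same pattern of passing small functions to methods such as `map` and `filter`, which is why this is a useful piece of Scala to be comfortable with first.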

Interesting Projects Covered In The Course

End to End Project: Build a mini Stackoverflow website (Spark, Elasticsearch, Kibana, REST and Angular)

Page Ranking Wikipedia pages with DataFrames

Analyzing Trending YouTube videos (CSV & JSON)

Predicting Country’s Happiness Rank from Happiness Score [Machine Learning]

Predicting 2016 US Elections [Machine Learning]

Predicting Yelp Rating (+ve / -ve) [Machine Learning]

Streaming with activity data from IoT devices

Streaming data with Kafka & Spark Streaming

Living Course

Spark is evolving very quickly and we want this course to keep up with the developments in Spark. This course, as with all of our other courses, will be a living course, meaning we will update existing content and add new content to the course regularly.

As you go through the course, if you feel that an important concept is missing, please let us know by sending an email to [email protected]. Based on the demand for that topic from other students, we will prioritize adding it to the course. If you are a student of the Hadoop Developer In Real World or Hadoop Administrator In Real World courses, you know this already. Both courses have gone through a lot of updates since going live.

Still got questions? Shoot us an email - [email protected]

Your Instructor

Hadoop In Real World

We are a group of Senior Hadoop Consultants who are passionate about Hadoop and Big Data technologies. We have experience across several key domains, from finance and retail to social media and gaming. We have worked with Hadoop clusters ranging from 50 nodes all the way to thousands of nodes.

Course Curriculum

  Let's Get Started

Frequently Asked Questions

When does the course start and finish?
The course starts now and never ends! It is a completely self-paced online course - you decide when you start and when you finish.
How long do I have access to the course?
How does lifetime access sound? After enrolling, you have unlimited access to this course for as long as you like - across any and all devices you own.
What if I am unhappy with the course?
We would never want you to be unhappy! If you are unsatisfied with your purchase, contact us in the first 30 days and we will give you a full refund.

Get started now!