The goal of the Spark Developer In Real World course is to take someone with zero experience in Spark and turn them into a confident Spark developer who can tackle complex, real-world production problems in today’s challenging Spark environments.
How do we achieve that?
This course is designed to address two key issues that we see both in developers who work with Spark day to day and in people who are new to Spark.
1. Lack of a behind-the-scenes, deeper understanding of concepts
Why is understanding what’s going on behind the scenes important? Because that is when you uncover a lot of things: you get to know the pitfalls in the tools, and you understand how to perform optimizations and how to troubleshoot issues when things go bad. When we go deep, we learn the tips and tricks that let us take advantage of the full offering of the tool.
2. Most don’t have a 360-degree view of what Spark has to offer
Most aspiring and experienced Spark developers don’t realize the full potential of Spark. They are focused on one area of Spark, and most don’t have a good 360-degree view of what it can offer. Spark is not just a tool for in-memory computation. It offers a lot more: dedicated modules for topics like machine learning and streaming, and a project named Tungsten dedicated to optimizing Spark for years to come. Spark is aiming to be the go-to platform for data processing, learning and analytics. There is so much development and effort going into Spark, and it is one of the fastest-evolving tools in the big data ecosystem.
In this course, we go deep into the concepts. We don’t just say Spark is faster because it does in-memory computing; we explain the other reasons that enable Spark to be faster. We cover how projects in Spark like Spark SQL (Catalyst Optimizer) and Project Tungsten enable Spark to achieve efficiency. We show you how to read the logical and physical plans produced by Spark. We also talk about resource management, memory structures and more.
This course doesn’t stop with basic concepts like RDDs and DataFrames. We go beyond that into the areas of Machine Learning and Spark Streaming, giving you a 360-degree view of what Spark has to offer and thereby enabling you to become a confident Spark developer.
When you have both an in-depth understanding and a 360-degree view of Spark, you will be capable of handling complex production problems and managing real-world Spark applications and clusters with confidence.
This course assumes that you have some basic understanding of Hadoop. If you don’t know Hadoop, don’t worry. We have a free course titled Hadoop Starter Kit that will help you understand all the basics. You can enroll in the Hadoop Starter Kit course for free at http://courses.hadoopinrealworld.com
Almost all of the code and programs discussed in the course are written in Scala, because Scala is the default programming language for Spark. You will encounter a couple of Java programs as well. If you have a programming background, you should be able to follow the programs with no issues even if you don’t know any Scala. If you know Python, then it is almost like you know Scala already.
If you are new to Scala or functional programming, don’t try to master the programming language before you start the course. Start the course, and when a piece of syntax looks unfamiliar, look up that specific thing and move on. If you want to get some Scala basics, look into functions, especially anonymous and higher-order functions.
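For readers who have not seen anonymous and higher-order functions before, here is a minimal sketch in plain Scala (no Spark required). It is only an illustration and not taken from the course material; the names HigherOrderDemo, double, applyTwice and adder are hypothetical.

```scala
// Illustrative sketch only: HigherOrderDemo, double, applyTwice and adder
// are hypothetical names, not part of the course code.
object HigherOrderDemo {
  // An anonymous function (function literal) stored in a value.
  val double: Int => Int = x => x * 2

  // A higher-order function: it takes another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // map and filter are standard higher-order methods on Scala collections.
  val numbers: List[Int] = List(1, 2, 3, 4)
  val doubled: List[Int] = numbers.map(double)
  val evens: List[Int] = numbers.filter(_ % 2 == 0) // underscore shorthand for n => n % 2 == 0

  // A higher-order function can also return a function.
  def adder(n: Int): Int => Int = x => x + n

  def main(args: Array[String]): Unit = {
    println(doubled)               // List(2, 4, 6, 8)
    println(evens)                 // List(2, 4)
    println(applyTwice(double, 3)) // 12
    println(adder(10)(5))          // 15
  }
}
```

Spark's RDD and Dataset operations such as map and filter also take functions as arguments, so this is the kind of syntax that is most useful to recognize early.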
Interesting Projects Covered In The Course
Page Ranking pages from Wikipedia DataFrames
Analyzing Trending YouTube videos (CSV & JSON)
Predicting Country’s Happiness Rank from Happiness Score [Machine Learning]
Predicting 2016 US Elections [Machine Learning]
Predicting Yelp Rating (+ve / -ve) [Machine Learning]
Streaming with activity data from an IoT device
Streaming data from Meetup.com with Kafka & Spark Streaming
Spark is evolving very quickly, and we want this course to keep up with the developments in Spark. This course, as with any of our other courses, will be a living course, meaning we will update existing content and add new content to the course regularly.
As you go through the course, if you feel a concept that you think is important is missing, please let us know by sending an email to email@example.com. Based on the demand for that topic from other students, we will prioritize adding it to the course. If you are a student of the Hadoop Developer In Real World or Hadoop Administrator In Real World courses, you know this already. Both courses have gone through a lot of updates since going live.
Still got questions? Shoot us an email - firstname.lastname@example.org
We are a group of Senior Hadoop Consultants who are passionate about Hadoop and Big Data technologies. We have experience across several key domains from finance and retail to social media and gaming. We have worked with Hadoop clusters ranging from 50 all the way to 1000s of nodes.