A hands-on session on programming with Scala and Apache Spark was conducted on 21 and 22 December 2018. In an era of smart devices, near-universal internet connectivity and IoT, managing big data efficiently has become a necessity. Big Data refers to technologies and initiatives involving data that is too diverse, fast-changing or massive for conventional technologies, skills and infrastructure to handle efficiently.
The Big Data landscape is dominated by two classes of technology:
- Systems that provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored.
- Systems that provide analytical capabilities for a retrospective, complex analysis that may touch most or all of the data.
These classes of technology are complementary and frequently deployed together. To familiarize programmers with Big Data tools and techniques, the hands-on session ‘Programming with Scala and Spark languages’ was conducted on 21 and 22 December 2018 by Vijay Krishna Menon, Asst. Prof. (Sr. Gr.), Centre for Excellence in Computational Engineering & Networking (CEN).
Scala is a JVM-based, statically typed language that is safe and expressive. Big data programmers prefer it because it is extensible: new constructs can be added as libraries that integrate smoothly with the language. Tech companies such as LinkedIn, Twitter and Foursquare employ Scala, and its performance record has generated interest among financial institutions, with EDF Trading cited as using it for derivative pricing.
Apache Spark is written in Scala, and because Scala scales well on the JVM, it is the language big data developers most often use for Spark projects. Developers report that Scala lets them dig into Spark’s source code, so they can readily access and implement Spark’s newest features. Scala’s interoperability with Java is its greatest attraction: Java developers can get started quickly because they already know the object-oriented concepts.
The session details are as follows:
Day 1 Topics Overview
- Functional Programming Primer.
- Immutability and causality.
- Scala Variables and Values.
- Functions in Scala and Tail recursion.
- Typing and Type inference.
- Functional Abstractions.
- Classes in Scala and orthogonal object-oriented code.
- Higher-order functions.
- MapReduce Example.
- Generalizing the MapReduce.
- Lists, Arrays and other data representations.
- Immutability and functional-style algorithms (List sort).
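
Several of the Day 1 topics can be illustrated in a few lines of Scala. The following is a minimal sketch, not the material used in the session: the object name `Day1Sketch` and the helpers `sumTo`, `mapReduce`, `insert` and `sort` are hypothetical names chosen for illustration. It shows a tail-recursive function, a generalized map-reduce written as a higher-order function, and a list sort that never mutates its input.

```scala
import scala.annotation.tailrec

// Hypothetical illustration; names are not from the session material.
object Day1Sketch {
  // Tail-recursive sum of 1..n. @tailrec asks the compiler to verify
  // that the recursive call is in tail position.
  @tailrec
  def sumTo(n: Int, acc: Long = 0L): Long =
    if (n <= 0) acc else sumTo(n - 1, acc + n)

  // Generalized map-reduce over an immutable List: `mapper` transforms
  // each element, `combine` folds the mapped values starting from `zero`.
  def mapReduce[A, B](xs: List[A])(mapper: A => B)(zero: B)(combine: (B, B) => B): B =
    xs.map(mapper).foldLeft(zero)(combine)

  // Insert x into an already sorted list, returning a new list.
  def insert(x: Int, sorted: List[Int]): List[Int] = sorted match {
    case Nil    => List(x)
    case h :: t => if (x <= h) x :: sorted else h :: insert(x, t)
  }

  // Functional-style insertion sort: builds new lists, mutates nothing.
  def sort(xs: List[Int]): List[Int] =
    xs.foldRight(List.empty[Int])(insert)

  def main(args: Array[String]): Unit = {
    val words = List("scala", "spark", "rdd") // val: an immutable binding
    println(mapReduce(words)(_.length)(0)(_ + _)) // prints 13
    println(sumTo(100))                           // prints 5050
    println(sort(List(3, 1, 2)))                  // prints List(1, 2, 3)
  }
}
```

The types of `_.length` and `_ + _` are inferred from the earlier parameter lists, which is why `mapReduce` takes its arguments in separate groups.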
Day 2 Topics Overview
- Higher-order List functions.
- Other Scala collections.
- The ‘spark-shell’ and launching a stand-alone Spark cluster.
- RDDs and ways to create them.
- Basic MapReduce problems.
- Word count standard example.
- Key-value (KV) transformations on structured and unstructured datasets.
- Popular RDD based API.
- Higher-level data abstractions: DataFrames, Datasets, Row objects and other case entities.
- Computing Pi using Spark MapReduce.
- Some real-life Big Data case studies with hands-on exercises (Count, Sort, and Time series).
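
The word count and Pi examples listed for Day 2 follow a common map-then-reduce shape. The sketch below is an assumption about that shape rather than the session's own code, and it deliberately uses plain Scala collections instead of Spark so that it runs without a cluster. On a Spark RDD the same steps would typically be expressed with `flatMap`, `map` and `reduceByKey`. The names `Day2Sketch`, `wordCount` and `estimatePi` are hypothetical.

```scala
import scala.util.Random

// Hypothetical illustration using plain Scala collections, not Spark.
object Day2Sketch {
  // Word count in the map / reduce-by-key shape: split lines into words,
  // pair each word with 1, then sum the 1s for each distinct word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  // Monte Carlo estimate of Pi: sample points in the unit square, map each
  // to 1 if it falls inside the quarter circle, and reduce by addition.
  // The fraction inside approximates Pi / 4.
  def estimatePi(samples: Int, seed: Long = 42L): Double = {
    require(samples > 0, "samples must be positive")
    val rng = new Random(seed)
    val inside = (1 to samples)
      .map { _ =>
        val x = rng.nextDouble()
        val y = rng.nextDouble()
        if (x * x + y * y <= 1.0) 1 else 0
      }
      .reduce(_ + _)
    4.0 * inside / samples
  }

  def main(args: Array[String]): Unit = {
    println(wordCount(Seq("to be or", "not to be")))
    println(estimatePi(100000))
  }
}
```

The estimate improves slowly with the number of samples, which is why this problem is a popular demonstration for distributing the sampling step across a cluster.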