Apache Spark is a general-purpose distributed data processing engine written in Scala. In other words, it loads large datasets, runs computations on them in a distributed way, and stores the results. Spark provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs.
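To make the load, compute, store shape concrete, here is a minimal plain-Python sketch of that pattern. It is not Spark code and uses no Spark API: the function names (`load`, `split_into_partitions`, `count_words`, `run_job`) are hypothetical, and a local thread pool stands in for the cluster workers that Spark would use. It only illustrates the idea of splitting data into partitions, computing on each partition independently, and merging the partial results before writing them out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import json
import os
import tempfile


def load(path):
    """Load step: read the input file into a list of lines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def split_into_partitions(records, num_partitions):
    """Split records into roughly equal chunks, one per worker."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Compute step for a single partition: a local word count."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def run_job(input_path, output_path, num_partitions=4):
    """Load the input, count words per partition, merge, and store as JSON."""
    lines = load(input_path)
    partitions = split_into_partitions(lines, num_partitions)

    # Assumption for illustration: threads play the role of cluster workers.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))

    # Merge the per-partition results into a single total.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)

    # Store step: write the merged result to disk.
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(total, f, sort_keys=True)
    return dict(total)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.txt")
        dst = os.path.join(tmp, "counts.json")
        with open(src, "w", encoding="utf-8") as f:
            f.write("spark loads data\nspark computes on data\nspark stores data\n")
        print(run_job(src, dst))
```

In an actual Spark program the engine handles the partitioning, scheduling, and merging across machines; the sketch only mirrors the overall flow on a single machine.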