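Apache Spark is a framework and set of libraries for parallel data processing. It began in 2009 as a research project at UC Berkeley's AMPLab, was open-sourced in 2010, and became a top-level Apache project in 2014. It was designed to address many of the shortcomings of Apache Hadoop's MapReduce engine, and is often much faster than MapReduce for analytic workloads, especially iterative and interactive ones, because it can keep intermediate data in memory (RAM) rather than writing it to disk between processing steps.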