While Spark chooses reasonable defaults for your data, if your Spark job runs out of memory or runs slowly, bad partitioning could be at fault. If your dataset is large, you can try repartitioning to a larger number of partitions (using the repartition method) to allow more parallelism for your job.
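Here is a minimal sketch of what that looks like, assuming PySpark; the DataFrame and the target partition count of 200 are illustrative, not prescriptive:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-example").getOrCreate()

# Example dataset; substitute your own DataFrame here.
df = spark.range(0, 10_000_000)

# Inspect how many partitions Spark chose by default.
print(df.rdd.getNumPartitions())

# Repartition to a larger number of partitions. Note that repartition
# triggers a full shuffle, so do it once, before the expensive
# transformations that benefit from the added parallelism.
df_more_parallel = df.repartition(200)
print(df_more_parallel.rdd.getNumPartitions())
```

A reasonable starting point is a partition count that is a small multiple of the total number of executor cores in your cluster, so every core has work to do; the right number for your job depends on your data size and cluster.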