Explain what accumulators are in Spark

5/14/2024


 

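In Apache Spark, an accumulator is a shared variable used to aggregate information (for example a sum, count, or average, or, with a custom accumulator, a maximum or minimum) across tasks running in parallel on a cluster.
Tasks on the worker nodes can only add to an accumulator; only the driver program can read its value. Because the updates are associative and commutative, Spark can merge them efficiently, and for updates made inside actions it guarantees that each task's update is applied only once, even if the task is restarted.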

 

scala> val accum = sc.longAccumulator("My Accumulator")
scala> sc.parallelize(Array(1, 2, 3)).foreach(x => accum.add(x))
-----
-----
scala> accum.value
res2: Long = 6

 

Or, in a standalone application:

 

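  import org.apache.spark.sql.SparkSession

  // Start a local Spark session
  val spark = SparkSession.builder()
    .appName("accumulators in spark")
    .master("local")
    .getOrCreate()

  // Create a named Long accumulator on the driver
  val longAcc = spark.sparkContext.longAccumulator("SumAccumulator")

  val rdd = spark.sparkContext.parallelize(Array(1, 2, 3, 4))

  // Each task adds its elements; Spark merges the updates back on the driver
  rdd.foreach(x => longAcc.add(x))
  println(longAcc.value) // prints 10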
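
The built-in LongAccumulator also exposes a count and an average in addition to the sum, while a maximum or minimum needs a custom accumulator. The following is a minimal sketch of both, assuming Spark 2.x or later. The names MaxAccumulator and AccumulatorSketch are hypothetical and chosen only for illustration; AccumulatorV2 and sparkContext.register are the standard extension points.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.AccumulatorV2

// Hypothetical custom accumulator that tracks the largest value seen.
// Long.MinValue serves as the "zero" (empty) state.
class MaxAccumulator extends AccumulatorV2[Long, Long] {
  private var current: Long = Long.MinValue

  override def isZero: Boolean = current == Long.MinValue

  override def copy(): MaxAccumulator = {
    val acc = new MaxAccumulator
    acc.current = current
    acc
  }

  override def reset(): Unit = { current = Long.MinValue }

  // Called on the executors for every element
  override def add(v: Long): Unit = { current = math.max(current, v) }

  // Called on the driver to combine the per-task results
  override def merge(other: AccumulatorV2[Long, Long]): Unit = {
    current = math.max(current, other.value)
  }

  override def value: Long = current
}

object AccumulatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("accumulator-sketch")
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Built-in accumulator: tracks sum and count, and derives the average
    val sumAcc = sc.longAccumulator("SumAccumulator")

    // Custom accumulators must be registered before use
    val maxAcc = new MaxAccumulator
    sc.register(maxAcc, "MaxAccumulator")

    sc.parallelize(Seq(1L, 2L, 3L, 4L)).foreach { x =>
      sumAcc.add(x)
      maxAcc.add(x)
    }

    // Values are read on the driver only
    println(s"sum=${sumAcc.sum} count=${sumAcc.count} avg=${sumAcc.avg} max=${maxAcc.value}")
    // prints: sum=10 count=4 avg=2.5 max=4

    spark.stop()
  }
}
```

Both add and merge take the maximum, which is associative and commutative, so the result does not depend on how Spark partitions the data or in which order the tasks finish.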
