How to read a Parquet file in Spark with Scala and create a DataFrame

admin

6/20/2023

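One way to load a Parquet file as a DataFrame is through SQLContext, the entry point that has been deprecated since Spark 2.0 in favor of SparkSession: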

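// sc is an existing SparkContext, for example the one spark-shell provides
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val df = sqlContext.read.parquet("src/main/resources/mydata.parquet")

// Print the schema Spark read from the Parquet metadata
df.printSchema()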
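
On Spark 2.0 and later the same read is usually done through SparkSession rather than SQLContext. The following is a minimal sketch of that, not the only way to do it. So that it runs without an existing file, it first writes a tiny sample dataset to a temporary directory. The object name, sample rows, column names, and the local[*] master are assumptions made for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadParquetExample {
  // Writes a tiny sample dataset as Parquet, then reads it back into a DataFrame.
  def roundTrip(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical sample data, used only so the example is self-contained
    val sample = Seq(("Ann", "Oslo", 5000), ("Bob", "Pune", 4200))
      .toDF("name", "city", "salary")

    // Write under a fresh temporary directory (the target path must not exist yet)
    val path = Files.createTempDirectory("parquet-demo").resolve("mydata.parquet").toString
    sample.write.parquet(path)

    // The actual read: SparkSession.read returns a DataFrameReader
    spark.read.parquet(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("read-parquet").getOrCreate()
    val df = roundTrip(spark)
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

In spark-shell a SparkSession named spark already exists, so only the spark.read.parquet call is needed there.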

 

Another way is to query the Parquet files directly with Spark SQL:

 

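import org.apache.spark.sql.SparkSession

// Replace "set_the_master" with your master URL, for example "local[*]"
val spark: SparkSession = SparkSession.builder.master("set_the_master").getOrCreate()

// The path goes in backticks after the parquet. prefix
spark.sql("select name, city, salary from parquet.`hdfs://path/myEmp`").show()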
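
As a rough illustration of the SQL approach, the sketch below runs the same filter two ways: directly against the files, and through a temporary view registered from a DataFrame. The object name, the view name emp, the sample rows, and the salary threshold are all hypothetical. The data is written to a temporary directory so the example does not depend on an HDFS path.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlOnParquetExample {
  // Returns the same query result obtained two ways: (direct file query, temp view query).
  def highEarners(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._

    // Hypothetical sample data written to a temporary location
    val path = Files.createTempDirectory("parquet-sql").resolve("myEmp").toString
    Seq(("Ann", "Oslo", 5000), ("Bob", "Pune", 4200))
      .toDF("name", "city", "salary")
      .write.parquet(path)

    // Option 1: query the Parquet files in place
    val direct = spark.sql(
      s"select name, city, salary from parquet.`$path` where salary > 4500")

    // Option 2: load once, register a temporary view, then query the view by name
    spark.read.parquet(path).createOrReplaceTempView("emp")
    val viaView = spark.sql("select name, city, salary from emp where salary > 4500")

    (direct, viaView)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("sql-on-parquet").getOrCreate()
    val (direct, viaView) = highEarners(spark)
    direct.show()
    viaView.show()
    spark.stop()
  }
}
```

The temporary view tends to be the more convenient choice when several queries hit the same data, since the path is written only once.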

 

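A third way uses the generic format and load methods of the reader, which also lets you set read options such as mergeSchema: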

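import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder.master("local").getOrCreate()

// mergeSchema reconciles differing schemas across the files matched by the glob
val df = spark.read.option("mergeSchema", true).format("parquet").load("/tmp/mydir/*")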
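
To show what mergeSchema does, here is a small sketch under assumed data: two batches are written with different columns into sibling directories, then read back together through a glob. The object name, directory names, and rows are hypothetical. Schema merging is off by default in recent Spark versions, so without the option the result would typically expose the columns of only some of the files.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object MergeSchemaExample {
  // Writes two Parquet batches with different columns and reads them as one DataFrame.
  def readMerged(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val base = Files.createTempDirectory("parquet-merge")

    // First batch has no city column (hypothetical data)
    Seq(("Ann", 5000)).toDF("name", "salary")
      .write.parquet(base.resolve("batch1").toString)

    // Second batch adds a city column
    Seq(("Bob", 4200, "Pune")).toDF("name", "salary", "city")
      .write.parquet(base.resolve("batch2").toString)

    // The glob matches both batch directories; mergeSchema unions their columns
    spark.read.option("mergeSchema", true).format("parquet").load(base.toString + "/*")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("merge-schema").getOrCreate()
    val df = readMerged(spark)
    df.printSchema() // lists name, salary, and city
    df.show()        // rows from the first batch have a null city
    spark.stop()
  }
}
```

Schema merging requires reading the footers of the matched files, so it is worth enabling only when the files are known to differ.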