How to read a Parquet file in Spark and Scala and create a DataFrame
admin
This is one way to load a Parquet file as a DataFrame:
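// sc is the SparkContext predefined in spark-shell; SQLContext is deprecated since Spark 2.0 in favor of SparkSession
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.parquet("src/main/resources/mydata.parquet")
df.printSchema()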
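Since Spark 2.0, SparkSession is the recommended entry point, and `spark.read.parquet` does the same job as `sqlContext.read.parquet`. The sketch below is a minimal, self-contained version of the first approach, assuming spark-sql is on the classpath and a local master. To stay runnable it writes a tiny made-up dataset to a temporary directory before reading it back; the object name `ReadParquetExample` and the sample rows are hypothetical.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; assumes spark-sql is on the classpath.
object ReadParquetExample {

  // Writes a tiny sample dataset to a fresh temporary directory and reads it back.
  def roundTrip(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val dir = Files.createTempDirectory("parquet-demo").resolve("emp").toString
    Seq(("alice", "Paris", 100), ("bob", "Oslo", 200))
      .toDF("name", "city", "salary")
      .write
      .parquet(dir)
    // SparkSession.read replaces sqlContext.read; parquet() returns a DataFrame.
    spark.read.parquet(dir)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("read-parquet").getOrCreate()
    try {
      val df = roundTrip(spark)
      df.printSchema()
      df.show()
    } finally {
      spark.stop()
    }
  }
}
```

In real code you would call `spark.read.parquet` on your own path (for example `src/main/resources/mydata.parquet`) instead of writing sample data first.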
Another way is to query the Parquet files directly with Spark SQL:
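import org.apache.spark.sql.SparkSession

// replace "set_the_master" with your master URL, e.g. "local[*]" or "yarn"
val spark: SparkSession = SparkSession.builder.master("set_the_master").getOrCreate()
spark.sql("select name, city, salary from parquet.`hdfs://path/myEmp`").show()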
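A common variant of the Spark SQL approach is to read the files once, register them as a temporary view, and query the view by name. The sketch below shows both forms side by side, assuming a local master and a temporary directory in place of the HDFS path; the object name `SqlOnParquetExample`, the view name `emp`, and the sample rows are hypothetical.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; the sample data and the view name "emp" are made up.
object SqlOnParquetExample {

  def query(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val dir = Files.createTempDirectory("parquet-sql").resolve("myEmp").toString
    Seq(("alice", "Paris", 100), ("bob", "Oslo", 200))
      .toDF("name", "city", "salary")
      .write
      .parquet(dir)

    // 1) Query the Parquet files directly by path (note the backticks around the path).
    val direct = spark.sql(s"select name, city, salary from parquet.`$dir` where salary > 150")

    // 2) Register the DataFrame as a temporary view and query it by name.
    spark.read.parquet(dir).createOrReplaceTempView("emp")
    val viaView = spark.sql("select name, city, salary from emp where salary > 150")

    (direct, viaView)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("sql-on-parquet").getOrCreate()
    try {
      val (direct, viaView) = query(spark)
      direct.show()
      viaView.show()
    } finally {
      spark.stop()
    }
  }
}
```

The view is handy when several queries hit the same data, since the path appears only once.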
A third way is the generic reader with the mergeSchema option, which merges the schemas of all matched Parquet files:
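import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder.master("local").getOrCreate()
// mergeSchema combines the columns of every Parquet file matched by the glob
val df = spark.read.option("mergeSchema", true).format("parquet").load("/tmp/mydir/*")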
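Schema merging is off by default (`spark.sql.parquet.mergeSchema` is false), so without the option Spark may take the schema from a single file and drop columns that exist only in other files. The sketch below, modeled on the schema-merging example in the Spark documentation, writes two batches with different columns under partition directories `key=1` and `key=2` and reads the parent directory with `mergeSchema`. Unlike the snippet above it loads the parent directory rather than a glob, so the `key` partition column is discovered; the object name `MergeSchemaExample` and the data are hypothetical.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object, modeled on the schema-merging example in the Spark docs.
object MergeSchemaExample {

  def mergedRead(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val base = Files.createTempDirectory("parquet-merge").resolve("test_table").toString

    // First batch has columns (value, square).
    (1 to 5).map(i => (i, i * i)).toDF("value", "square").write.parquet(s"$base/key=1")
    // Second batch has columns (value, cube): same "value" column, one new column.
    (6 to 10).map(i => (i, i * i * i)).toDF("value", "cube").write.parquet(s"$base/key=2")

    // With mergeSchema the result has value, square, cube plus the partition column key;
    // rows from one batch get null in the column that only exists in the other batch.
    spark.read.option("mergeSchema", true).format("parquet").load(base)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("merge-schema").getOrCreate()
    try {
      val df = mergedRead(spark)
      df.printSchema()
      df.show()
    } finally {
      spark.stop()
    }
  }
}
```

Merging requires compatible column types across files; it also costs an extra pass over file footers, which is why it is not enabled by default.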