Managed Tables and External Tables in Spark

Table Types in Spark and Hive
Apache Spark and Apache Hive support two main table types, managed and external, for organizing data. Understanding the difference is essential when working with large datasets and deciding where and how data is stored.
Since Spark 2.0 introduced native DDL support, it has been possible to control where a table's data is stored. A table's type can be checked through the catalog API on the SparkSession:
spark.catalog.getTable("table_name")
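As a minimal sketch (assuming a recent Spark version with PySpark, and that a table named developer already exists in the catalog), the Table object returned by getTable exposes a tableType field that reads MANAGED, EXTERNAL, or VIEW:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("table-type-check").getOrCreate()

# The returned Table object carries the table's metadata,
# including tableType ("MANAGED", "EXTERNAL", or "VIEW")
table = spark.catalog.getTable("developer")
print(table.name, table.tableType)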
A Managed Table means that Spark handles both the metadata and the data. When a managed table is dropped, both the table data and metadata are deleted.
By default, Spark stores managed table data under its warehouse directory (/spark-warehouse/). Creating a managed table requires only plain SQL:

CREATE TABLE developer (id INT, name STRING);
Or, using the DataFrame API with the Delta format:
batched_orders.write.format("delta").partitionBy("submitted_yyyy_mm").mode("overwrite").saveAsTable("orders_table")
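To illustrate the drop semantics end to end, here is a hedged sketch. It assumes the Delta Lake package is configured on the Spark session; batched_orders is recreated with made-up rows so the snippet is self-contained:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("managed-table-demo").getOrCreate()

# Stand-in data for batched_orders (hypothetical rows)
batched_orders = spark.createDataFrame(
    [(1, "2024_01"), (2, "2024_02")],
    ["order_id", "submitted_yyyy_mm"],
)

# saveAsTable without an explicit path creates a MANAGED table
batched_orders.write.format("delta") \
    .partitionBy("submitted_yyyy_mm") \
    .mode("overwrite") \
    .saveAsTable("orders_table")

# Dropping it deletes the metadata AND the files under spark-warehouse/
spark.sql("DROP TABLE orders_table")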
An External Table means that Spark manages only the metadata, while the actual data is stored at a user-defined location. When the table is dropped, only the metadata is removed, but the data remains intact.
CREATE EXTERNAL TABLE developer (id INT, name STRING) LOCATION '/tmp/tables/developer';
With the DataFrame API, supplying an explicit path option makes the saved table external (the path here is only an example):

batched_orders.write.format("delta").option("path", "/tmp/tables/orders").partitionBy("submitted_yyyy_mm").mode("overwrite").saveAsTable("orders_table")
Equivalently, in SQL, specifying a LOCATION registers an external table over data that already exists at that path:

CREATE TABLE orders USING DELTA LOCATION '/path/to/data';
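The difference shows up when the table is dropped. A short sketch, reusing the illustrative path from the statement above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("external-table-demo").getOrCreate()

# Dropping an external table removes only the catalog entry;
# the files at the LOCATION are left untouched
spark.sql("DROP TABLE orders")

# The data can still be read directly from the path
df = spark.read.format("delta").load("/path/to/data")
df.show()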
The same rule applies with other formats such as Parquet: omit the path and Spark creates a managed table, or add a path option to make it external:

CREATE TABLE developer (id INT, name STRING) USING PARQUET;
CREATE TABLE developer (id INT, name STRING) USING PARQUET OPTIONS ('path'='/tmp/tables/table6');
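To double-check what kind of table a statement produced, you can list the catalog entries; a small sketch, assuming an active SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each catalog entry reports its tableType: MANAGED or EXTERNAL
for t in spark.catalog.listTables():
    print(t.name, t.tableType)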
In this article, we covered managed tables and external tables in Spark, their key differences, and how to create them in SQL and through the DataFrame API using the Delta and Parquet formats. Spark also supports other formats such as Avro, ORC, and JSON.