In this tutorial, we are going to work on Databricks community edition and convert a dataframe into a table using Apache spark.
We are going to use a dataset that is stored in Databricks Community. If you want more information about the datasets on databricks go to the following link. https://docs.databricks.com/data/databricks-datasets.html or if you want to use another CSV dataset you can follow this tutorial https://datamajor.net/how-to-read-csv-files-in-spark/ .
For completing the task, please follow the next steps.
Steps to convert a Dataframe into a Table in Spark
Step 1: Create a dataframe.
For this I will create an Scala notebook and I am going to use a dataset that contains flight information for US and Canada.
val csvFile = "/databricks-datasets/flights/departuredelays.csv" val tempDF = (spark.read .option("sep", ",") .option("header", "true") .option("inferSchema", "true") .csv(csvFile) )
After that we are going to check that the dataset contains records into it.
Step 2: Convert dataframe into a table.
For converting the dataframe into a table we are going to use the following command.
This will create a temporary view of the table on memory that is not persistent, but you can run a SQL query on it. if you want to create a persistent table use “saveAsTable” as follows.
For this example, we are going to create the temp table AirportCodes.
Step 3: Check Spark table by querying it.
By executing the following SQL query we are going to see the information that the table contains and also we are going to verify that dataframe information was converted to a Sql table.
%sql SELECT * FROM AirportCodes
By using %sql on the scala notebooks we are allowed to execute Sql queries on it.
As you can see it is returning the same information that we had on the Dataframe.
Step 4: Working on Spark Tables
After the dataset is converted into a Table, we can query those tables and store their results in other temporary or persistent tables.
%sql CREATE OR REPLACE TEMPORARY VIEW AirportCodesResume AS SELECT UPPER(City) as City ,State FROM AirportCodes;
If you want, you can check that the table contains the desire information.
%sql SELECT * FROM AirportCodesResume
This query is going to return the following result.
This will help us to make changes on the table and store those changes into another temp table.
If you want to create a persistent table, you can run the following query.
CREATE TABLE Tbl_AirportCodes AS SELECT UPPER(City) as City ,State FROM AirportCodesResume;
Step 5: Convert table into dataset.
If you want to convert your table back into a dataset, run the following command.val df_ResAirportCodes = spark.read.table("Tbl_AirportCodes")
This will store the table information into a dataframe.
If you want to check that the dataframe is populated with the information that we had on the table.display(df_ResAirportCodes)