SQL & PySpark
DML OPERATIONS
Concept: Counting rows
SQL:
    SELECT COUNT(*) FROM table;
PySpark:
    df.count()
https://www.linkedin.com/in/mrabhijitsahoo/
Concept SQL PySpark
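Concept: CURDATE, NOW, CURTIME (current date, timestamp and time)
SQL:
    SELECT CURDATE() FROM table;
PySpark:
    from pyspark.sql.functions import current_date
    df.select(current_date())
  NOW() corresponds to current_timestamp(). PySpark has no CURTIME() function; the time of day is typically taken from date_format(current_timestamp(), "HH:mm:ss").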
Concept: CTE (common table expression)
SQL:
    WITH cte1 AS (SELECT * FROM table1)
    SELECT * FROM cte1 WHERE condition;
PySpark:
    df.createOrReplaceTempView("cte1")  # df holds the contents of table1
    df_cte1 = spark.sql("SELECT * FROM cte1 WHERE condition")
    df_cte1.show()
  or, without SQL:
    df.filter(condition1).filter(condition2)
DDL OPERATIONS
Concept: Create a table with column definitions
SQL:
    CREATE TABLE table_name (
        column_name data_type [constraints],
        column_name data_type [constraints],
        ...
    );
PySpark:
    from pyspark.sql.types import (StructType, StructField, IntegerType,
                                   StringType, DecimalType)
    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), False),
        StructField("age", IntegerType(), True),
        StructField("salary", DecimalType(10, 2), True)])
    df = spark.createDataFrame([], schema)
Concept: Dropping a column
SQL:
    ALTER TABLE table_name DROP COLUMN column_name;
PySpark:
    df = df.drop("column_name")