BDA Ch 05 L03 A Spark SQL Analytics
“Big Data Analytics”, Ch.05 L03: Spark and Big Data Analytics
2019
Raj Kamal and Preeti Saxena © McGraw-Hill Education (India)
Figure 5.4 Steps between acquisition of data from
different sources and its applications
Steps For Data Analysis
Spark SQL Connectivity to Inputs
Figure 5.5 Connectivity between the applications
and Spark SQL
Spark SQL/Hive Server (Thrift)
Connectivity to outputs
JDBC Server
Hive Server (Thrift)
JSON, Hive, Parquet Objects
• HDFS is highly reliable for very long-running queries
• However, its IO operations are slow
• Columnar storage is used for faster IO
• With columnar storage, an IO operation reads only the portion of the data (the columns) that the query presently requires
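The difference between row and columnar layouts can be sketched in plain Python. This is a toy illustration only, not how Parquet or HDFS is implemented: the records, the field names (id, city, sales) and the helper functions are all made up for the example, and "values touched" is just a rough stand-in for IO cost.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The records and field names below are hypothetical, for illustration only.

rows = [
    {"id": 1, "city": "Pune", "sales": 120},
    {"id": 2, "city": "Indore", "sales": 80},
    {"id": 3, "city": "Pune", "sales": 200},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per field (a columnar layout)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def total_sales_row_store(records):
    """Row layout: each whole record is touched although only 'sales' is needed."""
    values_touched = 0
    total = 0
    for record in records:
        values_touched += len(record)  # all fields of the record are read
        total += record["sales"]
    return total, values_touched


def total_sales_column_store(columns):
    """Columnar layout: only the 'sales' column is touched."""
    sales = columns["sales"]
    return sum(sales), len(sales)


columns = to_columns(rows)
row_total, row_touched = total_sales_row_store(rows)
col_total, col_touched = total_sales_column_store(columns)

print(row_total, row_touched)  # 400 9
print(col_total, col_touched)  # 400 3
```

Both layouts give the same total, but the columnar version touches 3 values instead of 9, because the id and city columns are never read.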
A nested hierarchical columnar
storage concept
• Three Apache Parquet projects specify how the files are used for query processing or applications
• The projects are (i) parquet-format, with the Thrift definitions of metadata, (ii) parquet-mr and (iii) parquet-compatibility, for compatible read-write in multiple languages
Project parquet-mr
Spark DataFrame (SchemaRDD)
Summary
We learnt
• Steps between acquisition of data from
different sources and its applications
• Data input into Spark SQL/HiveQL/Cassandra CQL for query processing, either through the Cassandra-Spark Connector in Java or as data in Parquet, JSON or Hive tables after an ETL pipeline
• Connectivity between the applications
and Spark SQL
• JDBC Driver
• Parquet, JSON and DataFrames as
inputs to Spark SQL or Hive Server
End of Lesson 3 on
Data Analytics using Apache®
Spark™ Components Spark SQL
and DataFrames