Quiz
Question 1 (1 point)
The Derby DB warehouse directory is pointed to by "spark.sql.warehouse.dir". It creates the database sales_db if it does not exist; dropping it removes the sales metadata from the metastore and the data from the warehouse directory. (Hive has two parts, metadata and data: the metadata is in the metastore and the data is in the file system, so the actual data is in
"C:/Users/pedro/Documents/test_data/hive".)
Question 1 options:
True
False
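Question 1 describes configuring the warehouse directory and creating a database. A minimal sketch of that flow, assuming a local SparkSession with Hive support available (only the warehouse path and the sales_db name come from the question; everything else is illustrative):
import org.apache.spark.sql.SparkSession
// Warehouse location taken from the question; assumed to exist and be writable.
val spark = SparkSession.builder()
  .appName("WarehouseDirSketch")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "C:/Users/pedro/Documents/test_data/hive")
  .enableHiveSupport()
  .getOrCreate()
// Creates the database only if it is not already in the metastore.
spark.sql("CREATE DATABASE IF NOT EXISTS sales_db")
// Dropping it removes the metadata from the metastore; CASCADE also removes
// the table data stored under the warehouse directory.
spark.sql("DROP DATABASE IF EXISTS sales_db CASCADE")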
Question 2 (1 point)
In Spark SQL, high performance is achieved using the Catalyst switch.
Question 2 options:
True
False
Question 3 (1 point)
A DataFrame is a Dataset whose columns do not have names.
Question 3 options:
True
False
Question 4 (1 point)
Data sources have their own options that can be specified during the load process:
val salesRecords = spark.read.format("csv").option("sep", ";").option("inferSchema", "true").option("header", "false").load("/Users/hadoop-user/Documents/SalesJan2009.csv")
While reading a CSV file, the first row is always taken as the header if the header option is set to false, as
shown in the above example.
Question 4 options:
True
False
Question 5 (1 point)
In terms of reading Parquet files, if the base location of the table is specified as the path, the partitions are
not automatically discovered.
Question 5 options:
True
False
Question 6 (1 point)
DataFrames run on a specific engine of the Spark environment. That engine is called the Catalyst engine.
Question 6 options:
True
False
Question 7 (1 point)
Hive support needs to be enabled on the Spark session. The Hive warehouse directory needs to be set via
"spark.sql.warehouse.dir". Once the session is created, SQL statements can be issued using
sparkSession.sql("<sql_statement>"). Bucketing, sorting and partitioning can be done on the tables being
saved.
Question 7 options:
True
False
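A hedged sketch of the setup described in Question 7, assuming Hive support is available on the classpath; the table name sales_db.sales and the columns region and product_id are assumptions for illustration only:
import org.apache.spark.sql.SparkSession
val sparkSession = SparkSession.builder()
  .appName("HiveSupportSketch")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "C:/Users/pedro/Documents/test_data/hive")
  .enableHiveSupport()                              // Hive support enabled on the Spark session
  .getOrCreate()
// SQL statements issued through the session.
val sales = sparkSession.sql("SELECT * FROM sales_db.sales")   // assumes this table exists
// Partitioning, bucketing and sorting applied while saving a table.
sales.write
  .partitionBy("region")
  .bucketBy(8, "product_id")
  .sortBy("product_id")
  .saveAsTable("sales_db.sales_bucketed")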
Question 8 (1 point)
In Spark SQL, everything happens in memory. We do not have any DB; everything happens in
memory. Spark is an in-memory DB, and anything you can do in any DB you can do here too.
Question 8 options:
True
False
Question 9 (1 point)
When you create a temp view, it is created for this session only; however, when you create a global view, it is available
to all sessions.
Question 9 options:
True
False
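A small sketch of the distinction in Question 9, assuming a DataFrame named salesRecords like the one loaded in the other questions; the view names are illustrative:
// Session-scoped: visible only to the SparkSession that created it.
salesRecords.createOrReplaceTempView("sales_tmp")
spark.sql("SELECT COUNT(*) FROM sales_tmp").show()
// Application-scoped: registered under the global_temp database and visible to all sessions.
salesRecords.createGlobalTempView("sales_global")
spark.newSession().sql("SELECT COUNT(*) FROM global_temp.sales_global").show()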
Question 10 (1 point)
A global view is available to all sessions, but not once we close the current session.
Question 10 options:
True
False
Question 11 (1 point)
In terms of reading Parquet files, partition columns of numeric data types, date, timestamp and string types
are automatically inferred.
Question 11 options:
True
False
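A sketch of the partition discovery and type inference behaviour behind Questions 5, 11 and 18, assuming a hypothetical partitioned layout such as .../sales_parquet/year=2009/month=1/; the path and column names are assumptions:
// Reading from the base location lets Spark discover partition directories
// (e.g. year=2009/month=1) and expose them as columns.
val partitioned = spark.read.parquet("/Users/hadoop-user/Documents/sales_parquet")
partitioned.printSchema()    // year and month show up as inferred partition columns
// Type inference for partition columns is controlled by this property;
// when it is disabled, the partition columns are read back as strings.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")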
Question 12 (1 point)
In RDDs we deal with structured and semi-structured data; however, in DataFrames we deal with structured data.
Question 12 options:
True
False
Question 13 (1 point)
createOrReplaceTempView will overwrite the existing view if it exists.
Question 13 options:
True
False
Question 14 (1 point)
An easier way to load various types of data, other than Parquet, is the following:
val salesRecords = spark.read.format("csv").load("/Users/hadoop-user/Documents/SalesJan2009.csv")
The formats are data sources and should be referred to using their fully qualified names, like
"org.apache.spark.sql.parquet".
Question 14 options:
True
False
Question 15 (2 points)
The write object is derived from the session object (Spark session). The reader object is derived from the DataFrame.
Question 15 options:
True
False
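For Question 15, a brief sketch of where the reader and writer objects come from in the Spark API; the input path is reused from Question 16, and the output path is an assumption:
// spark.read returns a DataFrameReader obtained from the SparkSession.
val df = spark.read.parquet("/Users/hadoop-user/Documents/SalesJan2009.parquet")
// df.write returns a DataFrameWriter obtained from the DataFrame.
df.write.mode("overwrite").parquet("/Users/hadoop-user/Documents/SalesJan2009_copy.parquet")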
Question 16 (1 point)
val salesRecords = spark.read.load("/Users/hadoop-user/Documents/SalesJan2009.parquet")
This loads a Parquet file by default. The default option is specified by the configuration property
"spark.sql.sources".
Question 16 options:
True
False
Question 17 (1 point)
We always apply SQL on the DataFrame.
Question 17 options:
True
False
Question 18 (2 points)
In terms of reading Parquet files, the columns are automatically inferred because the property
"spark.sql.sources.partitionColumnTypeInference.enabled" is set to false by default. If the above property is
set to true, all partition columns will be read as String.
Question 18 options:
True
False
Question 19 (1 point)
We use the filter function when there is a complicated condition where you cannot use the typical SQL constructs; that is why we
write the filter function. The function is passed the particular row, and then you call row methods to get the
values of that particular row.
Question 19 options:
True
False
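A hedged sketch of the filter-with-a-function idea in Question 19, assuming the salesRecords DataFrame from the earlier questions and that column 2 holds a numeric price stored as a string; the column position and threshold are assumptions:
import org.apache.spark.sql.Row
// The function is passed each Row; row.getString / row.getDouble etc. extract the values.
val expensive = salesRecords.filter((row: Row) => row.getString(2).toDouble > 1200.0)
expensive.show()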
Question 20 (1 point)
In Spark SQL, the unit of processing is the Dataset or DataFrame.
Question 20 options:
True
False
Question 21 (1 point)
A Dataset is a collection of records. Each record in the Hive table is a row. A Dataset is a resilient distributed
collection of rows, where a row is an object. Row is a Scala class.
Question 21 options:
True
False
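A tiny sketch for Question 21 showing that a DataFrame is a collection of Row objects and that Row comes from org.apache.spark.sql; the field access below is purely illustrative:
import org.apache.spark.sql.Row
val firstRecord: Row = salesRecords.first()   // each record in the Dataset is a Row object
println(firstRecord.length)                   // number of fields in the row
println(firstRecord.get(0))                   // positional access to a field's value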
...