BDM1043 - Lab 5 - Covid Data Insights Using Hive Queries
describe covidcases_dyn3;
select * from covidcases_dyn3 LIMIT 20;
select * from covidcases_dyn3 LIMIT 1;
Notes:
*** a SerDe jar was needed so that commas inside the PHU_NAME column are
treated as characters instead of extra separators in the csv file
*** the disadvantage of the SerDe is that it converts all data types to string
*** the timestamp format is produced using the following functions:
(from_unixtime(unix_timestamp(translate(FILE_DATE, '/', '-') , 'dd-MM-yyyy')))
*** int and timestamp casting is then done on the fly during select queries
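As an illustration of the casting described in the notes above, a query along the following lines converts the string columns while selecting. This is only a sketch: the aliases file_ts and deaths_int are made up for the example, and it assumes FILE_DATE holds day/month/year values, as the 'dd-MM-yyyy' pattern in the notes suggests.

```sql
-- Sketch only: the SerDe stores every column as a string,
-- so FILE_DATE and DEATHS are converted at query time.
SELECT
  -- replace '/' with '-', parse as dd-MM-yyyy, then cast the result to a timestamp
  CAST(from_unixtime(unix_timestamp(translate(FILE_DATE, '/', '-'), 'dd-MM-yyyy')) AS TIMESTAMP) AS file_ts,
  PHU_NAME,
  -- hypothetical alias; numeric cast so comparisons and sorting are not lexicographic
  CAST(DEATHS AS INT) AS deaths_int
FROM covidcases_dyn3
LIMIT 5;
```

Without the int cast, string values compare character by character, so '9' would sort above '10'.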
Current Table:
Now we will answer the question “Using the Hive table created in Lab4 which will have latest
Ontario Covid data, come up with 5 questions you want answered from this data and execute
queries to answer them.”
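-- DEATHS is stored as a string, so it is cast to int before sorting; ORDER BY gives a global ordering
SELECT FILE_DATE, PHU_NAME, PHU_NUM, DEATHS FROM covidcases_dyn3 ORDER BY CAST(DEATHS AS INT) DESC LIMIT 30;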
3. What are the top 5 places that had the most covid deaths?