
DESCRIPTION : OPERATION ANALYTICS & INVESTIGATING METRICS SPIKE

Operation analytics involves gathering and analyzing data to gain valuable insights
into the performance and efficiency of business operations. A metrics spike is a
sudden and significant increase in key metrics; investigating it serves as a signal
to identify underlying causes, anomalies, or areas requiring immediate attention
and action.
As the Data Analyst Trainee, we have to load the CSV files, create the database,
and run the queries to answer the questions asked in the respective case studies:
• Case Study 1 > Job Data table
• Case Study 2 > Users, Events and Email tables
TECH-STACK USED

• MySQL
• Command Prompt
• Google Drive


CASE STUDY 1
This Case Study contains the Job Data table:
• 7 attributes (columns)
• 8 records (rows)
This case study has questions like:
• Calculate the number of jobs reviewed per hour per day for November 2020?
• Calculate 7 day rolling average of throughput? For throughput, do you prefer
daily metric or 7-day rolling and why?
• Calculate the percentage share of each language in the last 30 days?
• How will you display duplicates from the table?
We have to read the questions asked by the Data Managers, answer them by running
the SQL queries, and report the results to the respective manager.

/*Picture of the table Map has been attached on the next slide.*/
APPROACH : CASE STUDY1

• Downloaded the given dataset for Case Study 1.
• Studied the dataset and noted down the necessary tables and columns on pen and
paper.
• Created a database and the respective tables (a sketch of these statements is
shown in the Insights section below).
• Imported the dataset via LOAD DATA LOCAL INFILE, because it is more efficient and
faster than the Table Data Import Wizard; this statement is especially useful for
importing large data like in Case Study 2.
• Ran the queries to find the answers to the questions asked.
INSIGHTS : CASE STUDY1
Creating the database and its tables

Database name: casestudy1
Table name: jobsdata
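
A minimal sketch of the setup, assuming the Job Data CSV has seven columns named
ds, job_id, actor_id, event, language, time_spent and org (the column names, types
and file path are assumptions for illustration; the real ones come from the
provided CSV):

-- Create the database and the jobs table (column names/types are assumed).
CREATE DATABASE IF NOT EXISTS casestudy1;
USE casestudy1;

CREATE TABLE jobsdata (
    ds          DATE,           -- date the job was reviewed
    job_id      INT,
    actor_id    INT,
    event       VARCHAR(20),
    language    VARCHAR(20),
    time_spent  INT,            -- seconds spent reviewing the job
    org         VARCHAR(5)
);

-- Fast bulk import from the CSV (requires local_infile to be enabled;
-- the file path is a placeholder).
LOAD DATA LOCAL INFILE 'C:/data/job_data.csv'
INTO TABLE jobsdata
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;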
INSIGHTS : CASE STUDY1 - A
Number of jobs reviewed: amount of jobs reviewed over time.

Your task: Calculate the number of jobs reviewed per hour per day (PHPD) for
November 2020.

Date JobReviewedPHPD
30-11-2020 180
29-11-2020 180
28-11-2020 218
27-11-2020 35
26-11-2020 64
25-11-2020 80
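
A sketch of a query that produces this kind of result, using the assumed columns
ds, job_id and time_spent from the table definition above; one plausible reading of
the metric divides each day's job count by the hours spent reviewing that day:

SELECT ds AS ReviewDate,
       COUNT(job_id) / (SUM(time_spent) / 3600) AS JobsReviewedPHPD
FROM jobsdata
WHERE ds BETWEEN '2020-11-01' AND '2020-11-30'
GROUP BY ds
ORDER BY ds DESC;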
INSIGHTS : CASE STUDY1- B(PART1)
Let's say the above metric is called throughput. Calculate the 7-day rolling
average of throughput. For throughput, do you prefer the daily metric or the 7-day
rolling average, and why?

Date Daily Throughput

30-11-2020 0.02
29-11-2020 0.02
28-11-2020 0.01
27-11-2020 0.06
26-11-2020 0.05
25-11-2020 0.05
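
A sketch of the daily throughput calculation, reading throughput as jobs reviewed
per second (job count divided by total time spent; column names as assumed above):

SELECT ds AS ReviewDate,
       ROUND(COUNT(job_id) / SUM(time_spent), 2) AS DailyThroughput
FROM jobsdata
GROUP BY ds
ORDER BY ds DESC;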
INSIGHTS : CASE STUDY1-B(PART2)
Calculate the 7-day rolling average of throughput. For throughput, do you prefer
the daily metric or the 7-day rolling average, and why?

The weekly (7-day rolling) throughput is 0.03.

So, from here we conclude that the daily throughput gives more detailed
information, and it can be preferred over the weekly throughput.
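
A sketch of the 7-day rolling average using a window function (requires MySQL 8.0+;
same assumed columns as above):

SELECT ds,
       ROUND(AVG(daily_throughput) OVER (ORDER BY ds
                                         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW),
             2) AS Rolling7DayThroughput
FROM (
    SELECT ds, COUNT(job_id) / SUM(time_spent) AS daily_throughput
    FROM jobsdata
    GROUP BY ds
) AS daily;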
INSIGHTS : CASE STUDY1-C
Percentage share of each language: share of each language for different contents.

Calculate the percentage share of each language in the last 30 days.

Lang Totaljobs Lang%


English 1 12.5000
Arabic 1 12.5000
Persian 3 37.5000
Hindi 1 12.5000
French 1 12.5000
Italian 1 12.5000
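
A sketch of the percentage-share query; the reference date '2020-11-30' is an
assumption standing in for the latest date in the data:

SELECT language AS Lang,
       COUNT(job_id) AS TotalJobs,
       100 * COUNT(job_id) /
           (SELECT COUNT(*)
            FROM jobsdata
            WHERE ds > DATE_SUB('2020-11-30', INTERVAL 30 DAY)) AS LangPercent
FROM jobsdata
WHERE ds > DATE_SUB('2020-11-30', INTERVAL 30 DAY)
GROUP BY language;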
INSIGHTS : CASE STUDY1-D(PART1)
Duplicate rows: rows that have the same values present in them.

Let's say you see some duplicate rows in the data. How will you display duplicates
from the table?

If we take duplicate rows on the basis of JobID, then there are 3 duplicates of
the ID 23.
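
A sketch of the duplicate check using GROUP BY / HAVING (swap job_id for actor_id
to find duplicates on that column instead; column names are assumptions):

SELECT job_id, COUNT(*) AS Occurrences
FROM jobsdata
GROUP BY job_id
HAVING COUNT(*) > 1;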
INSIGHTS : CASE STUDY1-D(PART2)
Let's say you see some duplicate rows in the data. How will you display duplicates
from the table?

If we take duplicate rows on the basis of ActorID, then there are 2 duplicates of
the ID 1003.
CASE STUDY 2
This Case Study contains 3 tables as follows:
• Users (6 attributes and 19,066 records)
• Events (7 attributes and 340,832 records)
• Email (4 attributes and 90,389 records)
This case study has questions like:
• Calculate the weekly user engagement?
• Calculate the user growth for product?
• Calculate the weekly retention of users-sign up cohort?
• Calculate the weekly engagement per device?
• Calculate the email engagement metrics?
We have to read the questions asked by the Data Managers, answer them by running
the SQL queries, and report the results to the respective manager.

/*Picture of the table Map has been attached on the next slide.*/
APPROACH : CASE STUDY2

• Downloaded the given dataset for Case Study 2.
• Studied the dataset and noted down the necessary tables and columns on pen and
paper.
• Created a database and the respective tables.
• Imported the dataset via LOAD DATA LOCAL INFILE, because it is more efficient and
faster than the Table Data Import Wizard and is especially useful for importing
large data like in this case: MySQL Workbench takes around 5 hours to import the
data, whereas the command line takes 5 to 10 seconds (a sketch of the command-line
import is shown after this list).
• Ran the queries to find the answers to the questions asked.
• This case study uses advanced SQL, with features like CASE expressions and window
functions.
• At last, optimized the queries wherever possible.
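
A minimal sketch of the command-line import (the file path, credentials and table
name are placeholders; local_infile must be enabled on both client and server):

# Shell: start the client with LOCAL INFILE enabled (credentials are placeholders)
mysql --local-infile=1 -u root -p casestudy2

-- Inside the mysql prompt: bulk-load the largest table (~340k rows) in seconds
LOAD DATA LOCAL INFILE 'C:/data/events.csv'
INTO TABLE events
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;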
INSIGHTS : CASE STUDY-2
Creating the database and its tables

Database name: casestudy2

Table names:
• Users
• Events
• Emails
INSIGHTS : CASE STUDY-2 A
Calculate the weekly user engagement?

WeekNum TotalUsers
17 8019
18 17341
19 17224
20 17911
21 17151
22 18413
23 18280
24 19052
25 18642
26 19061
27 19881
28 20776
29 20067
30 21533
31 18556
32 16612
33 16145
34 16127
35 784
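
A sketch of the weekly engagement query (columns occurred_at, event_type and
user_id in the events table are assumptions; add DISTINCT around user_id if unique
weekly users are wanted rather than engagement events):

SELECT WEEK(occurred_at) AS WeekNum,
       COUNT(user_id) AS TotalUsers
FROM events
WHERE event_type = 'engagement'
GROUP BY WeekNum
ORDER BY WeekNum;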
INSIGHTS : CASE STUDY-2 B

User Growth: amount of users growing over time for a product.

Calculate the user growth for the product.

Snippet → Output
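
A sketch of the user-growth query as a cumulative count of activated users per week
(columns user_id and activated_at in the users table are assumptions; window
functions require MySQL 8.0+):

SELECT yr, wk,
       SUM(weekly_users) OVER (ORDER BY yr, wk) AS CumulativeUsers
FROM (
    SELECT YEAR(activated_at) AS yr,
           WEEK(activated_at) AS wk,
           COUNT(DISTINCT user_id) AS weekly_users
    FROM users
    WHERE activated_at IS NOT NULL
    GROUP BY yr, wk
) AS weekly
ORDER BY yr, wk;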
INSIGHTS : CASE STUDY-2 C

Weekly Retention: users getting retained weekly after signing up for a product.
Calculate the weekly retention of users on a sign-up cohort basis.

Snippets 1–4 → Output
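
A simplified sketch of the retention logic that the multi-part query (Snippets 1–4)
implements: for each sign-up week, count how many of those users are active in
later weeks (column names created_at, occurred_at, event_type are assumptions):

SELECT signup_week,
       later_week,
       COUNT(DISTINCT e.user_id) AS RetainedUsers
FROM (
    SELECT user_id, WEEK(created_at) AS signup_week
    FROM users
) AS u
JOIN (
    SELECT user_id, WEEK(occurred_at) AS later_week
    FROM events
    WHERE event_type = 'engagement'
) AS e ON e.user_id = u.user_id
      AND e.later_week >= u.signup_week
GROUP BY signup_week, later_week
ORDER BY signup_week, later_week;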
OUTPUT OF C :
INSIGHTS : CASE STUDY-2 D
Weekly Engagement: measures the activeness of users, i.e., whether they find value
in the product/service each week. Calculate the weekly engagement per device?

Snippet → Output
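
A sketch of the per-device engagement query (columns occurred_at, device, user_id
and event_type in the events table are assumptions):

SELECT YEAR(occurred_at) AS Yr,
       WEEK(occurred_at) AS Wk,
       device,
       COUNT(DISTINCT user_id) AS ActiveUsers
FROM events
WHERE event_type = 'engagement'
GROUP BY Yr, Wk, device
ORDER BY Yr, Wk, device;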
INSIGHTS : CASE STUDY-2 E
Email Engagement: Users engaging with the email service. Calculate the email engagement
metrics?

Snippet → Output
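
A sketch of email engagement metrics such as open rate and click-through rate (the
exact table name, Email or Emails in the slides, and the action values
sent_weekly_digest, email_open and email_clickthrough are assumptions about the
dataset):

SELECT
    100.0 * SUM(action = 'email_open') / SUM(action = 'sent_weekly_digest')
        AS OpenRatePct,
    100.0 * SUM(action = 'email_clickthrough') / SUM(action = 'sent_weekly_digest')
        AS ClickRatePct
FROM email;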
RESULT: OUTCOME

• Learnt how to use the command shell by running queries from the command prompt.
• Improved a little more on MySQL (advanced SQL); will work on a few more databases
to gain more clarity.
• Answering all the questions in Case Studies 1 & 2 gave me much better conceptual
clarity on advanced SQL.
• Learnt about presenting inferences via PPT reports.
• This point doesn't hold much value, but being from a non-IT background I learnt a
lot about the computer system.
