Batch 2

The document outlines three distinct problem statements focusing on customer behavior analysis, health and fitness tracking, and transportation system analysis, each accompanied by Python code snippets for dataset generation. For each problem, specific analyses are suggested, including calculating averages, identifying top performers, and generating various visualizations. The overall aim is to leverage data analysis techniques to derive insights from the generated datasets.

Uploaded by Ankit Sharma

Question 1: Customer Behavior Analysis

Problem Statement:

1. Use the following Python code snippet to generate the dataset:


import pandas as pd
import numpy as np

np.random.seed(42)  # fixed seed so the generated data is reproducible
customers = [f"Customer_{i}" for i in range(1, 201)]
ages = np.random.randint(18, 70, 200)  # upper bound is exclusive: ages 18-69
purchase_frequency = np.random.randint(1, 20, 200)
purchase_amount = np.random.uniform(50, 1000, 200)
data = {
    "Customer_ID": customers,
    "Age": ages,
    "Purchase_Frequency": purchase_frequency,
    "Average_Purchase_Amount": purchase_amount,
}
customer_data = pd.DataFrame(data)
customer_data.to_csv("customer_data.csv", index=False)
print(customer_data.head())

Listing 1: Customer Data Generation

2. Perform the following analysis:

(a) Calculate the average purchase amount for different age groups (e.g., 18-25, 26-40, etc.).
(b) Identify the top 10 customers based on total purchase amount (Purchase Frequency * Average Purchase Amount).
(c) Create a histogram of Age.
(d) Generate a scatter plot of Age vs. Purchase Frequency.
(e) Create a heatmap to visualize correlations among Age, Purchase Frequency, and Average Purchase Amount.
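
One possible way to approach tasks (a) to (e) is sketched below. It is not a reference solution. The dataset is rebuilt as in Listing 1 so the block runs on its own. The age bins beyond the two examples given (41-55 and 56-69), the added column names Age_Group and Total_Purchase_Amount, and the use of plain matplotlib for the heatmap are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to display plots on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the dataset as in Listing 1 (same seed, same order of random calls).
np.random.seed(42)
customer_data = pd.DataFrame({
    "Customer_ID": [f"Customer_{i}" for i in range(1, 201)],
    "Age": np.random.randint(18, 70, 200),
    "Purchase_Frequency": np.random.randint(1, 20, 200),
    "Average_Purchase_Amount": np.random.uniform(50, 1000, 200),
})

# (a) Mean purchase amount per age group. Bin edges are an assumption;
# pd.cut intervals are right-inclusive, so these cover 18-25, 26-40, 41-55, 56-69.
age_bins = [17, 25, 40, 55, 69]
age_labels = ["18-25", "26-40", "41-55", "56-69"]
customer_data["Age_Group"] = pd.cut(customer_data["Age"], bins=age_bins, labels=age_labels)
avg_by_age_group = (
    customer_data.groupby("Age_Group", observed=False)["Average_Purchase_Amount"].mean()
)
print(avg_by_age_group)

# (b) Top 10 customers by total purchase amount (frequency * average amount).
customer_data["Total_Purchase_Amount"] = (
    customer_data["Purchase_Frequency"] * customer_data["Average_Purchase_Amount"]
)
top_10 = customer_data.nlargest(10, "Total_Purchase_Amount")
print(top_10[["Customer_ID", "Total_Purchase_Amount"]])

# (c)-(e) Histogram, scatter plot, and correlation heatmap side by side.
numeric_cols = ["Age", "Purchase_Frequency", "Average_Purchase_Amount"]
corr = customer_data[numeric_cols].corr()

fig, axes = plt.subplots(1, 3, figsize=(16, 4))
axes[0].hist(customer_data["Age"], bins=10, edgecolor="black")
axes[0].set(title="Age distribution", xlabel="Age", ylabel="Customers")

axes[1].scatter(customer_data["Age"], customer_data["Purchase_Frequency"], alpha=0.6)
axes[1].set(title="Age vs. Purchase Frequency", xlabel="Age", ylabel="Purchase Frequency")

im = axes[2].imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[2].set_xticks(range(len(numeric_cols)))
axes[2].set_xticklabels(numeric_cols, rotation=45, ha="right")
axes[2].set_yticks(range(len(numeric_cols)))
axes[2].set_yticklabels(numeric_cols)
for i in range(len(numeric_cols)):
    for j in range(len(numeric_cols)):
        axes[2].text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
axes[2].set_title("Correlation heatmap")
fig.colorbar(im, ax=axes[2])
fig.tight_layout()
```

To view or keep the figure, call plt.show() in an interactive session or fig.savefig with a file name of your choice. Because every column is drawn independently at random, the correlations are expected to be close to zero.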

Question 2: Health and Fitness Tracking
Problem Statement:

1. Use the following Python code snippet to generate the dataset:


import pandas as pd
import numpy as np

np.random.seed(42)  # fixed seed so the generated data is reproducible
user_ids = [f"User_{i}" for i in range(1, 151)]
steps = np.random.randint(1000, 20000, 150)
calories_burned = np.random.uniform(500, 2500, 150)
workout_duration = np.random.randint(10, 120, 150)  # upper bound is exclusive: 10-119
sleep_hours = np.random.uniform(4, 10, 150)
data = {
    "User_ID": user_ids,
    "Steps": steps,
    "Calories_Burned": calories_burned,
    "Workout_Duration": workout_duration,
    "Sleep_Hours": sleep_hours,
}
health_data = pd.DataFrame(data)
health_data.to_csv("health_data.csv", index=False)
print(health_data.head())

Listing 2: Health Data Generation

2. Perform the following analysis:

(a) Calculate the average steps and calories burned by users grouped by Workout Duration intervals (e.g., 10-30 min, 31-60 min, etc.).
(b) Identify users with more than 15,000 steps and their corresponding Calories Burned and Workout Duration.
(c) Generate a boxplot to visualize Sleep Hours.
(d) Create a scatter plot of Workout Duration vs. Calories Burned.
(e) Generate a correlation heatmap for all numerical features.
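
The following is a minimal sketch of tasks (a) to (e), offered as one possible approach, not a model answer. The dataset is rebuilt as in Listing 2 so the block is self-contained. The duration intervals beyond the two examples given (61-90 and 91-119 min), the added column name Duration_Group, and the choice of matplotlib for plotting are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to display plots on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the dataset as in Listing 2 (same seed, same order of random calls).
np.random.seed(42)
health_data = pd.DataFrame({
    "User_ID": [f"User_{i}" for i in range(1, 151)],
    "Steps": np.random.randint(1000, 20000, 150),
    "Calories_Burned": np.random.uniform(500, 2500, 150),
    "Workout_Duration": np.random.randint(10, 120, 150),
    "Sleep_Hours": np.random.uniform(4, 10, 150),
})

# (a) Mean steps and calories per workout-duration interval.
# Bin edges are an assumption; pd.cut intervals are right-inclusive.
duration_bins = [9, 30, 60, 90, 119]
duration_labels = ["10-30", "31-60", "61-90", "91-119"]
health_data["Duration_Group"] = pd.cut(
    health_data["Workout_Duration"], bins=duration_bins, labels=duration_labels
)
avg_by_duration = (
    health_data.groupby("Duration_Group", observed=False)[["Steps", "Calories_Burned"]].mean()
)
print(avg_by_duration)

# (b) Users with more than 15,000 steps, with calories and workout duration.
active_users = health_data.loc[
    health_data["Steps"] > 15000,
    ["User_ID", "Steps", "Calories_Burned", "Workout_Duration"],
]
print(active_users)

# (c)-(e) Boxplot, scatter plot, and correlation heatmap side by side.
numeric_cols = ["Steps", "Calories_Burned", "Workout_Duration", "Sleep_Hours"]
corr = health_data[numeric_cols].corr()

fig, axes = plt.subplots(1, 3, figsize=(16, 4))
axes[0].boxplot(health_data["Sleep_Hours"].to_numpy())
axes[0].set_xticklabels(["Sleep_Hours"])
axes[0].set(title="Sleep Hours", ylabel="Hours")

axes[1].scatter(health_data["Workout_Duration"], health_data["Calories_Burned"], alpha=0.6)
axes[1].set(title="Workout Duration vs. Calories Burned",
            xlabel="Workout Duration (min)", ylabel="Calories Burned")

im = axes[2].imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[2].set_xticks(range(len(numeric_cols)))
axes[2].set_xticklabels(numeric_cols, rotation=45, ha="right")
axes[2].set_yticks(range(len(numeric_cols)))
axes[2].set_yticklabels(numeric_cols)
for i in range(len(numeric_cols)):
    for j in range(len(numeric_cols)):
        axes[2].text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
axes[2].set_title("Correlation heatmap")
fig.colorbar(im, ax=axes[2])
fig.tight_layout()
```

The filter in (b) uses a strict comparison, matching "more than 15,000 steps"; a user with exactly 15,000 steps is excluded.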

Question 3: Transportation System Analysis
Problem Statement:

1. Use the following Python code snippet to generate the dataset:


import pandas as pd
import numpy as np

np.random.seed(42)  # fixed seed so the generated data is reproducible
routes = [f"Route_{i}" for i in range(1, 101)]
distance = np.random.randint(5, 500, 100)  # upper bound is exclusive: 5-499
time_taken = np.random.uniform(0.5, 10, 100)
fuel_consumed = np.random.uniform(1, 50, 100)
vehicle_types = np.random.choice(["Car", "Bus", "Truck"], 100)
data = {
    "Route_ID": routes,
    "Distance": distance,
    "Time_Taken": time_taken,
    "Fuel_Consumed": fuel_consumed,
    "Vehicle_Type": vehicle_types,
}
transport_data = pd.DataFrame(data)
transport_data.to_csv("transport_data.csv", index=False)
print(transport_data.head())

Listing 3: Transportation Data Generation

2. Perform the following analysis:

(a) Calculate the average fuel efficiency (Distance/Fuel Consumed) for each Vehicle Type.
(b) Identify routes with fuel efficiency below a threshold (e.g., 5 km/L).
(c) Generate a bar plot to show Average Time Taken for each Vehicle Type.
(d) Create a scatter plot of Distance vs. Time Taken, colored by Vehicle Type.
(e) Create a boxplot of Fuel Consumed for each Vehicle Type.
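
Tasks (a) to (e) could be approached along the following lines. This is a sketch under stated assumptions, not a definitive solution. The dataset is rebuilt as in Listing 3 so the block runs on its own. The dataset itself does not state units; kilometres and litres are assumed here only because the example threshold is given in km/L. The column name Fuel_Efficiency and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to display plots on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the dataset as in Listing 3 (same seed, same order of random calls).
np.random.seed(42)
transport_data = pd.DataFrame({
    "Route_ID": [f"Route_{i}" for i in range(1, 101)],
    "Distance": np.random.randint(5, 500, 100),
    "Time_Taken": np.random.uniform(0.5, 10, 100),
    "Fuel_Consumed": np.random.uniform(1, 50, 100),
    "Vehicle_Type": np.random.choice(["Car", "Bus", "Truck"], 100),
})

# (a) Fuel efficiency per route (assumed km per litre), then the mean per vehicle type.
transport_data["Fuel_Efficiency"] = (
    transport_data["Distance"] / transport_data["Fuel_Consumed"]
)
avg_efficiency = transport_data.groupby("Vehicle_Type")["Fuel_Efficiency"].mean()
print(avg_efficiency)

# (b) Routes below the efficiency threshold (5 is the example value from the task).
threshold = 5
low_efficiency_routes = transport_data[transport_data["Fuel_Efficiency"] < threshold]
print(low_efficiency_routes[["Route_ID", "Vehicle_Type", "Fuel_Efficiency"]])

# (c) Bar plot of mean time taken per vehicle type.
avg_time = transport_data.groupby("Vehicle_Type")["Time_Taken"].mean()
fig, axes = plt.subplots(1, 3, figsize=(16, 4))
axes[0].bar(avg_time.index, avg_time.values)
axes[0].set(title="Average Time Taken", xlabel="Vehicle Type", ylabel="Time Taken")

# (d) Scatter plot of distance vs. time; one scatter call per type gives one color each.
for vehicle_type, group in transport_data.groupby("Vehicle_Type"):
    axes[1].scatter(group["Distance"], group["Time_Taken"], label=vehicle_type, alpha=0.7)
axes[1].set(title="Distance vs. Time Taken", xlabel="Distance", ylabel="Time Taken")
axes[1].legend(title="Vehicle Type")

# (e) Boxplot of fuel consumed, one box per vehicle type.
vehicle_types = sorted(transport_data["Vehicle_Type"].unique())
fuel_by_type = [
    transport_data.loc[transport_data["Vehicle_Type"] == v, "Fuel_Consumed"].to_numpy()
    for v in vehicle_types
]
axes[2].boxplot(fuel_by_type)
axes[2].set_xticklabels(vehicle_types)
axes[2].set(title="Fuel Consumed by Vehicle Type", ylabel="Fuel Consumed")
fig.tight_layout()
```

Note that (a) averages the per-route efficiencies. Dividing total distance by total fuel per vehicle type is a different, equally defensible definition and will generally give a different number.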
