Assignment 1

The document outlines an assignment on data handling using Python, submitted by students Samanpreet Singh and Hardeep Singh. It includes code for loading a dataset on phone usage in India, displaying data characteristics, identifying null values, and printing a specified range of the dataset. The dataset contains 53,058 rows and 19 columns, with various features related to phone usage and demographics.

Assignment 1: Data Handling using Python

Roll number: 24071227 and 24071232


Student Name: Samanpreet Singh and Hardeep Singh
Group: 7
Date of submission: 28-01-2025
Submitted to: Dr. Sukhjeet Kaur Ranade & Ms. Rama Rani
Program Title: Dataset of Phone Usage in India

Code
# Import the pandas library
import pandas as pd

# Load the original data (raw string, so the backslash in the Windows path
# is not read as an escape sequence)
df = pd.read_csv(r"E:\phone_usage_india_dirty.csv")

# Print a dashed line to separate the sections of the output
print("-" * 40)

# Display the number of rows, the number of columns and the data types
datatypes = df.dtypes

print("Number of rows:", df.shape[0])
print("Number of columns:", df.shape[1])

print("-" * 40)
print("Data types:")
print(datatypes)

print("-" * 40)

# List the continuous features, taken here to be the integer-typed columns only
# (float columns such as Age are not selected)
continue_features = df.select_dtypes(include=['int']).columns

# Print the continuous features
print("Continue_features:")
for feature in continue_features:
    print(feature)

print("-" * 40)

# Display the dataset size (number of rows)
print("Dataset size (Number of rows):", df.shape[0])

print("-" * 40)

# Find the number of null values in each column
null_values = df.isnull().sum()

# Print the results
print("Null values in the dataset")
print(null_values)

print("-" * 40)

# Select the columns whose unique values will be counted. Because 'float' and
# 'int' are included alongside 'object' and 'category', this selects every
# column, not only the discrete (categorical) ones.
categorical_features = df.select_dtypes(include=['object', 'category', 'float', 'int'])

# Count the number of unique values in each selected column
category_counts = categorical_features.nunique()

# Print the results with the header and the rows aligned to the same width
print(f"{'Feature':<30} {'Unique Categories'}")
for feature, count in category_counts.items():
    print(f"{feature:<30} {count}")

print("-" * 40)

# Print the range of rows specified by the user
def print_csv_range(df, start, end):
    try:
        # Print rows start to end-1 of the DataFrame that was passed in
        # (the file is already loaded, so it is not read a second time)
        print(df.iloc[start:end])
    except Exception as e:
        # Report any unexpected error
        print(f"An error occurred: {e}")

# Get the range from the user
start = int(input("enter the starting range value:"))
end = int(input("Enter the ending range value:"))

print_csv_range(df, start, end)

print("-" * 40)
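The listing above treats only the integer columns as continuous, and its "categorical" selection includes every dtype. One possible way to separate the two groups is to split on numeric versus non-numeric dtypes. The sketch below is only an illustration: the `sample` frame and its values are hypothetical stand-ins for the real CSV, and treating every numeric column as continuous is an assumption (a rating such as Customer_Satisfaction may be better handled as discrete).

```python
import pandas as pd

# Hypothetical miniature frame: the column names mirror the assignment's
# dataset, but the values are made up for illustration.
sample = pd.DataFrame({
    "Age": [31.0, 39.0, None],
    "Gender": ["Female", "Male", None],
    "Customer_Satisfaction": [2, 4, 0],
})

# Numeric columns (both int and float) are candidates for continuous features.
numeric_features = sample.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric is treated as categorical.
categorical_features = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric features:", numeric_features)
print("Categorical features:", categorical_features)
```

Applied to the real DataFrame, the same two calls would be made on `df` instead of `sample`.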
Output
----------------------------------------
Number of rows: 53058
Number of columns: 19
----------------------------------------
Data types:
User ID                       object
Age                           float64
Gender                        object
Location                      object
Phone Brand                   object
OS                            object
Screen Time (hrs/day)         float64
Data Usage (GB/month)         float64
Calls Duration (mins/day)     float64
Number of Apps Installed      float64
Social Media Time (hrs/day)   float64
E-commerce Spend (INR/month)  float64
Streaming Time (hrs/day)      float64
Gaming Time (hrs/day)         float64
Monthly Recharge Cost (INR)   float64
Primary Use                   object
Timestamp                     object
Customer_Satisfaction         int64
Customer_Lifetime_Value       float64
dtype: object
----------------------------------------
Continue_features:
Customer_Satisfaction
----------------------------------------
Dataset size (Number of rows): 53058
----------------------------------------
Null values in the dataset
User ID                       5452
Age                           5048
Gender                        5335
Location                      5247
Phone Brand                   5303
OS                            5353
Screen Time (hrs/day)         4822
Data Usage (GB/month)         4997
Calls Duration (mins/day)     5238
Number of Apps Installed      5456
Social Media Time (hrs/day)   5292
E-commerce Spend (INR/month)  5018
Streaming Time (hrs/day)      5334
Gaming Time (hrs/day)         5245
Monthly Recharge Cost (INR)   5270
Primary Use                   5289
Timestamp                     0
Customer_Satisfaction         0
Customer_Lifetime_Value       5018
dtype: int64
----------------------------------------
Feature                       Unique Categories
User ID                       17665
Age                           47
Gender                        3
Location                      10
Phone Brand                   10
OS                            2
Screen Time (hrs/day)         112
Data Usage (GB/month)         492
Calls Duration (mins/day)     2945
Number of Apps Installed      191
Social Media Time (hrs/day)   56
E-commerce Spend (INR/month)  8195
Streaming Time (hrs/day)      76
Gaming Time (hrs/day)         51
Monthly Recharge Cost (INR)   1901
Primary Use                   5
Timestamp                     53058
Customer_Satisfaction         5
Customer_Lifetime_Value       48040
----------------------------------------
enter the starting range value:100
Enter the ending range value:2000
      User ID  Age   Gender  Location   Phone Brand   ...  Monthly Recharge Cost (INR)  Primary Use    Timestamp            Customer_Satisfaction  Customer_Lifetime_Value
100   U00101   31.0  Female  Pune       Apple         ...  NaN                          Education      2023-01-05 04:00:00  2                      168606.410928
101   U00102   39.0  Male    Pune       Apple         ...  NaN                          Social Media   2023-01-05 05:00:00  4                      64562.179611
102   U00103   15.0  NaN     Mumbai     Realme        ...  687.0                        Work           2023-01-05 06:00:00  0                      136436.260992
103   U00104   21.0  Male    NaN        Apple         ...  1129.0                       Entertainment  2023-01-05 07:00:00  0                      86387.988358
104   U00105   56.0  NaN     NaN        Motorola      ...  1037.0                       Entertainment  2023-01-05 08:00:00  1                      91850.388545
...   ...      ...   ...     ...        ...           ...  ...                          ...            ...                  ...                    ...
1995  U01996   28.0  Other   Bangalore  Google Pixel  ...  547.0                        Entertainment  2023-03-25 03:00:00  2                      108289.398497
1996  U01997   57.0  Male    NaN        Motorola      ...  900.0                        Work           2023-03-25 04:00:00  3                      89147.220506
1997  U01998   23.0  NaN     Ahmedabad  OnePlus       ...  509.0                        Entertainment  2023-03-25 05:00:00  2                      374678.862645
1998  U01999   25.0  Male    Pune       NaN           ...  138.0                        Gaming         2023-03-25 06:00:00  1                      129811.377124
1999  U02000   35.0  Female  Kolkata    Vivo          ...  NaN                          Gaming         2023-03-25 07:00:00  1                      142723.216312

[1900 rows x 19 columns]
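
The null counts in the output are easier to judge relative to the dataset size: for example, 5048 missing Age values out of 53058 rows is roughly 9.5%. A minimal sketch of reporting both the count and the percentage per column follows. The `sample` frame is hypothetical (made-up values, column names borrowed from the dataset), and the `summary` layout is just one way to present the result.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the real CSV.
sample = pd.DataFrame({
    "Age": [31.0, None, 15.0, 21.0],
    "Gender": ["Female", "Male", None, None],
    "Customer_Satisfaction": [2, 4, 0, 0],
})

# Number of missing values in each column.
null_counts = sample.isnull().sum()

# Share of missing values in each column, as a percentage of all rows.
# The mean of a boolean mask is the fraction of True entries.
null_percent = (sample.isnull().mean() * 100).round(2)

# Combine both measures into one table, one row per column of the data.
summary = pd.DataFrame({"null_count": null_counts, "null_percent": null_percent})
print(summary)
```

On the real data the same lines would operate on `df` rather than `sample`.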
