
End semester Answer key format-fods

The document outlines the examination details for the course 'Foundations of Data Science' at Jai Shriram Engineering College, including an answer key for various questions related to data science concepts. It covers topics such as project charters, data warehousing, correlation coefficients, and data visualization techniques using Python libraries like NumPy and Matplotlib. Additionally, it includes tasks for analyzing sales data using pandas and creating different types of plots.

Q. P Code: 100114
JAI SHRIRAM ENGINEERING COLLEGE
An Autonomous Institution
B.E / B.Tech Degree Examinations Nov/ Dec – 2024

Course Code: CS3352 Course Name: Foundations of Data Science


Semester: 3rd Semester Max Marks: 100

Answer key
Part – A
10 x 2 = 20
1. Question: Identify the importance of project charter.
Answer:

A project charter authorizes a project, defines objectives, scope, and stakeholder roles,
ensuring alignment and clarity. It acts as a reference document throughout the project
lifecycle.

2. Question: Define Data Warehousing.
Answer:
Data warehousing involves collecting and storing data from multiple sources in a
centralized repository. It facilitates efficient querying, reporting, and decision-making.

3. Question: Given the following data set: 5, 7, 8, 10, 12, 14, 15, 18, 20. Calculate the interquartile range.
Answer:
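The result depends on the quartile convention, which the key does not state. The sketch below assumes the common textbook method, where Q1 and Q3 are the medians of the lower and upper halves and the overall median is left out when the number of values is odd. NumPy's default linear interpolation is shown alongside for comparison, because it gives a different value.

```python
import numpy as np
from statistics import median

data = [5, 7, 8, 10, 12, 14, 15, 18, 20]  # already sorted, n = 9

# Textbook "median of halves" convention (an assumption; the key does not name one)
half = len(data) // 2
lower, upper = data[:half], data[-half:]   # the middle value 12 is excluded
q1_halves, q3_halves = median(lower), median(upper)
iqr_halves = q3_halves - q1_halves

# NumPy's default: linear interpolation between order statistics
q1_np, q3_np = np.percentile(data, [25, 75])
iqr_np = q3_np - q1_np

print(q1_halves, q3_halves, iqr_halves)
print(q1_np, q3_np, iqr_np)
```

With the halves convention, Q1 = 7.5, Q3 = 16.5 and IQR = 9. NumPy's default gives Q1 = 8, Q3 = 15 and IQR = 7.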

4. Question: Apply the formula to convert a z score to the original score.
Answer:
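The standard conversion is X = mean + z × standard deviation. A short sketch of that formula follows; the function name and the mean, standard deviation, and z value are made-up illustrative choices, not taken from the question paper.

```python
def z_to_original(z, mean, std):
    """Convert a z score back to the original scale: X = mean + z * std."""
    return mean + z * std

# Hypothetical values chosen only for illustration
mean, std = 70.0, 10.0
original = z_to_original(1.5, mean, std)
print(original)  # 85.0
```

A z score of 0 maps back to the mean itself, and negative z scores map to values below the mean.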
5. Question: Define z scores.
Answer:
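A z score states how many standard deviations a value lies above or below the mean: z = (X − mean) / standard deviation. The sketch below standardizes a small, invented sample with NumPy. Using the population standard deviation (ddof=0) is an assumption; a course may prefer the sample version.

```python
import numpy as np

scores = np.array([50.0, 60.0, 70.0, 80.0, 90.0])  # hypothetical sample

mu = scores.mean()
sigma = scores.std()               # population standard deviation (ddof=0)
z_scores = (scores - mu) / sigma   # one z score per original value

print(z_scores)
print(z_scores.mean(), z_scores.std())  # approximately 0 and 1
```

After standardizing, the values have mean 0 and standard deviation 1, which is what makes z scores comparable across different scales.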

6. Question: Identify the properties of the correlation coefficient.
Answer:
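The properties usually listed for Pearson's r are: it lies between −1 and +1, its sign gives the direction of the relationship, it has no units, it is symmetric in the two variables, and it is unaffected by a change of origin or positive scale. The sketch below checks these numerically on randomly generated data; the data and the helper name pearson_r are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)   # hypothetical, positively related data

def pearson_r(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                        # symmetric: r(x, y) equals r(y, x)
r_scaled = pearson_r(3 * x + 5, 10 * y - 2)   # unchanged by origin and positive scale
r_flipped = pearson_r(-x, y)                  # sign flips when one variable is negated
r_perfect = pearson_r(x, 4 * x + 1)           # exact increasing linear relation gives r = 1

print(r_xy, r_yx, r_scaled, r_flipped, r_perfect)
```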

7. Question: Name some NumPy array attributes.
Answer:
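Commonly cited ndarray attributes are ndim, shape, size, dtype, itemsize, and nbytes. The sketch below prints each one for a small example array; the array contents and the explicit int64 dtype are illustrative choices.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype=np.int64)

print(arr.ndim)      # number of dimensions: 2
print(arr.shape)     # size along each dimension: (3, 4)
print(arr.size)      # total number of elements: 12
print(arr.dtype)     # element data type: int64
print(arr.itemsize)  # bytes per element: 8
print(arr.nbytes)    # total bytes (size * itemsize): 96
```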

8. Question: Write a command to create a two-dimensional array.
Answer:

# Create a 2D NumPy array using np.array()
import numpy as np
# Create a 2D array with 3 rows and 4 columns
array_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(array_2d)

9. Question: How can you set different colors for bar plot?
Answer:
Use the color parameter in plt.bar().
plt.bar(x, y, color=['red', 'blue', 'green'])

10. Question: State the purpose of a histogram.
Answer:
• To visualize the distribution of numerical data.
• To show the frequency of data within specific intervals (bins).
• To help identify patterns, such as skewness or modality.
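
The binning behind a histogram can be sketched without drawing anything. The example below assumes a randomly generated, right-skewed sample (an invented data set) and uses np.histogram to count the values in each of 10 equal-width bins.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=1000)   # hypothetical right-skewed sample

# Count how many values fall into each of 10 equal-width bins
counts, bin_edges = np.histogram(data, bins=10)

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```

These counts are the bar heights that plt.hist(data, bins=10) would draw. The skewness shows up as large counts in the first bins and a long tail of small counts.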

Scheme of Evaluation
Part – B
5 X 13 = 65
15. b)

import matplotlib.pyplot as plt
import numpy as np
# Generate sample data for three groups
np.random.seed(42)
# Group 1
weights_group1 = np.random.uniform(56, 64, 20)
heights_group1 = np.random.uniform(120, 180, 20)
# Group 2
weights_group2 = np.random.uniform(60, 68, 20)
heights_group2 = np.random.uniform(140, 200, 20)
# Group 3
weights_group3 = np.random.uniform(66, 72, 20)
heights_group3 = np.random.uniform(160, 240, 20)
# Plotting the scatter plot
plt.figure(figsize=(8, 6))
# Group 1
plt.scatter(weights_group1, heights_group1, label='Group 1', color='blue', alpha=0.7)
# Group 2
plt.scatter(weights_group2, heights_group2, label='Group 2', color='green', alpha=0.7)
# Group 3
plt.scatter(weights_group3, heights_group3, label='Group 3', color='red', alpha=0.7)
# Adding labels, title, and legend
plt.title("Group wise Weight vs Height scatter plot")
plt.xlabel("weight")
plt.ylabel("height")
plt.legend()
plt.grid(True)
# Show plot
plt.show()

Part – C
1 X 15 = 15
16. a) You have been provided with a CSV file named "sales_data.csv" that contains
sales data for a company. The file has the following columns: "Date", "Product",
"Quantity", and "Revenue". Your task is to load the data into a pandas DataFrame
and perform the following analysis.
Each 3 marks
i. Calculate the total revenue generated by the company.
ii. Find the product that generated the highest revenue.
iii. Calculate the average quantity sold per day.
iv. Group the data by month and calculate the total revenue for each month.
v. Plot a line graph showing the monthly revenue over time.
Answer

Python program
import pandas as pd
import matplotlib.pyplot as plt

# Load the CSV file into a DataFrame
df = pd.read_csv("sales_data.csv")

# Ensure 'Date' column is in datetime format
df['Date'] = pd.to_datetime(df['Date'])

# i. Calculate the total revenue generated by the company
total_revenue = df['Revenue'].sum()
print(f"Total Revenue: {total_revenue}")

# ii. Find the product that generated the highest revenue
highest_revenue_product = df.groupby('Product')['Revenue'].sum().idxmax()
print(f"Product with highest revenue: {highest_revenue_product}")

# iii. Calculate the average quantity sold per day
avg_quantity_per_day = df.groupby('Date')['Quantity'].sum().mean()
print(f"Average quantity sold per day: {avg_quantity_per_day}")

# iv. Group the data by month and calculate the total revenue for each month
df['Month'] = df['Date'].dt.to_period('M')  # Year-month period for each row
monthly_revenue = df.groupby('Month')['Revenue'].sum()
print("Monthly revenue:")
print(monthly_revenue)

# v. Plot a line graph showing the monthly revenue over time
plt.figure(figsize=(10, 6))
monthly_revenue.plot(kind='line', marker='o')
plt.title('Monthly Revenue Over Time')
plt.xlabel('Month')
plt.ylabel('Total Revenue')
plt.grid(True)
plt.show()
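
The program above needs a sales_data.csv file to run. The sketch below writes a tiny one with invented rows (all product names and figures are hypothetical) and reads it back to confirm the expected totals. Note that it overwrites any existing file of that name in the working directory.

```python
import pandas as pd

# Hypothetical rows, invented purely so the program has something to read
sample = pd.DataFrame({
    "Date": ["2024-01-05", "2024-01-05", "2024-01-20", "2024-02-03", "2024-02-17"],
    "Product": ["Pen", "Notebook", "Pen", "Notebook", "Bag"],
    "Quantity": [10, 4, 6, 5, 2],
    "Revenue": [100.0, 200.0, 60.0, 250.0, 300.0],
})
sample.to_csv("sales_data.csv", index=False)

# Read the file back and compute the values the answer program should report
df = pd.read_csv("sales_data.csv", parse_dates=["Date"])
total_revenue = df["Revenue"].sum()
top_product = df.groupby("Product")["Revenue"].sum().idxmax()
avg_quantity_per_day = df.groupby("Date")["Quantity"].sum().mean()
monthly_revenue = df.groupby(df["Date"].dt.to_period("M"))["Revenue"].sum()

print(total_revenue, top_product, avg_quantity_per_day)
print(monthly_revenue)
```

For this sample the total revenue is 910.0, the top product is Notebook, the average quantity per day is 6.75, and the monthly totals are 360.0 for January and 550.0 for February.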

OR
b) Develop an example of a contour plot, histogram, 3D plot, and line plot using
Matplotlib.
Answer

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # Registers the '3d' projection on older Matplotlib versions
import numpy as np

# Prepare a grid and data for Contour Plot and 3D Plot
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

# Random data for Histogram
data = np.random.randn(1000)

# Data for Line Plot
x_line = np.linspace(0, 10, 100)
y_line = np.sin(x_line)

# Create a figure with 4 subplots
fig = plt.figure(figsize=(14, 10))

# 1. Contour Plot
ax1 = fig.add_subplot(2, 2, 1)
contour = ax1.contour(X, Y, Z, levels=10, cmap='viridis')
fig.colorbar(contour, ax=ax1)
ax1.set_title('Contour Plot')
ax1.set_xlabel('X-axis')
ax1.set_ylabel('Y-axis')

# 2. Histogram
ax2 = fig.add_subplot(2, 2, 2)
ax2.hist(data, bins=30, color='blue', alpha=0.7, edgecolor='black')
ax2.set_title('Histogram')
ax2.set_xlabel('Data')
ax2.set_ylabel('Frequency')

# 3. 3D Plot
ax3 = fig.add_subplot(2, 2, 3, projection='3d')
ax3.plot_surface(X, Y, Z, cmap='viridis', edgecolor='none')
ax3.set_title('3D Surface Plot')
ax3.set_xlabel('X-axis')
ax3.set_ylabel('Y-axis')
ax3.set_zlabel('Z-axis')

# 4. Line Plot
ax4 = fig.add_subplot(2, 2, 4)
ax4.plot(x_line, y_line, label='sin(x)', color='red', linewidth=2)
ax4.set_title('Line Plot')
ax4.set_xlabel('X-axis')
ax4.set_ylabel('Y-axis')
ax4.legend()
ax4.grid(True)

# Adjust layout and show the plots
plt.tight_layout()
plt.show()

Course In-Charge HoD
