Practical _Questions_Unit 1 and 2

The document outlines a series of data analysis tasks, including Titanic Survival dataset analysis, dummy variable creation, untidy to tidy data transformation, Winsorization method, and missing value imputation. It also covers exploratory data analysis (EDA) and feature engineering on an insurance charges dataset, as well as linear regression model fitting and visualization. Each task specifies the steps to be performed, including data loading, visualization, statistical analysis, and model evaluation.

Uploaded by

zaheerkkd1312


Questions

1. Titanic Survival Dataset Analysis:

o Load the Titanic Survival dataset.
o Display 5 sample observations from the dataset.
o Check and display the dataset information.
o Count the number of survivors based on gender (sex-wise survival count).
o Check if there are any null values in the dataset.
o Plot a count plot showing the survival count passenger-wise.
o Create a strip plot of Age vs. Sex with hue as Survival status. Identify the key factors influencing survival.
o Plot a pie chart for survival status and identify the percentage of survivors.
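One possible approach to Question 1 is sketched below. It is a sketch under assumptions, not a model answer: the column names `survived`, `pclass`, `sex`, and `age` follow seaborn's bundled copy of the Titanic data (a Kaggle CSV may capitalise them differently), "passenger-wise" is read here as "per passenger class", and the helper names `load_titanic`, `summarize_titanic`, and `plot_titanic` are hypothetical.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def load_titanic() -> pd.DataFrame:
    # Assumption: seaborn's bundled copy, fetched on first use (needs network access).
    return sns.load_dataset("titanic")


def summarize_titanic(df: pd.DataFrame) -> dict:
    """Collect the tabular answers: sample rows, sex-wise survivors, nulls, survival share."""
    return {
        "sample": df.sample(n=min(5, len(df)), random_state=0),
        # 'survived' is a 0/1 flag, so its sum per group is the survivor count.
        "survivors_by_sex": df.groupby("sex")["survived"].sum(),
        "null_counts": df.isnull().sum(),
        "survival_pct": df["survived"].mean() * 100,
    }


def plot_titanic(df: pd.DataFrame):
    """Draw the count plot, strip plot, and pie chart side by side."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    # Assumption: survival counts split by passenger class.
    sns.countplot(data=df, x="pclass", hue="survived", ax=axes[0])
    sns.stripplot(data=df, x="sex", y="age", hue="survived", ax=axes[1])
    # Map the 0/1 flag to readable labels before counting for the pie chart.
    labels = df["survived"].map({0: "Did not survive", 1: "Survived"})
    labels.value_counts().plot.pie(autopct="%1.1f%%", ax=axes[2])
    axes[2].set_ylabel("")
    fig.tight_layout()
    return fig
```

A typical session would call `df = load_titanic()`, then `df.info()` for the dataset information, `summarize_titanic(df)` for the counts, and `plot_titanic(df)` followed by `plt.show()` for the figures. The strip plot is the place to look for survival patterns by sex and age.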
2. Dummy Variable Creation:
o Create a DataFrame in Python using the following dataset:

Item   Color
Item1  Red
Item2  Green
Item3  Blue
Item4  Red
Item5  Green

o Generate dummy variables for the Color column.

o Display the resulting DataFrame with the dummy variables.
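A minimal sketch for Question 2, assuming pandas is available. The `Color_` prefix and the choice of 0/1 integers (rather than True/False) are illustrative choices, not requirements of the question.

```python
import pandas as pd

# The five rows given in the question.
df = pd.DataFrame({
    "Item": ["Item1", "Item2", "Item3", "Item4", "Item5"],
    "Color": ["Red", "Green", "Blue", "Red", "Green"],
})

# One indicator column per distinct colour; dtype=int yields 0/1 values.
dummies = pd.get_dummies(df["Color"], prefix="Color", dtype=int)

# Keep the original columns and append the indicators next to them.
result = pd.concat([df, dummies], axis=1)
print(result)
```

The result keeps `Item` and `Color` and adds `Color_Blue`, `Color_Green`, and `Color_Red`, each holding 1 where the row has that colour and 0 otherwise.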
3. Untidy to Tidy Data Transformation:
o Consider the following dataset:
Country  Year  Population  GDP
USA      2010  308         14992
USA      2011  311         15543
Canada   2010  34          1536
Canada   2011  35          1601

o Convert this untidy dataset into a tidy format such that Population and GDP are represented as separate variables under one column, and their respective values are listed in another column.
o Display the tidy dataset.
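A minimal sketch for Question 3 using `DataFrame.melt`. The output column names `Variable` and `Value` are assumptions, since the question does not name them.

```python
import pandas as pd

# The wide ("untidy") table from the question.
untidy = pd.DataFrame({
    "Country": ["USA", "USA", "Canada", "Canada"],
    "Year": [2010, 2011, 2010, 2011],
    "Population": [308, 311, 34, 35],
    "GDP": [14992, 15543, 1536, 1601],
})

# Stack Population and GDP into one name column and one value column.
tidy = untidy.melt(
    id_vars=["Country", "Year"],
    value_vars=["Population", "GDP"],
    var_name="Variable",
    value_name="Value",
)

# Sorting is optional; it only groups each country-year pair together.
tidy = tidy.sort_values(["Country", "Year", "Variable"]).reset_index(drop=True)
print(tidy)
```

Each of the four original rows becomes two rows, so the tidy table has eight rows and four columns.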
4. Winsorization Method:
o For the given data [10, 15, 20, 25, 100, 150, 200], apply the Winsorization method: replace values below the 5th percentile with the 5th percentile and values above the 95th percentile with the 95th percentile.
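A minimal sketch for Question 4, assuming the percentiles are computed with NumPy's default linear interpolation. Other percentile conventions, or a count-based routine such as `scipy.stats.mstats.winsorize`, can give different results on a sample this small.

```python
import numpy as np

data = np.array([10, 15, 20, 25, 100, 150, 200], dtype=float)

# 5th and 95th percentiles under NumPy's default (linear) interpolation.
lower, upper = np.percentile(data, [5, 95])

# Values below the lower bound are raised to it; values above the upper
# bound are lowered to it. Everything in between is left untouched.
winsorized = np.clip(data, lower, upper)

print(lower, upper)   # 11.5 185.0
print(winsorized)     # 10 becomes 11.5 and 200 becomes 185.0
```

With seven values the 5th percentile sits 0.3 of the way from 10 to 15, and the 95th sits 0.7 of the way from 150 to 200, which is where 11.5 and 185 come from.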
5. Missing Value Imputation:
o For the given dataset:

Feature 1  Feature 2
5          12
7          NaN
3          8
NaN        15
8          6
10         9
6          NaN
NaN        5
9          11
o Replace the missing values with the mean of their respective columns.
o Replace the missing values with the median of their respective columns.
o Replace the missing values using the K-Nearest Neighbors (KNN) imputation method.
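The three imputation strategies in Question 5 could be sketched as follows, assuming pandas and scikit-learn are installed. The choice of `n_neighbors=2` is an assumption, as the question does not state a value of k.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# The nine rows from the question, with NaN marking the missing entries.
df = pd.DataFrame({
    "Feature 1": [5, 7, 3, np.nan, 8, 10, 6, np.nan, 9],
    "Feature 2": [12, np.nan, 8, 15, 6, 9, np.nan, 5, 11],
})

# Mean imputation: each NaN takes the mean of the observed values in its column.
mean_imputed = df.fillna(df.mean())

# Median imputation: same idea, using the column median.
median_imputed = df.fillna(df.median())

# KNN imputation: each NaN takes the average of that feature over the
# nearest rows, with distance measured on the features both rows have.
knn = KNNImputer(n_neighbors=2)  # assumption: k = 2
knn_imputed = pd.DataFrame(knn.fit_transform(df), columns=df.columns)

print(mean_imputed, median_imputed, knn_imputed, sep="\n\n")
```

For reference, the column means are 48/7 (about 6.86) and 66/7 (about 9.43), and the column medians are 7 and 9. The KNN values depend on the chosen k and on how ties between equally distant rows are resolved.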

Question: Exploratory Data Analysis (EDA) and Feature Engineering

a) Load the dataset Insurance Charges Prediction.csv into a DataFrame and perform the following:
o Display the first 5 rows of the dataset.
o Display the dataset information.
o Provide the statistical summary of the dataset for numerical features.
o Provide the statistical summary of the dataset for categorical features.
b) Perform Univariate Analysis:
o Plot histograms for all numerical columns in the dataset.
c) Perform Bivariate Analysis:
o Visualize the distribution of charges based on:
   ▪ Gender (sex) using a boxplot.
   ▪ Region (region) using a boxplot.
   ▪ Smoking status (smoker) using a boxplot.
o Plot a count plot to show the distribution of smoker status with hue as sex.
o Create scatter plots for:
   ▪ Age vs. Charges.
   ▪ BMI vs. Charges.
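Parts (a) to (c) could be sketched as below. The column names `sex`, `region`, and `smoker` come from the question; `charges`, `age`, and `bmi` are assumed lowercase spellings and may need adjusting to the actual file. The function names are hypothetical.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def describe_dataset(df: pd.DataFrame) -> dict:
    """Part (a): first rows plus numerical and categorical summaries."""
    df.info()  # prints column dtypes and non-null counts
    return {
        "head": df.head(),
        "numeric_summary": df.describe(),
        # Everything that is not numeric is treated as categorical here.
        "categorical_summary": df.describe(exclude=["number"]),
    }


def plot_univariate(df: pd.DataFrame):
    """Part (b): one histogram per numerical column."""
    axes = df.select_dtypes(include="number").hist(figsize=(10, 6))
    plt.tight_layout()
    return axes


def plot_bivariate(df: pd.DataFrame):
    """Part (c): boxplots of charges, smoker count plot, and scatter plots."""
    fig, axes = plt.subplots(2, 3, figsize=(16, 8))
    for ax, col in zip(axes[0], ["sex", "region", "smoker"]):
        sns.boxplot(data=df, x=col, y="charges", ax=ax)
    sns.countplot(data=df, x="smoker", hue="sex", ax=axes[1, 0])
    sns.scatterplot(data=df, x="age", y="charges", ax=axes[1, 1])
    sns.scatterplot(data=df, x="bmi", y="charges", ax=axes[1, 2])
    fig.tight_layout()
    return fig
```

Usage would start with `df = pd.read_csv("Insurance Charges Prediction.csv")`, followed by the three calls and `plt.show()`.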
d) Perform Correlation Analysis:
o Filter out the numerical columns.
o Calculate and display the correlation matrix for numerical variables.
o Visualize the correlation matrix using a heatmap.
e) Filter the categorical variables:
o Extract only categorical features.
o Display the names of the categorical columns.
f) Perform Feature Engineering:
o Use the pd.get_dummies function to encode the categorical variables into dummy variables.
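Parts (d) to (f) could be sketched as below. The sketch assumes that every non-numerical column in the file is categorical, and the function names are hypothetical. `dtype=int` is an illustrative choice that yields 0/1 dummies instead of True/False.

```python
import pandas as pd
import seaborn as sns


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Part (d): Pearson correlation between the numerical columns only."""
    return df.select_dtypes(include="number").corr()


def plot_correlation_heatmap(df: pd.DataFrame):
    """Part (d): annotated heatmap of the correlation matrix."""
    return sns.heatmap(
        correlation_matrix(df), annot=True, cmap="coolwarm", vmin=-1, vmax=1
    )


def categorical_columns(df: pd.DataFrame) -> list:
    """Part (e): names of the non-numerical (categorical) columns."""
    return df.select_dtypes(exclude="number").columns.tolist()


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Part (f): replace each categorical column with 0/1 dummy columns."""
    return pd.get_dummies(df, columns=categorical_columns(df), dtype=int)
```

After loading the CSV into `df`, `categorical_columns(df)` lists the columns that `encode_categoricals(df)` will expand. Passing `drop_first=True` to `pd.get_dummies` is a common variation when the encoded data feeds a linear model, since it avoids perfectly collinear dummy columns.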

Question: Linear Regression Model and Visualization

a) Given the following dataset:

x y
5 5
15 20
25 14
35 32
45 22
55 38

b) Plot a scatter plot to visualize the relationship between x and y.

c) Fit a Linear Regression model using x and y.
o Calculate the coefficient of determination (R²) to evaluate the goodness of fit.
o Display the intercept and coefficient of the linear regression model.
d) Predict the dependent variable (y) for the following new values of the independent variable (x): 8, 15, and 35.
e) Plot the original data points and overlay the fitted regression line.
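Parts (b) to (e) could be worked through as in the sketch below, assuming scikit-learn's `LinearRegression` (ordinary least squares) is the intended model. Variable names such as `r_squared` and `x_new` are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# scikit-learn expects a 2-D feature array, hence the reshape.
x = np.array([5, 15, 25, 35, 45, 55]).reshape(-1, 1)
y = np.array([5, 20, 14, 32, 22, 38])

# (c) Fit the model and read off R², intercept, and slope.
model = LinearRegression().fit(x, y)
r_squared = model.score(x, y)          # about 0.716
intercept = model.intercept_           # about 5.633
slope = model.coef_[0]                 # 0.54

# (d) Predict y for the new x values 8, 15, and 35.
x_new = np.array([8, 15, 35]).reshape(-1, 1)
y_new = model.predict(x_new)           # about 9.95, 13.73, 24.53

# (b) and (e) Scatter plot of the data with the fitted line on top.
fig, ax = plt.subplots()
ax.scatter(x.ravel(), y, label="Observed data")
ax.plot(x.ravel(), model.predict(x), color="red", label="Fitted line")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

print(f"R² = {r_squared:.4f}, intercept = {intercept:.4f}, slope = {slope:.4f}")
print("Predictions:", y_new)
```

The values in the comments can be checked by hand: the slope is Sxy / Sxx = 945 / 1750 = 0.54, and the intercept is the mean of y minus 0.54 times the mean of x, that is 21.833 - 16.2 = 5.633. Call `plt.show()` to display the figure.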
