The document outlines various topics related to data science, including differences between supervised and unsupervised learning, data structures in Python, and methods for handling missing values. It also covers the data science lifecycle, data preprocessing tasks, and practical exercises involving dataframes and series in Python. Additionally, it discusses data security issues and applications of data science across different fields.

Uploaded by prem prasad
Copyright: © All Rights Reserved

Q1. Write short notes on the following (Any Five): [5 x 3 = 15]
a) Differentiate between supervised and unsupervised learning techniques.
b) What is a Series, and how is it different from a 1-D array, a list, and a dictionary? What
are the various ways to create a dataframe in Python?
c) How can you fill missing values using the fillna() and replace() methods? Explain
with a small code segment.
d) Differentiate between Data Scientists and Data Engineers.
e) What is Data Transformation in Data Science?
f) Explain four major tasks in data pre-processing.
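
For part (c) above, a minimal sketch of the two methods might look like the following. The column names and values are hypothetical sample data, and filling with the column mean is only one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "marks": [450.0, np.nan, 380.0, np.nan],
    "city": ["Pune", "NA", "Delhi", "Pune"],
})

# fillna(): replace NaN entries, here with the mean of the non-missing values.
df["marks"] = df["marks"].fillna(df["marks"].mean())

# replace(): swap one value for another, here a placeholder string.
df["city"] = df["city"].replace("NA", "Unknown")

print(df)
```

fillna() only touches missing entries (NaN/None), while replace() matches whatever value it is given, so it also suits placeholder codes such as "NA" or -1.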
Q.2 (a) What is the Data Science Lifecycle? Explain all stages with a diagram. (5)
(b) What are missing values? What are the strategies to handle them? Explain
four methods of imputation, giving an example of each. (5)
(c) What are the applications of Data Science in various fields? (5)
Q.2 (a) Create a dataframe to store data for 10 students with the following columns: (10)

Name | Age | Semester I marks (out of 600) | Semester II marks (out of 500) | Attendance

Write a program to perform the following operations on the above dataframe:

a. Display details of students who scored more than 560 marks in Semester I.
b. Display details of students who scored less than 250 marks in Semester II.
c. Display details of the student who scored minimum marks in Semester II.
d. Display details of the student who scored maximum marks in Semester II.
e. Display details of students whose attendance is more than 75.
f. Display details of students whose attendance is less than 50.
g. Insert 2 new records in the dataframe.
h. Add a new column corresponding to the percentage of marks of both
semesters.
i. Add a new column corresponding to grades:
   Percentage (both semesters)   Grade
   >= 90                         O
   >= 75 to < 90                 A+
   >= 60 to < 75                 A
   >= 50 to < 60                 B+
   >= 40 to < 50                 B
   < 40                          F

(b) Write a program to convert a Pandas Series to a Python list. (5)
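
One possible approach to a few of the operations in Q.2 (a) and to part (b) is sketched below. The student records, the short column names Sem1 and Sem2, and the use of only four rows are assumptions made for brevity; the question itself asks for 10 students.

```python
import pandas as pd

# Hypothetical records (4 shown for brevity; the question asks for 10).
students = pd.DataFrame({
    "Name": ["Asha", "Bala", "Chitra", "Dev"],
    "Age": [19, 20, 19, 21],
    "Sem1": [570, 300, 480, 240],   # Semester I marks out of 600
    "Sem2": [450, 200, 400, 150],   # Semester II marks out of 500
    "Attendance": [92, 45, 78, 60],
})

# (a) Students with more than 560 marks in Semester I.
top_sem1 = students[students["Sem1"] > 560]

# (c), (d) Rows holding the minimum and maximum Semester II marks.
min_sem2 = students.loc[[students["Sem2"].idxmin()]]
max_sem2 = students.loc[[students["Sem2"].idxmax()]]

# (e) Students with attendance above 75.
regular = students[students["Attendance"] > 75]

# (h) Percentage over both semesters (600 + 500 = 1100 marks in total).
students["Percentage"] = (students["Sem1"] + students["Sem2"]) / 1100 * 100

# (i) Grade bands from the table; right=False makes each bin [low, high).
bins = [0, 40, 50, 60, 75, 90, float("inf")]
labels = ["F", "B", "B+", "A", "A+", "O"]
students["Grade"] = pd.cut(
    students["Percentage"], bins=bins, labels=labels, right=False
)

# Part (b): convert a Series to a plain Python list.
names = students["Name"].tolist()

print(students)
print(names)
```

pd.cut is used here instead of a chain of if/elif checks because the grade bands are contiguous intervals; a small function applied with apply() would work equally well.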

Q3 (a) Create a dataframe of players with name, score-ODI, score-Test, score-T20 for 5 players. (10)
a. Add a new column corresponding to the total score of each batsman.
b. Display the player name along with runs scored in the three types of
matches using loc.
c. Display the details of batsmen who scored:
i. More than 2000 runs in ODI
ii. Less than 2500 runs in Test
iii. More than 1500 runs in T20
d. Display the alternate rows using iloc.
e. Reindex the dataframe created above with batsman name and
delete the data of Hardik Pandya and Shikhar Dhawan by their index
from the original dataframe.
f. Delete the columns named T20 and total using the columns parameter of the
drop() function.
g. Rename the columns: T20 as Runs in T20, ODI as Runs in ODI, Test
as Runs in Test.
h. Count the total number of rows and columns of the dataframe.
i. Add multiple records for each player. Also add the columns year, age
and height. Then apply the aggregate functions sum, average, standard
deviation, min, max (groupby name).
(b) Write a program to convert a given Series to an array. (5)
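
The Q3 operations could be sketched roughly as follows. All scores are made-up illustrative numbers (not real career statistics), "Player A/B/C" are placeholder names, and part (c) is read here as all three conditions holding together, which is an assumption about the question's intent.

```python
import pandas as pd

# Illustrative scores only; not real statistics.
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Hardik Pandya",
             "Shikhar Dhawan", "Player C"],
    "ODI": [9000, 12000, 1500, 6500, 2200],
    "Test": [3500, 8000, 500, 2300, 2600],
    "T20": [3800, 4000, 1300, 1700, 2100],
})

# (a) Total score per batsman.
players["Total"] = players[["ODI", "Test", "T20"]].sum(axis=1)

# (b) Name plus the three score columns, selected with loc.
scores = players.loc[:, ["Name", "ODI", "Test", "T20"]]

# (c) All three conditions combined (assumed reading of the question).
selected = players[
    (players["ODI"] > 2000) & (players["Test"] < 2500) & (players["T20"] > 1500)
]

# (d) Alternate rows: every second row by position.
alternate = players.iloc[::2]

# (e) Use the name as index, then drop two players by index label.
by_name = players.set_index("Name")
by_name = by_name.drop(index=["Hardik Pandya", "Shikhar Dhawan"])

# (f) Drop columns via the columns parameter.
trimmed = by_name.drop(columns=["T20", "Total"])

# (g) Rename columns.
renamed = by_name.rename(columns={
    "T20": "Runs in T20", "ODI": "Runs in ODI", "Test": "Runs in Test",
})

# (h) Number of rows and columns.
n_rows, n_cols = players.shape

# Part (b): Series to NumPy array.
odi_array = players["ODI"].to_numpy()
```

For part (i), once several records per player exist, a call such as groupby("Name").agg(["sum", "mean", "std", "min", "max"]) on the numeric columns gives the requested aggregates.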
Q.4 (a) Give 4 ways of creating a Series: using a list, an array, a dictionary, and a scalar value. (15)
a) Write Python code to create the following Series:
101 Harsh
102 Arun
103 Ankur
104 Harpal
105 Divya
106 Jeet
b) Show details of the first 3 employees using the head function.
c) Show details of the last 3 employees using the tail function.
d) Show details of the first 3 employees without using the head function.
e) Show details of the last 3 employees without using the tail function.
f) Show the value at index 102.
g) Show the 2nd to 4th records.
h) Show the values at index 101, 103, 105.
i) Show details of “Arun”.
(b) Explain the concept of data security. What are the various data security issues? (5)
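
A minimal sketch for Q.4 (a), using the employee Series given in the question. The small demo values used for the four creation methods are hypothetical.

```python
import numpy as np
import pandas as pd

# Four ways to create a Series (demo values are hypothetical).
from_list = pd.Series([10, 20, 30])
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))
from_dict = pd.Series({"a": 1, "b": 2})            # keys become the index
from_scalar = pd.Series(5, index=["x", "y", "z"])  # scalar repeated per label

# a) The employee Series from the question.
employees = pd.Series(
    ["Harsh", "Arun", "Ankur", "Harpal", "Divya", "Jeet"],
    index=[101, 102, 103, 104, 105, 106],
)

first_three = employees.head(3)          # b)
last_three = employees.tail(3)           # c)
first_three_alt = employees.iloc[:3]     # d) positional slice instead of head
last_three_alt = employees.iloc[-3:]     # e) positional slice instead of tail
value_102 = employees.loc[102]           # f) label-based lookup
second_to_fourth = employees.iloc[1:4]   # g) positions 1, 2, 3
picked = employees.loc[[101, 103, 105]]  # h) several labels at once
arun = employees[employees == "Arun"]    # i) boolean mask on the values

print(arun)
```

Because the index holds integers that are labels rather than positions, loc (label-based) and iloc (position-based) are kept explicit here to avoid ambiguity.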
Q.5 (a) What are the different ways to add columns in Pandas? Define a
dictionary containing student data: data = {'Name': ['Jai', 'Princi',
'Gaurav', 'Anuj'], 'Height': [5.1, 6.2, 5.1, 5.2], 'Qualification': ['Msc', 'MA',
'Msc', 'Msc']} and write code: (7)
a. to add a column Address to a Pandas DataFrame,
b. to delete a column in a Pandas DataFrame,
c. to add and delete a row in a Pandas DataFrame.
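
As a rough sketch of parts a to c, using the dictionary from the question. The address values, the extra Age and Country columns, and the added row are hypothetical and only serve to show the calls.

```python
import pandas as pd

data = {
    "Name": ["Jai", "Princi", "Gaurav", "Anuj"],
    "Height": [5.1, 6.2, 5.1, 5.2],
    "Qualification": ["Msc", "MA", "Msc", "Msc"],
}
df = pd.DataFrame(data)

# a. Three common ways to add a column (all values are hypothetical).
df["Address"] = ["Delhi", "Bangalore", "Chennai", "Patna"]  # direct assignment
df.insert(1, "Age", [21, 23, 24, 21])                       # at a given position
df = df.assign(Country="India")                             # returns a new frame

# b. Delete columns with drop(columns=...).
df = df.drop(columns=["Age", "Country"])

# c. Add a row by concatenating a one-row frame, then delete a row by label.
new_row = pd.DataFrame([{
    "Name": "Riya", "Height": 5.4, "Qualification": "MA", "Address": "Pune",
}])
df = pd.concat([df, new_row], ignore_index=True)
df = df.drop(index=0).reset_index(drop=True)

print(df)
```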
(b) Create the following DataFrame Sales containing year-wise sales figures for five
salespersons in INR. Use the years as column labels, and salesperson names as row labels. (8)

            2014     2015     2016     2017
Madhu       100.5    12000    2000     50000
Kusum       150.8    18000    5000     60000
Kinshuk     200.9    22000    70000    70000
Ankit       30000    30000    1000     80000
Shruti      40000    45000    1250     90000

a. Display the row labels of Sales.
b. Display the column labels of Sales.
c. Display the dimensions, shape, size and values of Sales.
d. Display the last two rows of Sales.
e. Display the first two columns of Sales.
f. Change the DataFrame Sales such that it becomes its transpose.
g. Add data to Sales for salesman Sumeet where the sales made are [196.2, 37800, 52000,
78438] in the years [2014, 2015, 2016, 2017] respectively.
h. Delete the data for the year 2014 from the DataFrame Sales.
i. Update the sale made by Shruti in 2017 to 100000.
j. Write the values of DataFrame Sales to a comma-separated file SalesFigures.csv on the disk.
Do not write the row labels and column labels.
k. Change the name of the salesperson Ankit to Vivaan and Kinshuk to Shailesh.
l. Delete the data for salesman Madhu from the DataFrame Sales.
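
The Sales operations above can be sketched as follows, using the figures from the table. This is one possible sequence, not the only valid answer; the transpose in part f is stored in a separate variable (an assumption) so that the later steps keep working on the original orientation.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        2014: [100.5, 150.8, 200.9, 30000, 40000],
        2015: [12000, 18000, 22000, 30000, 45000],
        2016: [2000, 5000, 70000, 1000, 1250],
        2017: [50000, 60000, 70000, 80000, 90000],
    },
    index=["Madhu", "Kusum", "Kinshuk", "Ankit", "Shruti"],
)

print(sales.index)                          # a. row labels
print(sales.columns)                        # b. column labels
print(sales.ndim, sales.shape, sales.size)  # c. dimensions, shape, size
print(sales.values)                         # c. underlying values
last_two = sales.tail(2)                    # d. last two rows
first_two_cols = sales.iloc[:, :2]          # e. first two columns
transposed = sales.T                        # f. transpose (kept separately)

sales.loc["Sumeet"] = [196.2, 37800, 52000, 78438]  # g. new row by label
sales = sales.drop(columns=2014)                    # h. drop the 2014 column
sales.loc["Shruti", 2017] = 100000                  # i. update one cell

# j. Write values only: no row labels (index) and no column labels (header).
sales.to_csv("SalesFigures.csv", index=False, header=False)

# k. Rename row labels.
sales = sales.rename(index={"Ankit": "Vivaan", "Kinshuk": "Shailesh"})

# l. Delete a row by label.
sales = sales.drop(index="Madhu")

print(sales)
```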

Q6 (a) Explain four methods of creating a DataFrame by using: (5)
i. Multiple lists of different lengths
ii. Multiple Series objects
iii. Nested dictionary
iv. NumPy array
(b) Explain five applications/uses of Data Science in different fields. (5)
(c) What is the concat operation on dataframes? Write the syntax and explain all
parameters used in the concat operation.
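
For parts (a) and (c), a small illustration under assumed data might look like this. All values, labels and column names are hypothetical; only a few of the pd.concat parameters (objs, axis, ignore_index, keys) are exercised.

```python
import numpy as np
import pandas as pd

# i. Lists of different lengths: shorter rows are padded with NaN.
from_lists = pd.DataFrame([[1, 2, 3], [4, 5]])

# ii. Multiple Series objects: aligned on the union of their indexes.
from_series = pd.DataFrame({
    "x": pd.Series([10, 20], index=["a", "b"]),
    "y": pd.Series([1, 2, 3], index=["a", "b", "c"]),
})

# iii. Nested dictionary: outer keys become columns, inner keys become rows.
from_nested = pd.DataFrame({
    "2016": {"Madhu": 2000, "Kusum": 5000},
    "2017": {"Madhu": 50000, "Kusum": 60000},
})

# iv. NumPy array with explicit column names.
from_array = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["c1", "c2", "c3"])

# concat: stack row-wise (axis=0) with a fresh index ...
stacked = pd.concat([from_array, from_array], axis=0, ignore_index=True)

# ... or side by side (axis=1), labelling each block with keys.
side_by_side = pd.concat([from_array, from_array], axis=1, keys=["left", "right"])

print(stacked)
print(side_by_side)
```

The join parameter of pd.concat ("outer" by default, or "inner") controls how labels on the other axis are combined when the inputs do not share the same columns or index.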

******************************
