The document outlines various topics related to data science, including differences between supervised and unsupervised learning, data structures in Python, and methods for handling missing values. It also covers the data science lifecycle, data preprocessing tasks, and practical exercises involving dataframes and series in Python. Additionally, it discusses data security issues and applications of data science across different fields.
Q1. Write short notes on the following (Any Five): [5X3=15]
a) Differentiate between supervised and unsupervised learning techniques.
b) What is a Series and how is it different from a 1-D array, a list, and a dictionary? What are the various ways to create a dataframe in Python?
c) How can you fill missing values using the fillna() and replace() methods? Explain with a small code segment.
d) Differentiate between Data Scientists and Data Engineers.
e) What is Data Transformation in Data Science?
f) Explain four major tasks in data pre-processing.
Q.2 (a) What is the Data Science Lifecycle? Explain all stages with a diagram. (5)
(b) What are missing values? What are the strategies to handle them? Explain four methods of imputation, giving an example of each. (5)
(b) What are the applications of Data Science in various fields? (5)
Q.2 (a) Create a dataframe to store data for 10 students (10)
Name | Age | Semester I marks (out of 600) | Semester II marks (out of 500) | Attendance
Write a program to perform the following operations on the above dataframe:
a. Display details of students who scored more than 560 marks in semester I.
b. Display details of students who scored less than 250 marks in semester II.
c. Display details of the student who scored minimum marks in semester II.
d. Display details of the student who scored maximum marks in semester II.
e. Display details of students whose attendance is more than 75.
f. Display details of students whose attendance is less than 50.
g. Insert 2 new records in the dataframe.
h. Add a new column corresponding to the percentage of marks of both semesters.
i. Add a new column corresponding to grades:
Both-semester percentage | Grade
>=90 | O
>=75 to <90 | A+
>=60 to <75 | A
>=50 to <60 | B+
>=40 to <50 | B
<40 | F
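One possible approach to parts a, c, d, h and i is sketched below. This is a minimal illustration, not a model answer: the student records are hypothetical, only four rows are used instead of ten, the short column names Sem1 and Sem2 are assumptions, and Attendance is assumed to be a percentage. Grades are assigned with pd.cut using left-closed bins so that each boundary follows the grade table.

```python
import pandas as pd

# Hypothetical records (the question asks for 10 students; 4 keep the sketch short).
students = pd.DataFrame({
    "Name": ["Asha", "Bikram", "Chetan", "Deepa"],
    "Age": [19, 20, 19, 21],
    "Sem1": [570, 410, 300, 585],    # Semester I marks out of 600
    "Sem2": [460, 240, 180, 490],    # Semester II marks out of 500
    "Attendance": [82, 64, 45, 91],  # assumed to be a percentage
})

# a. Students who scored more than 560 marks in semester I
top_sem1 = students[students["Sem1"] > 560]

# c. / d. Rows holding the minimum and maximum semester II marks
min_sem2 = students.loc[[students["Sem2"].idxmin()]]
max_sem2 = students.loc[[students["Sem2"].idxmax()]]

# h. Percentage over both semesters (combined maximum is 600 + 500 = 1100)
students["Percentage"] = (students["Sem1"] + students["Sem2"]) / 1100 * 100

# i. Grades from the table; right=False makes each bin include its lower bound
bins = [0, 40, 50, 60, 75, 90, float("inf")]
labels = ["F", "B", "B+", "A", "A+", "O"]
students["Grade"] = pd.cut(students["Percentage"], bins=bins, labels=labels, right=False)

print(top_sem1)
print(students[["Name", "Percentage", "Grade"]])
```

The remaining filters (parts b, e and f) follow the same boolean-mask pattern as part a.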
(b) Write a program to convert a Pandas Series to a Python list. (5)
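A minimal sketch of one way to do this, using the Series.tolist() method. The sample values and index labels are hypothetical.

```python
import pandas as pd

# Hypothetical sample Series
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# tolist() returns the values as a plain Python list; the index labels are dropped
as_list = s.tolist()

print(as_list)
print(type(as_list))
```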
Q3 (a) Create a dataframe of players with name, score-ODI, score-Test, score-T20 for 5 players. (10)
a. Add a new column corresponding to the total score of each batsman.
b. Display the player name along with runs scored in the three types of matches using loc.
c. Display the details of batsmen who scored:
  i. More than 2000 in ODI
  ii. Less than 2500 in Test
  iii. More than 1500 in T20
d. Display the alternate rows using iloc.
e. Reindex the dataframe created above with batsman name and delete the data of Hardik Pandya and Shikhar Dhawan by their index from the original dataframe.
f. Delete the columns named T20 and total using the columns parameter of the drop() function.
g. Rename the columns: T20 as Runs in T20, ODI as Runs in ODI, Test as Runs in Test.
h. Count the total number of rows and columns of the dataframe.
i. Add multiple records for each player. Also add the columns year, age and height. Then apply the aggregate functions sum, average, standard deviation, min and max (groupby name).
(b) Write a program to convert a given Series to an array. (5)
Q.4 (a) Give 4 ways of creating a series by using a list, arrays, a dictionary and a scalar value. (15)
a) Write Python code to create the following series:
101 Harsh
102 Arun
103 Ankur
104 Harpal
105 Divya
106 Jeet
b) Show details of the first 3 employees using the head function.
c) Show details of the last 3 employees using the tail function.
d) Show details of the first 3 employees without using the head function.
e) Show details of the last 3 employees without using the tail function.
f) Show the value of index no. 102.
g) Show the 2nd to 4th records.
h) Show the values of index no. 101, 103, 105.
i) Show details of “Arun”.
(b) Explain the concept of Data Security. Explain various data security issues. (5)
Q.5 (a) What are the different ways to add columns in Pandas? Define a dictionary containing Students data: (7)
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Height': [5.1, 6.2, 5.1, 5.2], 'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
a. to add a column address in a Pandas DataFrame,
b. to delete a column in a Pandas DataFrame,
c. to add and delete a new row in a Pandas DataFrame.
(b) Create the following DataFrame Sales containing year-wise sales figures for five salespersons in INR. Use the years as column labels, and salesperson names as row labels. (8)
b. Display the column labels of Sales.
c. Display the dimensions, shape, size and values of Sales.
d. Display the last two rows of Sales.
e. Display the first two columns of Sales.
f. Change the DataFrame Sales such that it becomes its transpose.
g. Add data to Sales for salesman Sumeet where the sales made are [196.2, 37800, 52000, 78438] in the years [2014, 2015, 2016, 2017] respectively.
h. Delete the data for the year 2014 from the DataFrame Sales.
i. Update the sale made by Shruti in 2017 to 100000.
j. Write the values of DataFrame Sales to a comma-separated file SalesFigures.csv on the disk. Do not write the row labels and column labels.
k. Change the name of the salesperson Ankit to Vivaan and Kinshuk to Shailesh.
l. Delete the data for salesman Madhu from the DataFrame Sales.
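Parts g to l could be approached along the following lines. This is only a sketch under stated assumptions: the sales figures in the starting table are placeholders (the original table is not reproduced in the text), and only the four salesperson names that appear in the questions are used. Sumeet's figures are the ones given in part g. For part j the sketch builds the CSV text in memory; passing the path "SalesFigures.csv" as the first argument of to_csv would write it to disk instead.

```python
import pandas as pd

years = [2014, 2015, 2016, 2017]

# Placeholder figures; the real table from the question is not available here.
sales = pd.DataFrame(
    [[1000.0, 2000.0, 3000.0, 4000.0],
     [1100.0, 2100.0, 3100.0, 4100.0],
     [1200.0, 2200.0, 3200.0, 4200.0],
     [1300.0, 2300.0, 3300.0, 4300.0]],
    index=["Madhu", "Kinshuk", "Ankit", "Shruti"],
    columns=years,
)

# g. Add a row for Sumeet (values taken from the question)
sales.loc["Sumeet"] = [196.2, 37800, 52000, 78438]

# h. Delete the data for the year 2014
sales = sales.drop(columns=2014)

# i. Update the sale made by Shruti in 2017
sales.loc["Shruti", 2017] = 100000

# j. CSV output without row labels (index) or column labels (header)
csv_text = sales.to_csv(index=False, header=False)

# k. Rename two salespersons
sales = sales.rename(index={"Ankit": "Vivaan", "Kinshuk": "Shailesh"})

# l. Delete the data for Madhu
sales = sales.drop(index="Madhu")

print(sales)
```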
Q6 (a) Explain four methods of creating a Dataframe by using: (5)
i. Multiple lists of different lengths
ii. Multiple Series objects
iii. Nested dictionary
iv. NumPy array
(b) Explain five applications/uses of Data Science in different fields. (5)
(c) What is the concat operation on dataframes? Write the syntax and explain all parameters used in the concat operation.
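For part (c), a short illustration of pd.concat may help. The sketch below reuses the student names from Q.5 and demonstrates only a subset of the parameters (axis, join, ignore_index and keys); it is not a complete account of the function's signature, and the two small frames are split up hypothetically for the example.

```python
import pandas as pd

a = pd.DataFrame({"Name": ["Jai", "Princi"], "Height": [5.1, 6.2]})
b = pd.DataFrame({"Name": ["Gaurav", "Anuj"], "Qualification": ["Msc", "Msc"]})

# axis=0 stacks rows; join="outer" keeps the union of columns and fills gaps with NaN;
# ignore_index=True renumbers the rows 0..n-1
rows = pd.concat([a, b], axis=0, join="outer", ignore_index=True)

# join="inner" keeps only the columns common to both frames
common = pd.concat([a, b], axis=0, join="inner", ignore_index=True)

# axis=1 places the frames side by side; keys adds an outer column level naming each source
side = pd.concat([a, b], axis=1, keys=["left", "right"])

print(rows)
print(common)
print(side)
```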