1st Semester
Subject: Introduction to AI, Data Science, Ethics and Foundation of Data Analytics
Unit-V Assignment 1
Date of Release: 19/12/2022
Last Date of Submission: 27/12/2022
Part - A 10 x 1 = 10
Q1 In this step, you gain the insights or make the predictions as stated in the project charter.
a. Data Transformation
b. Data Exploration
c. Data Preparation
d. Data Modeling
Q2 During the lifecycle of a machine learning or data science project, the project team spends ________ of the time on data correction and cleansing.
a. very little
b. approx 20-30%
c. approx 50%
d. approx 80%
Q3 Which of the following is (are) a source of external data?
a. Data from companies such as Nielsen and GfK
b. Freely available Data from government organizations
c. Freely available Data from non-government organizations
d. All of the above
Q4 During which stage of the project lifecycle is the data investigated?
a. data retrieval
b. data exploration
c. data preparation
d. All of the above
Q5 The disadvantage of the "set value to null" method of handling missing values is
a. You lose information from the observation
b. May result in false estimations from the model
c. Can artificially raise dependence among the variables
d. Not all data modeling techniques can handle null values
Q6 What is the objective of an appending (stacking) operation?
a. enrich an observation from one table with information from another table
b. add the observations of one table to those of another table
c. create a new physical table or a new virtual table
d. clean the data and remove all the possible errors
Q7 Dummy variables work with ___________ variables
a. categorical
b. continuous
c. quantitative
d. structured
Q8 Which graph can show the maximum, minimum, median, and other measures at the same time?
a. Bar Chart
b. Pareto Chart
c. Box Plot
d. Histogram
Q9 While defining the research goal, it is important for the data scientist to be clear about which of the following?
1. how the project fits in the bigger organization objectives
2. how the project deliverables are going to be used
3. how the project outcome will change the business
4. how to select the best data model
a. 1, 2, 3, and 4
b. 1, 2 and 3
c. 1 and 3
d. 2 and 4
Q10 During the first step, the data scientist needs to keep asking questions until they clearly understand
a. Relationship between the variables
b. Expected Data Model
c. Business expectations and deliverables
d. Data Analysis Graphs
Part - B 5 x 2 = 10
Q11 Write any four benefits of using a structured approach.
Q12 Write any four pieces of information that a project charter should document.
Q13 Write three obstacles encountered while acquiring internal data.
Q14 Define a histogram, with a diagram.
Q15 Data modelling consists of three steps. What are they?
Part - C 5 x 6 = 30
Q16 A. Define a box plot, with a labeled diagram.
B. What are R-squared and adjusted R-squared?
Q17 What is a Pareto chart? Draw a Pareto chart along with its data table.
Q18 Explain k-NN with a diagram.
Q19 A. What are outliers? How can they be dealt with?
B. Write any three techniques for dealing with missing values in the data, with their advantages and disadvantages.
Q20 What is data transformation? Explain any two methods of transforming data.
Part - D 10 x 2 = 20
Q21 Explain data integration and its subprocesses using tables.
Q22 A. What is a confusion matrix? Explain in brief.
B. What is mean squared error?
C. How can impossible values and redundant whitespace in the data be handled?