ds_imp_qs

Data science

Uploaded by

jeevansai496

Unit 1

1. Explain the different facets of data with examples.

2. Explain in detail cleansing, integrating, and transforming data, and building a model.
3. Explore the various steps associated with the data science process and explain any three of them with
suitable diagrams and examples.
4.
a. Define data science and big data.
b. Give an overview of common errors in retrieving data and the cleansing solutions to be
employed.
c. Outline the difference between structured data and unstructured data.
5. What is a data warehouse? Outline the architecture of a data warehouse with a diagram.
6. Illustrate Basic Statistical descriptions of data.
7. Explain different ways of combining data in the data science process.
8. What is data transformation? What are the techniques used in transforming the data?
9. What are the foremost goals of EDA? What are its types?
10. Explain the components of model building.
11. What is data mining? Explain the functions of data mining. Explain its architecture.
12. Describe Graphic Displays of Basic Statistical Descriptions of data.

Unit 2
1. Explain normal curve and z-score.
2. Using the standard normal curve table, find the proportion of the total area identified with the following
statements.
a. above z score of 1.8
b. between the mean and a z score of 1.65
c. between z scores of 0 and -1.96
3. Describe the types of variables with an example for each.
4. Suppose a hospital tested the age and body fat data for randomly selected adults with the following
result:
Age 23 27 39 49 50 52 54 56 57 58 60
%fat 9.5 17.8 31.4 27.2 31.2 34.6 42.5 33.4 30.2 34.1 41
Draw the boxplots for age.
5. Find the mean, median, mode, variance, standard deviation and skewness for the given data:
Marks      No. of Students
0 – 10     10
10 – 20    40
20 – 30    20
30 – 40    0
40 – 50    10
50 – 60    40
60 – 70    16
70 – 80    14
6. The number of friends reported by Facebook users is summarized in the following frequency
distribution.
Friends        F
400 – above    2
350 – 399      5
300 – 349      12
250 – 299      17
200 – 249      23
150 – 199      49
100 – 149      27
50 – 99        29
0 – 49         36
Total          200
a. What is the shape of this distribution?
b. Find the relative frequencies
c. Find the approximate percentile rank of the interval 300 – 349
d. Convert to a histogram
e. Why would it not be possible to convert to a stem and leaf display?
7. Perform an exploratory data analysis for the following data with different types of plots:
The dataset contains cases from a study that was conducted between 1958 and 1970 at the University
of Chicago’s Billings Hospital on the survival of patients who had undergone surgery for cancer.
Data attributes:-
Age of patient at the time of operation (numerical)
Patient’s year of operation (year – 1900, numerical)
Number of positive axillary nodes detected (numerical)
Survival status (class attribute): 1 = the patient survived 5 years or longer, 2 = the patient died
within five years.
8.
a. Classify the following data into their types: i. ethnic group ii. age iii. family size iv.
academic major v. IQ score vi. net worth vii. third-place finish viii. gender ix. temperature x.
education. Write brief notes on each type.
b. Differentiate discrete and continuous variables.
c. Explain the types of data.
d. Define median with an example.
e. Compare and contrast qualitative data and quantitative data with an example.
f. List the differences between a discrete variable and a continuous variable with an example.
9. What is a frequency distribution? Customers who have purchased a particular product rated the
usability of the product on a 10-point scale, ranging from 1 (poor) to 10 (excellent) as follows:
3 7 2 7 8
3 1 4 10 3
2 5 3 5 8
9 7 6 3 7
8 9 7 3 6
Construct a frequency distribution for the above data.
10. What is relative frequency distribution? The GRE Scores for a group of graduate school applicants
are distributed as follows:
GRE Score    Frequency
725 – 749    1
700 – 724    3
675 – 699    14
650 – 674    30
625 – 649    34
600 – 624    42
575 – 599    30
550 – 574    27
525 – 549    13
500 – 524    4
475 – 499    2
Total        200
Explain the procedure to convert a frequency distribution into a relative frequency distribution and
convert the data presented in the above table to a relative frequency distribution.
11. What is Z-score? Outline the steps to obtain a Z-score.
12. Express each of the following scores as a Z Score: First, Mary’s intelligence quotient is 135, given a
mean of 100 and standard deviation 15. Second, Mary obtained a score of 470 in the Competitive
Examination conducted in April 2022, given a mean of 500 and a standard deviation of 100.
13. What is mode? Can there be distributions with no mode or more than one mode? The owner of a
new car conducts six gas mileage tests and obtains the following results, expressed in miles per gallon:
26.3, 28.7, 27.4, 26.6, 27.4 and 26.9. Find the mode for these data.
14. What is the median? Outline the steps to find the median and find the median for the following scores:
first set of five scores: 2, 8, 2, 7, 6; second set of six scores: 3, 8, 9, 3, 1, 8.
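
As a rough illustration of the Z-score questions in this unit, the sketch below applies the usual definition z = (score - mean) / standard deviation to the two scores given for Mary. The function name z_score is an assumption made for this example, not something the question bank prescribes.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Values taken from the question on Mary's scores.
iq_z = z_score(135, 100, 15)      # IQ of 135, mean 100, standard deviation 15
exam_z = z_score(470, 500, 100)   # exam score of 470, mean 500, standard deviation 100

print(f"IQ z-score: {iq_z:.2f}")
print(f"Exam z-score: {exam_z:.2f}")
```

A positive result places the score above the mean and a negative result places it below, which is why the two scores come out with opposite signs.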

Unit 3
1. Explain scatter plot. Categorize the different types of relationships using scatter plots.
2. Describe range and variance.
3. Explain the correlation coefficient.
4. Explain, with an example, how the least squares equation minimizes the total of all squared prediction
errors.
5. Each of the following pairs represents the number of licensed drivers (X) and the number of cars (Y)
for seven houses in my neighborhood:
Drivers (X)    Cars (Y)
5              4
4              3
2              2
2              2
3              2
1              1
2              2
a. Construct a scatterplot to verify a lack of pronounced curvilinearity.
b. Determine the least squares equation for these data. Calculate r, SSy and SSx.
c. Determine the standard error of estimate, Sy/x, given that n = 7.
6. In studies dating back over 100 years, it is well established that regression toward the mean occurs
between the heights of fathers and the heights of their adult sons. Indicate whether the statements are
true or false with reason.
a. Sons of tall fathers will tend to be shorter than their fathers.
b. Sons of short fathers will tend to be taller than the mean for all sons.
c. Every son of a tall father will be shorter than his father.
d. Taken as a group, adult sons are shorter than their fathers.
e. Fathers of tall sons will tend to be taller than their sons but shorter than the mean for all
fathers.
f. Fathers of short sons will tend to be taller than their sons but shorter than the mean of all
fathers.
7. Interpret the value of r² in correlation-based analysis.
8. Assume that an r of -.80 describes the strong negative relationship between years of heavy smoking
(X) and life expectancy (Y). Assume, furthermore, that the distributions of heavy smoking and life
expectancy each have the following means and sums of squares: mean of X = 5, mean of Y = 60, SSx = 35, SSy = 70.
a. Determine the least squares regression equation for predicting life expectancy from years of
heavy smoking.
b. Determine the standard error of estimate, Sy/x, assuming that the correlation of -.80 was
based on n = 50 pairs of observations.
c. Supply a rough interpretation of Sy/x
d. Predict the life expectancy for John, who has smoked heavily for 8 years.
e. Predict the life expectancy for Katie, who has never smoked heavily.
9. a. Suppose Helen sent 10 greeting cards to her friends and received 8 cards back. What
kind of relationship is this? Explain briefly.
b. What is a percentile rank? Give an example.
c. Define multiple regression.
d. Define regression towards the mean.
e. What is the use of scatter plot?
f. Define correlation coefficient.
10. Calculate the correlation coefficient for the heights (in inches) of fathers (x) and their sons (y) with
the data presented below:
x 66 68 68 70 71 72 72
y 68 70 69 72 72 72 74
11. The values of x and their corresponding values of y are given below:
x 0.5 1.5 2.5 3.5 4.5 5.5 6.5
y 2.5 3.5 5.5 4.5 6.5 8.5 10.5
a. Find the least squares regression line y = ax + b.
b. Estimate the value of y when x = 10.
12. Consider the following dataset with one response variable y and two predictor variables x1 and x2.
y 140 155 159 179 192 200 212 215
x1 60 62 67 70 71 71 75 78
x2 22 25 24 20 15 14 14 11
Fit a multiple linear regression model to this dataset.
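
For the question above that asks for the least squares regression line y = ax + b, the following is a minimal sketch of the standard computation: the slope a is the sum of products of deviations divided by the sum of squares of x, and the intercept b is chosen so that the line passes through the two means. The function name least_squares_line and the variable names are assumptions made for this illustration.

```python
# x and y values copied from the regression-line question in this unit.
xs = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
ys = [2.5, 3.5, 5.5, 4.5, 6.5, 8.5, 10.5]

def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a*x + b that minimizes squared prediction errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and sum of products of deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sp_xy / ss_x          # slope
    b = mean_y - a * mean_x   # intercept
    return a, b

a, b = least_squares_line(xs, ys)
y_at_10 = a * 10 + b
print(f"y = {a:.4f}x + {b:.4f}; estimated y at x = 10: {y_at_10:.4f}")
```

Note that x = 10 lies outside the observed range of x (0.5 to 6.5), so the estimate is an extrapolation and should be read with caution.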

Unit 4
1. Explain grouping in Python with an example.
2. Explain data indexing and operation on missing data with suitable code and examples.
3. Describe in detail about pivot table.
4. Imagine you have a series of data that represents the amount of precipitation each day for a year in
a given city. Load the daily rainfall statistics for the city of Tirupati in 2021 which is given in a csv file
Tirupatirainfall2021.csv . Using pandas, generate a histogram for rainy days and find out the days that
have high rainfall.
5. Consider an e-commerce organization like Amazon that stores its regional sales in the files
NorthSales.csv, SouthSales.csv, WestSales.csv and EastSales.csv. They want to combine north and west region sales, and south
and east region sales, to find the aggregate sales of these collaborating regions. Help them to do so using
Python code.
6.
a. List the attributes of a NumPy array. Give an example for each.
b. Create a data frame with the key-data pairs A-10, B-20, A-40, C-5, B-10,
C-10. Find the sum for each key and display the result grouped by key.
c. What are the key properties of the Pearson correlation coefficient?
d. Summarize some built-in Pandas aggregations.
e. State the advantages of using NumPy arrays.
f. Outline two types of NumPy’s UFuncs.
7. What is an aggregate function? Elaborate on the aggregate functions in NumPy.
8. What is broadcasting? Explain the rules of broadcasting with an example.
9. Elaborate about the mapping between Python operators and Pandas methods.
10. Why is NumPy faster than lists? List and explain the categories of basic array manipulation methods
with examples.
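
The key-data question in this unit (pairs A-10, B-20, A-40, C-5, B-10, C-10) can be sketched with a pandas group-by, as below. The column names key and data are assumptions chosen for this example; the question does not fix them.

```python
import pandas as pd

# Key-data pairs from the question: A-10, B-20, A-40, C-5, B-10, C-10.
df = pd.DataFrame({
    "key": ["A", "B", "A", "C", "B", "C"],
    "data": [10, 20, 40, 5, 10, 10],
})

# Split the rows by key, then sum the data column within each group.
sums = df.groupby("key")["data"].sum()
print(sums)
```

The same split-apply-combine pattern works with other built-in aggregations such as mean(), min(), max() and count() in place of sum().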

Unit 5
1. Explain the different types of joins in Python.
2. Explain various features of matplotlib platform used for data visualization and illustrate its
challenges.
3. How are text and image annotations done using Python? Give an example of your own with
appropriate Python code.
4. Appraise the following: a. Histograms b. Binning c. Density; with appropriate Python code.
5.
a. What is the purpose of the errorbar function in matplotlib? Give an example.
b. Showcase three-dimensional drawing in matplotlib with corresponding Python code.
c. Explain partial sort.
d. Give a summary about the comparison operators.
e. State the two possible options in Python notebook used to embed graphics directly in the
notebook.
f. How does the plt.scatter function differ from the plt.plot function?
6. Explain about various visualization charts like line plots, scatter plots and histograms using
matplotlib with an example.
7. Outline any two types of three-dimensional plots in matplotlib with an example.
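
For the question on histograms, binning and density, the sketch below shows the binning step on its own with NumPy. The sample is randomly generated purely for illustration and is not data from this question bank. With matplotlib, plt.hist(data, bins=20, density=True) should draw the corresponding density histogram.

```python
import numpy as np

# Made-up sample for illustration only: 1000 draws from a standard normal.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=1000)

# Binning: split the data range into 20 equal-width bins and count the points in each.
counts, edges = np.histogram(data, bins=20)

# Density: rescale the counts so that the total area of the bars equals 1.
density, _ = np.histogram(data, bins=edges, density=True)
widths = np.diff(edges)
area = float(np.sum(density * widths))

print("points counted:", counts.sum())
print("number of bin edges:", len(edges))
print("area under density histogram:", round(area, 6))
```

Counts answer "how many observations fall in each bin", while density makes histograms of samples with different sizes or bin widths comparable.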
