MODEL EXAM II Answer Key - For Merge
Sub. Name: Foundations of Data Science    Branch / Year / SEM: IT / II / III
Sub. Code: CS3352    Date:
Duration: 3 hours    Marks: 100
2. What is regression?
Regression is a statistical method used to determine the relationship between a dependent variable and a series of other variables known as independent variables.
A regression line is a line used to describe the behavior of a set of data; it is used in forecasting procedures.
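A minimal sketch of fitting such a regression line with NumPy's least-squares polynomial fit; the x/y values below are made up purely for illustration:

import numpy as np

# Hypothetical paired observations (made up for illustration)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit of the regression line y = a*x + b
a, b = np.polyfit(x, y, deg=1)
print(f"regression line: y = {a:.2f}x + {b:.2f}")

# The fitted line can then be used for forecasting
print("forecast at x = 6:", a * 6 + b)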
ii) A teacher is interested in studying the relationship between the performance in Statistics and Economics of a class of 20 students. For this, he compiles the scores on these subjects in the last semester examination. Some data of this type are presented in the table. Calculate the correlation coefficient for the data.
12. B) i. Find Karl Pearson correlation coefficient for the following paired data.
Wages:           100  101  102  102  100   99   97   98   96   95
Cost of living:   98   99   99   97   95   92   95   94   90   91
Solution:
X = wages, Y = cost of living
x̄ = (100 + 101 + 102 + 102 + 100 + 99 + 97 + 98 + 96 + 95)/10 = 990/10 = 99
ȳ = (98 + 99 + 99 + 97 + 95 + 92 + 95 + 94 + 90 + 91)/10 = 950/10 = 95
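The remaining steps of the hand computation can be cross-checked with a short NumPy sketch; np.corrcoef is used here only as a check on the manual formula:

import numpy as np

x = np.array([100, 101, 102, 102, 100, 99, 97, 98, 96, 95])  # wages
y = np.array([98, 99, 99, 97, 95, 92, 95, 94, 90, 91])       # cost of living

# Karl Pearson correlation: r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² · Σ(y - ȳ)²)
dx, dy = x - x.mean(), y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(r)                        # 61/72 ≈ 0.847
print(np.corrcoef(x, y)[0, 1])  # same value via NumPy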
ii)
13. A) i)
ii) A sample of 12 fathers and their elder sons gave the following data about their heights in inches. Calculate the coefficient of rank correlation.
Father:  65  63  67  64  68  62  70  66  68  67  69  71
Son:     68  66  68  65  69  66  68  65  71  67  68  70
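A quick way to verify the hand computation (including the tie correction, since both series contain repeated heights) is the sketch below; scipy.stats.spearmanr assigns average ranks to tied values:

import numpy as np
from scipy.stats import spearmanr

father = np.array([65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71])
son    = np.array([68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70])

# Spearman rank correlation with average ranks for tied heights
rho, _ = spearmanr(father, son)
print(rho)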
15. a) Explain data wrangling.
Data wrangling is the process of transforming data from its original “raw” form into a more digestible format and organizing data sets from various sources into a single coherent whole for further processing.
Data wrangling covers the following processes:
Getting data from the various sources into one place.
Piecing the data together according to the determined setting.
Cleaning the data of noise and of erroneous or missing elements.
1. Discovering: We must understand what is in our data, which will inform how we want to
analyse it. How we wrangle customer data, for example, may be informed by where they
are located, what they bought, or what promotions they received.
2. Structuring: This means organising the data, which is necessary because raw data comes in many different shapes and sizes. A single column may turn into several rows for easier analysis, or one column may become two; data is rearranged to make computation and analysis easier.
3. Cleaning: What happens when errors and outliers skew our data? We clean the data. What happens when state data is entered as AP, Andhra Pradesh, or Arunachal Pradesh? Null values are changed and standard formatting is implemented, thereby increasing data quality (see the short sketch after this list).
4. Enriching: Here we take stock of our data and strategize about how other, additional data might augment it. Questions asked during this data wrangling step might be: what new types of data can I derive from what I already have, or what other information would better inform my decision making about this current data?
5. Validating: Validation rules are repetitive programming sequences that verify data consistency, quality, and security. Examples of validation include ensuring uniform distribution of attributes that should be distributed normally (e.g., birth dates) or confirming the accuracy of fields through a check across the data.
6. Publishing: Analysts prepare the wrangled data for use downstream, whether by a particular user or software, and document any particular steps taken or logic used to wrangle the data. Data wrangling gurus understand that implementation of insights relies upon the ease with which the data can be accessed and utilized by others.
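A minimal pandas sketch of the cleaning, structuring, and validating steps described above; the column names, state mapping, and values here are assumptions made purely for illustration:

import pandas as pd

# Hypothetical raw customer data with inconsistent state codes and a missing value
raw = pd.DataFrame({
    'customer': ['C1', 'C2', 'C3', 'C4'],
    'state': ['AP', 'Andhra Pradesh', 'Arunachal Pradesh', None],
    'amount': ['1,200', '950', '1,050', '800'],
})

# Cleaning: standardise state names and handle the null value
state_map = {'AP': 'Andhra Pradesh'}
clean = raw.assign(
    state=raw['state'].replace(state_map).fillna('Unknown'),
    # Structuring: convert the amount strings into numbers for analysis
    amount=raw['amount'].str.replace(',', '').astype(float),
)

# Validating: a simple rule checking that no amounts are negative
assert (clean['amount'] >= 0).all()
print(clean)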
Structured Data: NumPy’s Structured Arrays
Imagine that we have several categories of data on a number of people (say, name, age, and weight), and we’d like to store these values for use in a Python program. It would be possible to store these in three separate arrays:
In[2]: name = ['Alice', 'Bob', 'Cathy', 'Doug']
       age = [25, 45, 37, 19]
       weight = [55.0, 85.5, 68.0, 61.5]
But this is a bit clumsy. There’s nothing here that tells us that the three arrays are related; it would be more natural if we could use a single structure to store all of this data. NumPy can handle this through structured arrays, which are arrays with compound data types. Recall that previously we created a simple array using an expression like this:
In[3]: x = np.zeros(4, dtype=int)
We can similarly create a structured array using a compound data type specification:
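The compound dtype specification itself is not reproduced above; a minimal sketch of what it could look like for the same name/age/weight example (the field names and formats chosen here are assumptions):

import numpy as np

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

# Compound dtype: a 10-character Unicode name, a 4-byte integer age, an 8-byte float weight
data = np.zeros(4, dtype={'names':   ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})

# Fill the structured array from the three separate lists
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)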
More Advanced Compound Types
It is possible to define even more advanced compound types. For example, you can create a type where each element contains an array or matrix of values. Here, we’ll create a data type with a mat component consisting of a 3×3 floating-point matrix:
In[14]: tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
        X = np.zeros(1, dtype=tp)
        print(X[0])
        print(X['mat'][0])
(0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
Now each element in the X array consists of an id and a 3×3 matrix.
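Fields of such an array can be read and written by name; a brief self-contained continuation of the example above (the values assigned here are arbitrary):

import numpy as np

tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(1, dtype=tp)

# Fields of a structured array are accessed by name
X['id'] = [1]
X['mat'][0] = np.eye(3)   # replace the zero matrix with a 3×3 identity matrix
print(X['id'][0])         # 1
print(X['mat'][0])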
b) Illustrate and manipulate the Pandas DataFrame object with an example program.
The Pandas DataFrame Object
The fundamental structure in Pandas is the DataFrame. Like the Series object, the DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary.
DataFrame as a generalized NumPy array
If a Series is an analog of a one-dimensional array with flexible indices, a DataFrame is an analog of a two-dimensional array with both flexible row indices and flexible column names. Just as you might think of a two-dimensional array as an ordered sequence of aligned one-dimensional columns, you can think of a DataFrame as a sequence of aligned Series objects, i.e., they share the same index. To demonstrate this, let’s first construct a new Series listing the area of each of the five states discussed in the previous section:
In[18]: area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
                     'Florida': 170312, 'Illinois': 149995}
        area = pd.Series(area_dict)
        area
Out[18]: California    423967
         Florida       170312
         Illinois      149995
         New York      141297
         Texas         695662
         dtype: int64
Now that we have this along with the population Series from before, we can use a dictionary to construct a single two-dimensional object containing this information:
In[19]: states = pd.DataFrame({'population': population, 'area': area})
        states
Out[19]:               area  population
         California  423967    38332521
         Florida     170312    19552860
         Illinois    149995    12882135
         New York    141297    19651127
         Texas       695662    26448193
Like the Series object, the DataFrame has an index attribute that gives access to the index labels:
In[20]: states.index
Out[20]: Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')
Additionally, the DataFrame has a columns attribute, which is an Index object holding the column labels:
In[21]: states.columns
Out[21]: Index(['area', 'population'], dtype='object')
Thus the DataFrame can be thought of as a generalization of a two-dimensional NumPy array, where both the rows and columns have a generalized index for accessing the data.
DataFrame as specialized dictionary
Similarly, we can also think of a DataFrame as a specialization of a dictionary. Where a dictionary maps a key to a value, a DataFrame maps a column name to a Series of column data. For example, asking for the 'area' attribute returns the Series object containing the areas we saw earlier:
In[22]: states['area']
Out[22]: California    423967
         Florida       170312
         Illinois      149995
         New York      141297
         Texas         695662
         Name: area, dtype: int64
Constructing DataFrame objects
A Pandas DataFrame can be constructed in a variety of ways.
From a single Series object. A DataFrame is a collection of Series objects, and a single-column DataFrame can be constructed from a single Series:
In[23]: pd.DataFrame(population, columns=['population'])
Out[23]:             population
         California    38332521
         Florida       19552860
         Illinois      12882135
         New York      19651127
         Texas         26448193
From a dictionary of Series objects. As we saw above (In[19]), a DataFrame can be constructed from a dictionary of Series objects as well:
In[26]: pd.DataFrame({'population': population, 'area': area})
Out[26]:               area  population
         California  423967    38332521
         Florida     170312    19552860
         Illinois    149995    12882135
         New York    141297    19651127
         Texas       695662    26448193
From a two-dimensional NumPy array. Given a two-dimensional array of data, we can create a DataFrame with any specified column and index names. If omitted, an integer index will be used for each:
In[27]: pd.DataFrame(np.random.rand(3, 2), columns=['foo', 'bar'], index=['a', 'b', 'c'])
Out[27]:         foo       bar
         a  0.865257  0.213169
         b  0.442759  0.108267
         c  0.047110  0.905718
From a NumPy structured array. We covered structured arrays in “Structured Data: NumPy’s Structured Arrays” earlier. A Pandas DataFrame operates much like a structured array, and can be created directly from one:
In[28]: A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
        A
Out[28]: array([(0, 0.0), (0, 0.0), (0, 0.0)],
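Since the question also asks for manipulation of the DataFrame, the following is a short self-contained sketch along the lines of the examples above; the derived 'density' column and the selection/sort steps are illustrative assumptions:

import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127, 'Florida': 19552860,
                        'Illinois': 12882135})
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})

# Construct the DataFrame from a dictionary of Series objects
states = pd.DataFrame({'population': population, 'area': area})

# Manipulation: add a derived column, then sort and filter rows
states['density'] = states['population'] / states['area']   # population per unit area
print(states.sort_values('density', ascending=False))
print(states[states['population'] > 20_000_000])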