The document outlines the curriculum for IT1402, a course on Fundamentals of Data Science and Analytics, covering topics such as data science processes, descriptive analytics, inferential statistics, analysis of variance, and predictive analytics. It includes both theoretical concepts and practical applications, with a focus on data retrieval, cleansing, modeling, and visualization techniques. The course also addresses statistical tests, regression analysis, and ethical considerations in data science.

Uploaded by

raja sp


IT1402 / Fundamentals of Data Science and Analytics, Dept. of IT, 2024-2025

UNIT I INTRODUCTION TO DATA SCIENCE

Need for data science – benefits and uses – facets of data – data science process
– setting the research goal – retrieving data – cleansing, integrating, and
transforming data – exploratory data analysis – build the models – presenting and
building applications.

PART A

1. Define data science.
2. What are the key benefits of data science?
3. List any two uses of data science in the healthcare industry.
4. What are the three facets of data?
5. Define structured data with an example.
6. Differentiate between structured and unstructured data.
7. What is the significance of setting a research goal in data science?
8. Mention two common sources of data retrieval.
9. What is data cleansing?
10. Define data integration with an example.
11. What is the purpose of data transformation?
12. Name two common techniques used in exploratory data analysis (EDA).
13. What is meant by outlier detection in EDA?
14. Define data visualization with an example.
15. What is a predictive model?
16. Name two commonly used algorithms in data science modeling.
17. What is meant by overfitting in model building?
18. What is underfitting in machine learning?
19. Mention two benefits of presenting data through visualization tools.
20. What is a dashboard in the context of data science?
21. Define the term "big data."
22. List any two tools used for data cleansing.
23. What is meant by exploratory analysis?
24. Define the term "feature engineering."
25. What is meant by a training dataset?
26. Define a test dataset.
27. What is the role of Python in data science?
28. Name two popular libraries in Python for data visualization.
29. What is meant by integrating data from multiple sources?
30. Mention two tools used for building data science applications.

PART B
1. Explain the need for data science in various industries with examples.
2. Discuss the key benefits and applications of data science in healthcare and
retail.
3. Elaborate on the facets of data and their importance in data science projects.
4. Explain the data science process in detail with a real-world example.
5. Describe the steps involved in setting a research goal for a data science
project.
6. Explain the process of data retrieval from structured and unstructured
sources.
7. Discuss the significance of data cleansing, integration, and transformation
with examples.
8. Explain the methods used in exploratory data analysis and their importance.
9. Discuss the concept of outlier detection and its impact on data analysis.
10. Explain the steps involved in building predictive and prescriptive models in
data science.
11. Discuss the challenges in presenting data to non-technical stakeholders and
their solutions.
12. Explain the role of dashboards and visualization tools in building data science
applications.
13. Discuss common challenges faced during data cleansing and integration and
how to overcome them.
14. Elaborate on the importance of EDA in understanding datasets with
examples.
15. Discuss the ethical considerations in data science projects, including bias and
privacy concerns.

UNIT II DESCRIPTIVE ANALYTICS

Frequency distributions – Outliers – interpreting distributions – graphs – averages –
describing variability – interquartile range – variability for qualitative and ranked
data – Normal distributions – z scores – correlation – scatter plots – regression –
regression line – least squares regression line – standard error of estimate –
interpretation of r^2 – multiple regression equations – regression toward the mean.

PART A
1. Define frequency distribution.
2. What are outliers in a dataset?
3. Name two methods to detect outliers in a dataset.
4. What is a histogram used for?
5. Differentiate between bar graphs and histograms.
6. Define mean as a measure of central tendency.
7. What is the median of a dataset?
8. Define mode with an example.
9. What is the importance of describing variability in data?
10. Define the range of a dataset.
11. What is the interquartile range (IQR)?
12. How is variability calculated for qualitative data?
13. What is a normal distribution?
14. State the properties of a normal distribution.
15. Define a z-score in statistics.
16. What is the significance of z-scores in a normal distribution?
17. Define correlation in the context of statistics.
18. Differentiate between positive and negative correlation.
19. What is a scatter plot?
20. Define a regression line.
21. What is the least squares regression line?
22. What is meant by the standard error of estimate?
23. Define r^2 (coefficient of determination).
24. What does r^2 = 1 signify?
25. What is the purpose of multiple regression equations?
26. Explain regression toward the mean in simple terms.
27. Define the term "dependent variable" in regression analysis.
28. What is an independent variable in regression?
29. State one application of regression in real-world scenarios.
30. What is the relationship between correlation and regression?

PART B
1. Explain frequency distributions and their importance in data analysis with
examples.
2. Discuss various methods to identify and handle outliers in datasets.
3. Explain different types of graphs used to interpret distributions with
examples.
4. Discuss the measures of central tendency (mean, median, and mode) with
their merits and demerits.
5. Describe the concepts of variability and interquartile range with examples.
6. Explain how variability is measured for qualitative and ranked data.
7. Elaborate on the properties of a normal distribution and its real-world
applications.
8. Explain the concept of z-scores and their role in standardizing datasets.
9. Discuss the significance of correlation and scatter plots in statistical analysis.
10. Explain the steps to derive a regression line using the least squares method.
11. Discuss the importance of the standard error of estimate in regression
analysis.
12. Explain the interpretation and importance of r^2 in assessing model
performance.
13. Describe the process of building and interpreting multiple regression
equations with examples.
14. Explain the concept of regression toward the mean and its practical
implications.
15. Compare and contrast correlation and regression, highlighting their
differences and applications.
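
As a worked illustration for the regression questions above (least squares line, r^2, standard error of estimate), the arithmetic can be sketched in plain Python. The five data points and the function names are illustrative assumptions, not part of the course material, and the standard error of estimate is assumed to divide by n - 2, as many introductory texts do.

```python
import math

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line y = slope * x + intercept
    that minimizes the sum of squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sum of squared x deviations.
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residual_sum_of_squares(xs, ys):
    """Sum of squared differences between observed and predicted y."""
    slope, intercept = least_squares_line(xs, ys)
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def r_squared(xs, ys):
    """Proportion of the variance in y accounted for by the line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - residual_sum_of_squares(xs, ys) / ss_total

def standard_error_of_estimate(xs, ys):
    """Typical size of a prediction error (assumes the n - 2 convention)."""
    return math.sqrt(residual_sum_of_squares(xs, ys) / (len(xs) - 2))

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
r2 = r_squared(xs, ys)
see = standard_error_of_estimate(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, r^2 = {r2:.2f}, SEE = {see:.3f}")
```

For these made-up points the fitted line is y = 0.6x + 2.2, r^2 is 0.6, and the standard error of estimate is about 0.894.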

UNIT III INFERENTIAL STATISTICS

Populations – samples – random sampling – Sampling distribution – standard error
of the mean – Hypothesis testing – z-test – z-test procedure – decision rule –
calculations – decisions – interpretations – one-tailed and two-tailed tests –
Estimation – point estimate – confidence interval – level of confidence – effect of
sample size.

PART A

1. Define population in statistics.
2. What is a sample?
3. Differentiate between population and sample.
4. What is random sampling?
5. State one advantage of random sampling.
6. Define sampling distribution.
7. What is the significance of a sampling distribution in statistics?
8. Define the standard error of the mean.
9. How is the standard error of the mean calculated?
10. What is the purpose of hypothesis testing?
11. Define a null hypothesis.
12. What is an alternative hypothesis?
13. What is a z-test?
14. State one assumption for conducting a z-test.
15. What is the decision rule in hypothesis testing?
16. Define a one-tailed test.
17. What is a two-tailed test?
18. Differentiate between one-tailed and two-tailed tests.
19. What is a point estimate?
20. Define confidence interval.
21. What does a 95% confidence interval imply?
22. What is meant by the level of confidence in estimation?
23. How does sample size affect the standard error of the mean?
24. State the relationship between sample size and confidence interval width.
25. Define the critical value in hypothesis testing.
26. What is the significance level (α) in hypothesis testing?
27. Name two common errors in hypothesis testing.
28. What is a Type I error?
29. Define a Type II error.
30. What is the relationship between hypothesis testing and decision-making?

PART B

1. Explain the concepts of population and sample with examples and their
importance in statistics.
2. Discuss the principles and advantages of random sampling in data collection.
3. Elaborate on sampling distribution and its role in inferential statistics with
examples.
4. Explain the standard error of the mean and its significance in hypothesis
testing.
5. Discuss the steps involved in hypothesis testing with an example.
6. Describe the z-test procedure, including its assumptions, calculations, and
interpretations.
7. Compare one-tailed and two-tailed tests with examples and their
applications.
8. Explain the decision rule in hypothesis testing and its importance in making
statistical inferences.
9. Discuss the concept of estimation, focusing on point estimates and
confidence intervals.
10. Elaborate on the construction and interpretation of confidence intervals with
real-world examples.
11. Explain how sample size affects the standard error, confidence intervals, and
hypothesis testing outcomes.
12. Discuss the role of the level of confidence in hypothesis testing and
estimation.
13. Explain the calculation and interpretation of critical values in hypothesis
testing.
14. Discuss the types of errors in hypothesis testing (Type I and Type II) and their
implications.
15. Compare hypothesis testing and confidence interval approaches for statistical
inference.
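
As a worked illustration for the z-test and confidence interval questions above, the calculations can be sketched with Python's standard library. The sample figures, function names, and the choice of an upper-tailed test for the one-tailed case are illustrative assumptions; the sketch also assumes the population standard deviation is known, which is the usual condition for a z-test.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05, two_tailed=True):
    """Return (z, critical_z, p_value, reject_null) for a one-sample z-test.

    The one-tailed branch assumes an upper-tailed alternative hypothesis.
    """
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    if two_tailed:
        critical_z = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
        reject_null = abs(z) >= critical_z
    else:
        critical_z = STANDARD_NORMAL.inv_cdf(1 - alpha)
        p_value = 1 - STANDARD_NORMAL.cdf(z)
        reject_null = z >= critical_z
    return z, critical_z, p_value, reject_null

def confidence_interval(sample_mean, pop_sd, n, level=0.95):
    """Return (lower, upper) limits of a confidence interval for the mean."""
    standard_error = pop_sd / math.sqrt(n)
    critical_z = STANDARD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    margin = critical_z * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical figures: a sample of 100 scores with mean 533, tested
# against a population mean of 500 with known standard deviation 110.
z, critical_z, p_value, reject_null = z_test(533, 500, 110, 100)
lower, upper = confidence_interval(533, 110, 100)
print(f"z = {z:.2f}, critical z = {critical_z:.2f}, reject H0: {reject_null}")
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

With these made-up figures the standard error is 11, z is 3.00, and the two-tailed critical value at the 0.05 level is about 1.96, so the null hypothesis is rejected. Rerunning `confidence_interval` with a larger `n` shows the interval narrowing as the sample size grows.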

UNIT IV ANALYSIS OF VARIANCE

t-test for one sample – sampling distribution of t – t-test procedure – t-test for two
independent samples – p-value – statistical significance – t-test for two related
samples. F-test – ANOVA – Two-factor experiments – three F-tests – two-factor
ANOVA – Introduction to chi-square tests.

PART A
1. What is a t-test used for in statistics?
2. Define the t-test for one sample.
3. What is the sampling distribution of t?
4. What are the assumptions for performing a t-test?
5. What is the procedure for conducting a one-sample t-test?
6. Define the t-test for two independent samples.
7. When is the t-test for two independent samples used?
8. What is the t-test for two related samples?
9. Define the term "p-value" in hypothesis testing.
10. What does statistical significance mean?
11. What is the difference between practical significance and statistical
significance?
12. State one application of the F-test.
13. Define ANOVA (Analysis of Variance).
14. When is a one-way ANOVA used?
15. What is meant by a two-factor experiment?
16. Define the term "interaction effect" in two-factor ANOVA.
17. What are the three F-tests in ANOVA?
18. Explain the concept of between-group variance in ANOVA.
19. What is within-group variance in ANOVA?
20. What is the null hypothesis in an F-test?
21. Define the term "two-factor ANOVA."
22. What is the purpose of a chi-square test?
23. When is a chi-square goodness-of-fit test used?
24. What is the chi-square test of independence?
25. What are the assumptions for a chi-square test?
26. State one difference between a t-test and an F-test.
27. What is the relationship between ANOVA and the F-test?
28. Define critical value in the context of ANOVA.
29. What does a significant interaction in two-factor ANOVA indicate?
30. Name two statistical tests that compare means between groups.

PART B
1. Explain the t-test for one sample, including its assumptions, procedure,
and interpretation.
2. Discuss the concept of the sampling distribution of t and its importance in
hypothesis testing.
3. Explain the procedure for conducting a t-test for two independent samples
with an example.
4. Discuss the application of the p-value in hypothesis testing and its
interpretation.
5. Explain the concept of statistical significance and its implications in real-
world scenarios.
6. Describe the t-test for two related samples with examples and its use
cases.
7. Explain the F-test, its assumptions, and its relationship with ANOVA.
8. Discuss the principles and applications of one-way ANOVA with an
example.
9. Elaborate on two-factor experiments and the role of ANOVA in analyzing
such experiments.
10. Discuss the three F-tests in ANOVA and their significance.
11. Explain the concept of two-factor ANOVA, including main effects and
interaction effects.
12. Discuss the role of ANOVA in comparing multiple group means and its
advantages over t-tests.
13. Explain the chi-square test for independence, including assumptions,
procedure, and interpretation.
14. Compare and contrast the t-test, F-test, and chi-square test, highlighting
their applications.
15. Discuss the role of statistical tests in experimental design and data
analysis, with examples of t-tests, ANOVA, and chi-square tests.
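
As a worked illustration for the one-way ANOVA questions above, the F ratio can be computed by hand from the between-group and within-group variability. The three groups of scores and the function name are illustrative assumptions; the critical value is not computed here and would normally be read from an F table for the given degrees of freedom.

```python
def one_way_anova(groups):
    """Return (f_ratio, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of scores per treatment group.
    """
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)

    # Between-group sum of squares: how far each group mean lies from
    # the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Within-group sum of squares: spread of scores around their own
    # group mean, pooled over all groups.
    ss_within = sum(
        (score - sum(group) / len(group)) ** 2
        for group in groups
        for score in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical scores for three treatment groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

For these made-up scores the between-group mean square is 13 and the within-group mean square is 1, giving F(2, 6) = 13. The null hypothesis of equal population means is rejected when the observed F equals or exceeds the tabled critical value for the chosen significance level.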

UNIT V PREDICTIVE ANALYTICS

Linear least squares – implementation – goodness of fit – testing a linear
model – weighted resampling. Regression using StatsModels – multiple
regression – nonlinear relationships – logistic regression – estimating
parameters – Time series analysis – moving averages – missing values –
serial correlation – autocorrelation. Introduction to survival analysis.

PART A

1. Define linear least squares in regression analysis.
2. What is the purpose of the linear least squares method?
3. What is meant by the goodness of fit in a regression model?
4. How is the coefficient of determination (r^2) used to measure goodness of
fit?
5. Define residuals in the context of linear regression.
6. What is the purpose of testing a linear model?
7. What is weighted resampling in data analysis?
8. Define multiple regression.
9. When is multiple regression used?
10. What is a nonlinear relationship in regression analysis?
11. Define logistic regression.
12. What type of dependent variable is required for logistic regression?
13. What is the logit function in logistic regression?
14. Explain the term "estimating parameters" in regression.
15. What is the main objective of time series analysis?
16. Define a moving average in time series analysis.
17. What is the purpose of using moving averages in data analysis?
18. Define serial correlation in time series data.
19. What is autocorrelation, and how is it different from serial correlation?
20. How does missing data impact time series analysis?
21. What is survival analysis?
22. Define the term "hazard function" in survival analysis.
23. What is a Kaplan-Meier estimator?
24. How does a Cox proportional hazards model work in survival analysis?
25. State one application of logistic regression.
26. What is meant by overfitting in regression analysis?
27. Define the term "independent variable" in regression.
28. State one assumption of linear regression.
29. What is the relationship between autocorrelation and lag in time series?
30. What does the Akaike Information Criterion (AIC) measure in model selection?

PART B
1. Explain the linear least squares method and its implementation with
examples.
2. Discuss the importance of goodness of fit in evaluating regression models.
3. Describe the process of testing a linear model and its significance in
regression analysis.
4. Explain weighted resampling and its role in handling unbalanced datasets.
5. Discuss the principles and applications of multiple regression with an
example.
6. Explain how nonlinear relationships are modeled and their significance in
regression analysis.
7. Describe logistic regression, including the logit function and its applications.
8. Discuss the process of estimating parameters in regression analysis and its
importance.
9. Explain the components and significance of time series analysis with
examples.
10. Discuss moving averages and their role in time series analysis, including
examples.
11. Explain the concept of autocorrelation and its role in analyzing time series
data.
12. Discuss the impact of missing values on time series analysis and methods to
handle them.
13. Provide an overview of survival analysis and its key concepts, including
applications.
14. Compare and contrast linear regression, logistic regression, and survival
analysis.
15. Explain the Cox proportional hazards model and its significance in survival
analysis.
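
As a worked illustration for the time series questions above, a simple moving average and a lag-k autocorrelation can be sketched in plain Python. The series and function names are illustrative assumptions. The autocorrelation uses the common estimator that divides by the total sum of squared deviations of the whole series; some texts instead compute the Pearson correlation between the series and its shifted copy, which gives slightly different values on short series.

```python
def moving_average(series, window):
    """Return the simple moving average of `series` over `window` points.

    The result has len(series) - window + 1 values, one per full window.
    """
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

def autocorrelation(series, lag):
    """Return the lag-`lag` autocorrelation of `series`.

    Assumes the estimator that normalizes by the total sum of squared
    deviations from the overall mean.
    """
    n = len(series)
    mean = sum(series) / n
    total_ss = sum((x - mean) ** 2 for x in series)
    lagged_products = sum(
        (series[t] - mean) * (series[t + lag] - mean)
        for t in range(n - lag)
    )
    return lagged_products / total_ss

# Hypothetical series with a steady upward trend.
series = [1, 2, 3, 4, 5]
smoothed = moving_average(series, window=3)
lag_one = autocorrelation(series, lag=1)
print(smoothed, lag_one)
```

For this made-up series the three-point moving average is [2.0, 3.0, 4.0] and the lag-1 autocorrelation is 0.4; at lag 0 the autocorrelation is always 1.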
