The document outlines the curriculum for IT1402, a course on Fundamentals of Data Science and Analytics, covering topics such as data science processes, descriptive analytics, inferential statistics, analysis of variance, and predictive analytics. It includes both theoretical concepts and practical applications, with a focus on data retrieval, cleansing, modeling, and visualization techniques. The course also addresses statistical tests, regression analysis, and ethical considerations in data science.

Uploaded by

raja sp

IT1402 / Fundamentals of Data Science and Analytics    Dept. of IT    2024-2025

UNIT I INTRODUCTION TO DATA SCIENCE

Need for data science – benefits and uses – facets of data – data science process
– setting the research goal – retrieving data – cleansing, integrating, and
transforming data – exploratory data analysis – build the models – presenting and
building applications.

PART A

1. Define data science.
2. What are the key benefits of data science?
3. List any two uses of data science in the healthcare industry.
4. What are the three facets of data?
5. Define structured data with an example.
6. Differentiate between structured and unstructured data.
7. What is the significance of setting a research goal in data science?
8. Mention two common sources of data retrieval.
9. What is data cleansing?
10. Define data integration with an example.
11. What is the purpose of data transformation?
12. Name two common techniques used in exploratory data analysis (EDA).
13. What is meant by outlier detection in EDA?
14. Define data visualization with an example.
15. What is a predictive model?
16. Name two commonly used algorithms in data science modeling.
17. What is meant by overfitting in model building?
18. What is underfitting in machine learning?
19. Mention two benefits of presenting data through visualization tools.
20. What is a dashboard in the context of data science?
21. Define the term "big data."
22. List any two tools used for data cleansing.
23. What is meant by exploratory analysis?
24. Define the term "feature engineering."
25. What is meant by a training dataset?
26. Define a test dataset.
27. What is the role of Python in data science?
28. Name two popular libraries in Python for data visualization.
29. What is meant by integrating data from multiple sources?
30. Mention two tools used for building data science applications.

PART B
1. Explain the need for data science in various industries with examples.
2. Discuss the key benefits and applications of data science in healthcare and
retail.
3. Elaborate on the facets of data and their importance in data science projects.
4. Explain the data science process in detail with a real-world example.
5. Describe the steps involved in setting a research goal for a data science
project.
6. Explain the process of data retrieval from structured and unstructured
sources.
7. Discuss the significance of data cleansing, integration, and transformation
with examples.
8. Explain the methods used in exploratory data analysis and their importance.
9. Discuss the concept of outlier detection and its impact on data analysis.
10. Explain the steps involved in building predictive and prescriptive models in
data science.
11. Discuss the challenges in presenting data to non-technical stakeholders and
their solutions.
12. Explain the role of dashboards and visualization tools in building data science
applications.
13. Discuss common challenges faced during data cleansing and integration and
how to overcome them.
14. Elaborate on the importance of EDA in understanding datasets with
examples.
15. Discuss the ethical considerations in data science projects, including bias and
privacy concerns.
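
Illustrative sketch for the Unit I questions on data cleansing and outlier detection in EDA. This is one possible approach, not a prescribed answer: it assumes the common 1.5 × IQR rule, uses invented readings, and the function name `iqr_outliers` is hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # quantiles(..., n=4) returns the three quartile cut points (Q1, Q2, Q3).
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Invented sensor readings with one obvious data-entry error.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)

# A simple cleansing step: drop the flagged values before analysis.
cleaned = [v for v in readings if v not in outliers]
print(outliers, cleaned)
```

Whether a flagged value should be dropped, corrected, or kept depends on the research goal; the rule only marks candidates for inspection.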

UNIT II DESCRIPTIVE ANALYTICS


Frequency distributions – Outliers – interpreting distributions – graphs – averages –
describing variability – interquartile range – variability for qualitative and ranked
data – Normal distributions – z scores – correlation – scatter plots – regression –
regression line – least squares regression line – standard error of estimate –
interpretation of r² – multiple regression equations – regression toward the mean.

PART A
1. Define frequency distribution.
2. What are outliers in a dataset?
3. Name two methods to detect outliers in a dataset.
4. What is a histogram used for?
5. Differentiate between bar graphs and histograms.
6. Define mean as a measure of central tendency.
7. What is the median of a dataset?
8. Define mode with an example.
9. What is the importance of describing variability in data?
10. Define the range of a dataset.
11. What is the interquartile range (IQR)?
12. How is variability calculated for qualitative data?
13. What is a normal distribution?
14. State the properties of a normal distribution.
15. Define a z-score in statistics.
16. What is the significance of z-scores in a normal distribution?
17. Define correlation in the context of statistics.
18. Differentiate between positive and negative correlation.
19. What is a scatter plot?
20. Define a regression line.
21. What is the least squares regression line?
22. What is meant by the standard error of estimate?
23. Define r² (coefficient of determination).
24. What does r² = 1 signify?
25. What is the purpose of multiple regression equations?
26. Explain regression toward the mean in simple terms.
27. Define the term "dependent variable" in regression analysis.
28. What is an independent variable in regression?
29. State one application of regression in real-world scenarios.
30. What is the relationship between correlation and regression?

PART B
1. Explain frequency distributions and their importance in data analysis with
examples.
2. Discuss various methods to identify and handle outliers in datasets.
3. Explain different types of graphs used to interpret distributions with
examples.
4. Discuss the measures of central tendency (mean, median, and mode) with
their merits and demerits.
5. Describe the concepts of variability and interquartile range with examples.
6. Explain how variability is measured for qualitative and ranked data.
7. Elaborate on the properties of a normal distribution and its real-world
applications.
8. Explain the concept of z-scores and their role in standardizing datasets.
9. Discuss the significance of correlation and scatter plots in statistical analysis.
10. Explain the steps to derive a regression line using the least squares method.
11. Discuss the importance of the standard error of estimate in regression
analysis.
12. Explain the interpretation and importance of r² in assessing model
performance.
13. Describe the process of building and interpreting multiple regression
equations with examples.
14. Explain the concept of regression toward the mean and its practical
implications.
15. Compare and contrast correlation and regression, highlighting their
differences and applications.
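
Illustrative sketch for the Unit II questions on z-scores, the least squares regression line, r², and the standard error of estimate. The data are invented, the helper names are hypothetical, and two conventions are assumed: the population standard deviation for z-scores, and n − 2 in the denominator of the standard error of estimate.

```python
import math

def mean(values):
    return sum(values) / len(values)

def z_scores(values):
    """Express each value as its distance from the mean in standard deviation units."""
    m = mean(values)
    # Assumption: population standard deviation (divide by n, not n - 1).
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]

def least_squares_line(x, y):
    """Return (slope, intercept) minimizing the sum of squared prediction errors."""
    mx, my = mean(x), mean(y)
    sp_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    slope = sp_xy / ss_x
    return slope, my - slope * mx

def fit_summary(x, y):
    """Return (r squared, standard error of estimate) for the fitted line."""
    slope, intercept = least_squares_line(x, y)
    predicted = [slope * xi + intercept for xi in x]
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_total = sum((yi - mean(y)) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total
    # Assumption: n - 2 degrees of freedom for the standard error of estimate.
    std_error = math.sqrt(ss_residual / (len(x) - 2))
    return r_squared, std_error

# Invented data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(hours, scores)
r_squared, std_error = fit_summary(hours, scores)
print(round(slope, 3), round(intercept, 3), round(r_squared, 3), round(std_error, 3))
```

For these numbers the fitted line is approximately y' = 0.6x + 2.2 with r² of about 0.6, meaning roughly 60% of the variability in the scores is predictable from hours studied.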

UNIT III INFERENTIAL STATISTICS


Populations – samples – random sampling – Sampling distribution – standard error
of the mean – Hypothesis testing – z-test – z-test procedure – decision rule –
calculations – decisions – interpretations – one-tailed and two-tailed tests –
Estimation – point estimate – confidence interval – level of confidence – effect of
sample size.

PART A

1. Define population in statistics.
2. What is a sample?
3. Differentiate between population and sample.
4. What is random sampling?
5. State one advantage of random sampling.
6. Define sampling distribution.
7. What is the significance of a sampling distribution in statistics?
8. Define the standard error of the mean.
9. How is the standard error of the mean calculated?
10. What is the purpose of hypothesis testing?
11. Define a null hypothesis.
12. What is an alternative hypothesis?
13. What is a z-test?
14. State one assumption for conducting a z-test.
15. What is the decision rule in hypothesis testing?
16. Define a one-tailed test.
17. What is a two-tailed test?
18. Differentiate between one-tailed and two-tailed tests.
19. What is a point estimate?
20. Define confidence interval.
21. What does a 95% confidence interval imply?
22. What is meant by the level of confidence in estimation?
23. How does sample size affect the standard error of the mean?
24. State the relationship between sample size and confidence interval width.
25. Define the critical value in hypothesis testing.
26. What is the significance level (α) in hypothesis testing?
27. Name two common errors in hypothesis testing.
28. What is a Type I error?
29. Define a Type II error.
30. What is the relationship between hypothesis testing and decision-making?

PART B

1. Explain the concepts of population and sample with examples and their
importance in statistics.
2. Discuss the principles and advantages of random sampling in data collection.
3. Elaborate on sampling distribution and its role in inferential statistics with
examples.
4. Explain the standard error of the mean and its significance in hypothesis
testing.
5. Discuss the steps involved in hypothesis testing with an example.
6. Describe the z-test procedure, including its assumptions, calculations, and
interpretations.
7. Compare one-tailed and two-tailed tests with examples and their
applications.
8. Explain the decision rule in hypothesis testing and its importance in making
statistical inferences.
9. Discuss the concept of estimation, focusing on point estimates and
confidence intervals.
10. Elaborate on the construction and interpretation of confidence intervals with
real-world examples.
11. Explain how sample size affects the standard error, confidence intervals, and
hypothesis testing outcomes.
12. Discuss the role of the level of confidence in hypothesis testing and
estimation.
13. Explain the calculation and interpretation of critical values in hypothesis
testing.
14. Discuss the types of errors in hypothesis testing (Type I and Type II) and their
implications.
15. Compare hypothesis testing and confidence interval approaches for statistical
inference.
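
Illustrative sketch for the Unit III questions on the z-test procedure, the decision rule, and confidence intervals. It assumes a known population standard deviation (the usual z-test assumption), uses invented summary figures, and the function names are hypothetical; `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05, two_tailed=True):
    """One-sample z-test. Returns (z, critical value, p-value, reject H0?)."""
    standard_error = pop_sd / math.sqrt(n)      # standard error of the mean
    z = (sample_mean - pop_mean) / standard_error
    std_normal = NormalDist()                   # mean 0, standard deviation 1
    if two_tailed:
        critical = std_normal.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        reject = abs(z) >= critical
    else:
        # Assumption: the one-tailed case is upper-tailed (H1: mean > pop_mean).
        critical = std_normal.inv_cdf(1 - alpha)
        p_value = 1 - std_normal.cdf(z)
        reject = z >= critical
    return z, critical, p_value, reject

def confidence_interval(sample_mean, pop_sd, n, level=0.95):
    """Interval estimate: sample mean plus or minus z_conf standard errors."""
    standard_error = pop_sd / math.sqrt(n)
    z_conf = NormalDist().inv_cdf(0.5 + level / 2)
    return sample_mean - z_conf * standard_error, sample_mean + z_conf * standard_error

# Invented figures: H0 says the population mean is 500 (SD 110);
# a random sample of 100 has a mean of 533.
z, critical, p_value, reject = z_test(533, 500, 110, 100)
low, high = confidence_interval(533, 110, 100)
wider_low, wider_high = confidence_interval(533, 110, 25)   # smaller sample
print(round(z, 2), round(critical, 2), reject, round(low, 2), round(high, 2))
```

Reducing the sample size from 100 to 25 doubles the standard error, so the second interval is twice as wide; this is the effect of sample size that the unit asks about.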

UNIT IV ANALYSIS OF VARIANCE


t-test for one sample – sampling distribution of t – t-test procedure – t-test for two
independent samples – p-value – statistical significance – t-test for two related
samples. F-test – ANOVA – Two-factor experiments – three F-tests – two-factor
ANOVA – Introduction to chi-square tests.

PART A
1. What is a t-test used for in statistics?
2. Define the t-test for one sample.
3. What is the sampling distribution of t?
4. What are the assumptions for performing a t-test?
5. What is the procedure for conducting a one-sample t-test?
6. Define the t-test for two independent samples.
7. When is the t-test for two independent samples used?
8. What is the t-test for two related samples?
9. Define the term "p-value" in hypothesis testing.
10. What does statistical significance mean?
11. What is the difference between practical significance and statistical
significance?
12. State one application of the F-test.
13. Define ANOVA (Analysis of Variance).
14. When is a one-way ANOVA used?
15. What is meant by a two-factor experiment?
16. Define the term "interaction effect" in two-factor ANOVA.
17. What are the three F-tests in ANOVA?
18. Explain the concept of between-group variance in ANOVA.
19. What is within-group variance in ANOVA?
20. What is the null hypothesis in an F-test?
21. Define the term "two-factor ANOVA."
22. What is the purpose of a chi-square test?
23. When is a chi-square goodness-of-fit test used?
24. What is the chi-square test of independence?
25. What are the assumptions for a chi-square test?
26. State one difference between a t-test and an F-test.
27. What is the relationship between ANOVA and the F-test?
28. Define critical value in the context of ANOVA.
29. What does a significant interaction in two-factor ANOVA indicate?
30. Name two statistical tests that compare means between groups.

PART B
1. Explain the t-test for one sample, including its assumptions, procedure,
and interpretation.
2. Discuss the concept of the sampling distribution of t and its importance in
hypothesis testing.
3. Explain the procedure for conducting a t-test for two independent samples
with an example.
4. Discuss the application of the p-value in hypothesis testing and its
interpretation.
5. Explain the concept of statistical significance and its implications in real-
world scenarios.
6. Describe the t-test for two related samples with examples and its use
cases.
7. Explain the F-test, its assumptions, and its relationship with ANOVA.
8. Discuss the principles and applications of one-way ANOVA with an
example.
9. Elaborate on two-factor experiments and the role of ANOVA in analyzing
such experiments.
10. Discuss the three F-tests in ANOVA and their significance.
11. Explain the concept of two-factor ANOVA, including main effects and
interaction effects.
12. Discuss the role of ANOVA in comparing multiple group means and its
advantages over t-tests.
13. Explain the chi-square test for independence, including assumptions,
procedure, and interpretation.
14. Compare and contrast the t-test, F-test, and chi-square test, highlighting
their applications.
15. Discuss the role of statistical tests in experimental design and data
analysis, with examples of t-tests, ANOVA, and chi-square tests.
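
Illustrative sketch for the Unit IV questions on the t-test for two independent samples and one-way ANOVA. It computes only the test statistics and degrees of freedom; p-values and critical values are left to tables or a statistics library. The scores are invented, the function names are hypothetical, and the t-test assumes equal population variances (pooled variance).

```python
import math

def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def pooled_t(a, b):
    """t ratio for two independent samples. Returns (t, degrees of freedom)."""
    df = len(a) + len(b) - 2
    pooled_variance = (sum_of_squares(a) + sum_of_squares(b)) / df
    standard_error = math.sqrt(pooled_variance * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / standard_error, df

def one_way_anova(groups):
    """F ratio for one-way ANOVA. Returns (F, df between, df within)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group variability: how far group means lie from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: spread of scores around their own group mean.
    ss_within = sum(sum_of_squares(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Invented scores for three treatment groups.
group_a = [0, 4, 2]
group_b = [6, 8, 10]
group_c = [3, 5, 4]

f_three, df_b, df_w = one_way_anova([group_a, group_b, group_c])
t_two, df_t = pooled_t(group_a, group_b)
f_two, _, _ = one_way_anova([group_a, group_b])
print(round(f_three, 3), df_b, df_w, round(t_two, 3), round(f_two, 3))
```

With only two groups the F ratio equals the square of the pooled t ratio, which is one way to see the relationship between the t-test and the F-test.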

UNIT V PREDICTIVE ANALYTICS


Linear least squares – implementation – goodness of fit – testing a linear
model – weighted resampling. Regression using StatsModels – multiple
regression – nonlinear relationships – logistic regression – estimating
parameters – Time series analysis – moving averages – missing values –
serial correlation – autocorrelation. Introduction to survival analysis.

PART A

1. Define linear least squares in regression analysis.
2. What is the purpose of the linear least squares method?
3. What is meant by the goodness of fit in a regression model?
4. How is the coefficient of determination (r²) used to measure goodness of
fit?
5. Define residuals in the context of linear regression.
6. What is the purpose of testing a linear model?
7. What is weighted resampling in data analysis?
8. Define multiple regression.
9. When is multiple regression used?
10. What is a nonlinear relationship in regression analysis?
11. Define logistic regression.
12. What type of dependent variable is required for logistic regression?
13. What is the logit function in logistic regression?
14. Explain the term "estimating parameters" in regression.
15. What is the main objective of time series analysis?
16. Define a moving average in time series analysis.
17. What is the purpose of using moving averages in data analysis?
18. Define serial correlation in time series data.
19. What is autocorrelation, and how is it different from serial correlation?
20. How does missing data impact time series analysis?
21. What is survival analysis?
22. Define the term "hazard function" in survival analysis.
23. What is a Kaplan-Meier estimator?
24. How does a Cox proportional hazards model work in survival analysis?
25. State one application of logistic regression.
26. What is meant by overfitting in regression analysis?
27. Define the term "independent variable" in regression.
28. State one assumption of linear regression.
29. What is the relationship between autocorrelation and lag in time series?
30. What does the Akaike Information Criterion (AIC) measure in model selection?

PART B
1. Explain the linear least squares method and its implementation with
examples.
2. Discuss the importance of goodness of fit in evaluating regression models.
3. Describe the process of testing a linear model and its significance in
regression analysis.
4. Explain weighted resampling and its role in handling unbalanced datasets.
5. Discuss the principles and applications of multiple regression with an
example.
6. Explain how nonlinear relationships are modeled and their significance in
regression analysis.
7. Describe logistic regression, including the logit function and its applications.
8. Discuss the process of estimating parameters in regression analysis and its
importance.
9. Explain the components and significance of time series analysis with
examples.
10. Discuss moving averages and their role in time series analysis, including
examples.
11. Explain the concept of autocorrelation and its role in analyzing time series
data.
12. Discuss the impact of missing values on time series analysis and methods to
handle them.
13. Provide an overview of survival analysis and its key concepts, including
applications.
14. Compare and contrast linear regression, logistic regression, and survival
analysis.
15. Explain the Cox proportional hazards model and its significance in survival
analysis.
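
Illustrative sketch for the Unit V questions on moving averages, missing values, and autocorrelation. The series is invented and the function names are hypothetical. Two choices are assumptions rather than the only options: missing values are filled by carrying the last observation forward, and autocorrelation uses the common sample estimator that divides by the overall sum of squares.

```python
def fill_forward(series):
    """Replace None entries with the most recent observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def moving_average(series, window):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def autocorrelation(series, lag):
    """Correlation of the series with a copy of itself shifted by `lag` steps."""
    n = len(series)
    m = sum(series) / n
    denominator = sum((v - m) ** 2 for v in series)
    numerator = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return numerator / denominator

# Invented daily sales with one missing observation.
raw_sales = [2, 4, None, 8, 10, 12]
sales = fill_forward(raw_sales)        # the gap takes the previous value, 4

trend = [2, 4, 6, 8, 10, 12]           # a complete, steadily rising series
smoothed = moving_average(trend, 3)
lag_one = autocorrelation(trend, 1)
print(sales, smoothed, lag_one)
```

A steadily rising series has strongly positive autocorrelation at short lags, which is why trends are usually removed before serial correlation is interpreted.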
