Need for data science – benefits and uses – facets of data – data science process
– setting the research goal – retrieving data – cleansing, integrating, and
transforming data – exploratory data analysis – build the models – presenting and
building applications.
PART A
PART B
IT1402/ Fundamentals of Data Science and Analytics, Dept. of IT, 2024-2025
1. Explain the need for data science in various industries with examples.
2. Discuss the key benefits and applications of data science in healthcare and
retail.
3. Elaborate on the facets of data and their importance in data science projects.
4. Explain the data science process in detail with a real-world example.
5. Describe the steps involved in setting a research goal for a data science
project.
6. Explain the process of data retrieval from structured and unstructured
sources.
7. Discuss the significance of data cleansing, integration, and transformation
with examples.
8. Explain the methods used in exploratory data analysis and their importance.
9. Discuss the concept of outlier detection and its impact on data analysis.
10. Explain the steps involved in building predictive and prescriptive models in
data science.
11. Discuss the challenges in presenting data to non-technical stakeholders and
their solutions.
12. Explain the role of dashboards and visualization tools in building data science
applications.
13. Discuss common challenges faced during data cleansing and integration and
how to overcome them.
14. Elaborate on the importance of EDA in understanding datasets with
examples.
15. Discuss the ethical considerations in data science projects, including bias and
privacy concerns.
PART A
1. Define frequency distribution.
2. What are outliers in a dataset?
3. Name two methods to detect outliers in a dataset.
4. What is a histogram used for?
5. Differentiate between bar graphs and histograms.
6. Define mean as a measure of central tendency.
7. What is the median of a dataset?
8. Define mode with an example.
9. What is the importance of describing variability in data?
10. Define the range of a dataset.
11. What is the interquartile range (IQR)?
12. How is variability calculated for qualitative data?
13. What is a normal distribution?
14. State the properties of a normal distribution.
15. Define a z-score in statistics.
16. What is the significance of z-scores in a normal distribution?
17. Define correlation in the context of statistics.
18. Differentiate between positive and negative correlation.
19. What is a scatter plot?
20. Define a regression line.
21. What is the least squares regression line?
22. What is meant by the standard error of estimate?
23. Define $r^2$ (coefficient of determination).
24. What does $r^2 = 1$ signify?
25. What is the purpose of multiple regression equations?
26. Explain regression toward the mean in simple terms.
27. Define the term "dependent variable" in regression analysis.
28. What is an independent variable in regression?
29. State one application of regression in real-world scenarios.
30. What is the relationship between correlation and regression?
PART B
1. Explain frequency distributions and their importance in data analysis with
examples.
2. Discuss various methods to identify and handle outliers in datasets.
3. Explain different types of graphs used to interpret distributions with
examples.
4. Discuss the measures of central tendency (mean, median, and mode) with
their merits and demerits.
5. Describe the concepts of variability and interquartile range with examples.
6. Explain how variability is measured for qualitative and ranked data.
7. Elaborate on the properties of a normal distribution and its real-world
applications.
8. Explain the concept of z-scores and their role in standardizing datasets.
9. Discuss the significance of correlation and scatter plots in statistical analysis.
10. Explain the steps to derive a regression line using the least squares method.
11. Discuss the importance of the standard error of estimate in regression
analysis.
12. Explain the interpretation and importance of $r^2$ in assessing model
performance.
13. Describe the process of building and interpreting multiple regression
equations with examples.
14. Explain the concept of regression toward the mean and its practical
implications.
15. Compare and contrast correlation and regression, highlighting their
differences and applications.
PART A
PART B
1. Explain the concepts of population and sample with examples and their
importance in statistics.
2. Discuss the principles and advantages of random sampling in data collection.
3. Elaborate on sampling distribution and its role in inferential statistics with
examples.
4. Explain the standard error of the mean and its significance in hypothesis
testing.
5. Discuss the steps involved in hypothesis testing with an example.
6. Describe the z-test procedure, including its assumptions, calculations, and
interpretations.
7. Compare one-tailed and two-tailed tests with examples and their
applications.
8. Explain the decision rule in hypothesis testing and its importance in making
statistical inferences.
9. Discuss the concept of estimation, focusing on point estimates and
confidence intervals.
10. Elaborate on the construction and interpretation of confidence intervals with
real-world examples.
11. Explain how sample size affects the standard error, confidence intervals, and
hypothesis testing outcomes.
12. Discuss the role of the level of confidence in hypothesis testing and
estimation.
13. Explain the calculation and interpretation of critical values in hypothesis
testing.
14. Discuss the types of errors in hypothesis testing (Type I and Type II) and their
implications.
15. Compare hypothesis testing and confidence interval approaches for statistical
inference.
PART A
1. What is a t-test used for in statistics?
2. Define the t-test for one sample.
3. What is the sampling distribution of t?
4. What are the assumptions for performing a t-test?
5. What is the procedure for conducting a one-sample t-test?
6. Define the t-test for two independent samples.
7. When is the t-test for two independent samples used?
8. What is the t-test for two related samples?
9. Define the term "p-value" in hypothesis testing.
10. What does statistical significance mean?
11. What is the difference between practical significance and statistical
significance?
12. State one application of the F-test.
13. Define ANOVA (Analysis of Variance).
14. When is a one-way ANOVA used?
15. What is meant by a two-factor experiment?
16. Define the term "interaction effect" in two-factor ANOVA.
17. What are the three F-tests in two-factor ANOVA?
18. Explain the concept of between-group variance in ANOVA.
19. What is within-group variance in ANOVA?
20. What is the null hypothesis in an F-test?
21. Define the term "two-factor ANOVA."
22. What is the purpose of a chi-square test?
23. When is a chi-square goodness-of-fit test used?
24. What is the chi-square test of independence?
25. What are the assumptions for a chi-square test?
26. State one difference between a t-test and an F-test.
27. What is the relationship between ANOVA and the F-test?
28. Define critical value in the context of ANOVA.
29. What does a significant interaction in two-factor ANOVA indicate?
30. Name two statistical tests that compare means between groups.
PART B
1. Explain the t-test for one sample, including its assumptions, procedure,
and interpretation.
2. Discuss the concept of the sampling distribution of t and its importance in
hypothesis testing.
3. Explain the procedure for conducting a t-test for two independent samples
with an example.
4. Discuss the application of the p-value in hypothesis testing and its
interpretation.
5. Explain the concept of statistical significance and its implications in real-
world scenarios.
6. Describe the t-test for two related samples with examples and its use
cases.
7. Explain the F-test, its assumptions, and its relationship with ANOVA.
8. Discuss the principles and applications of one-way ANOVA with an
example.
9. Elaborate on two-factor experiments and the role of ANOVA in analyzing
such experiments.
10. Discuss the three F-tests in two-factor ANOVA and their significance.
11. Explain the concept of two-factor ANOVA, including main effects and
interaction effects.
12. Discuss the role of ANOVA in comparing multiple group means and its
advantages over t-tests.
13. Explain the chi-square test for independence, including assumptions,
procedure, and interpretation.
14. Compare and contrast the t-test, F-test, and chi-square test, highlighting
their applications.
15. Discuss the role of statistical tests in experimental design and data
analysis, with examples of t-tests, ANOVA, and chi-square tests.
PART A
PART B
1. Explain the linear least squares method and its implementation with
examples.
2. Discuss the importance of goodness of fit in evaluating regression models.
3. Describe the process of testing a linear model and its significance in
regression analysis.
4. Explain weighted resampling and its role in handling unbalanced datasets.
5. Discuss the principles and applications of multiple regression with an
example.
6. Explain how nonlinear relationships are modeled and their significance in
regression analysis.
7. Describe logistic regression, including the logit function and its applications.
8. Discuss the process of estimating parameters in regression analysis and its
importance.
9. Explain the components and significance of time series analysis with
examples.
10. Discuss moving averages and their role in time series analysis, including
examples.
11. Explain the concept of autocorrelation and its role in analyzing time series
data.
12. Discuss the impact of missing values on time series analysis and methods to
handle them.
13. Provide an overview of survival analysis and its key concepts, including
applications.
14. Compare and contrast linear regression, logistic regression, and survival
analysis.
15. Explain the Cox proportional hazards model and its significance in survival
analysis.