Assignment-I
CSE-435
1. What is Data Science? Explain the Data Science lifecycle and its importance in
modern industries.
3. Explain the role of Python in Data Science. Discuss the various Python libraries
used for data manipulation, visualization, and machine learning.
4. Discuss the importance of Exploratory Data Analysis (EDA) in the Data Science
process. How do data visualization and statistical techniques help in EDA?
7. Describe the concept of big data and its challenges. What technologies and tools
are used to handle big data in Data Science?
8. What are overfitting and underfitting in machine learning models? How can they
be prevented or corrected?
9. Discuss the ethical considerations in data science and machine learning. What
are the challenges of bias, privacy, and fairness in AI systems?
10. Explain the role of data science in healthcare. How has data science been used to
improve healthcare outcomes? Provide examples of its applications.
11. Explain the Data Analytics Process in detail. Discuss each step, from data
collection to decision-making, and illustrate how these steps interconnect in a
real-world project.
12. What is Exploratory Data Analysis (EDA), and why is it a crucial step in data
analytics? Discuss both quantitative and graphical techniques used in EDA,
providing examples of when and how they are used.
13. Compare and contrast quantitative and graphical techniques in Exploratory Data
Analysis (EDA). How do these techniques complement each other in providing a
complete understanding of the data?
14. Describe the role of data cleaning in the data analytics process. Why is it critical
to the success of data analysis, and what common techniques are used to clean
data?
15. How is correlation analysis performed in EDA? Explain the significance of the
Pearson and Spearman correlation coefficients and how they are interpreted.
Provide examples of how correlation is used in real-world data analysis.
16. Graphical techniques in EDA help uncover hidden patterns in data. Discuss how
visualizations such as histograms, box plots, scatter plots, and heatmaps
contribute to identifying trends, outliers, and relationships between variables.
17. Discuss the concept of feature engineering and its importance in the data
analytics process. How do new features improve the performance of predictive
models? Provide examples of feature engineering techniques.
18. What is the difference between descriptive and inferential statistics in data
analysis? How are both types of analysis used to derive insights from a dataset?
Provide examples of each.
19. What challenges are encountered when handling large datasets in the data
analytics process, especially during EDA? Discuss the techniques and tools used
to overcome these challenges, such as sampling, parallel processing, and
specialized libraries.
20. How does predictive modeling fit into the data analytics process? Explain the
relationship between EDA and predictive modeling, and discuss how the insights
gathered during EDA influence the choice of models.
21. Explain the process of feature generation in detail. How do domain expertise,
brainstorming, and creativity contribute to generating meaningful features from
raw data? Provide examples from real-world applications.
22. What are the common challenges in feature generation when dealing with time
series data? Discuss techniques such as lag features, rolling statistics, and
seasonality extraction with practical examples.
23. Feature selection plays a critical role in improving the performance of machine
learning models. Compare and contrast different feature selection techniques
(Filter, Wrapper, and Embedded methods) and their applications.
26. How does L1 regularization (Lasso) aid in feature selection? Explain the
mathematical foundation of Lasso and provide examples of its use in high-
dimensional datasets.
27. What is the role of interaction terms in feature generation? How can interaction
terms enhance the predictive power of a model, and when might they be
unnecessary or harmful?
28. Explain how mutual information can be used as a feature selection criterion. What
are the advantages and limitations of using mutual information in selecting
features for machine learning models?
29. Feature selection often involves dealing with multicollinearity between variables.
Explain how multicollinearity affects models and discuss techniques for detecting
and resolving it.