Assignment-I

The assignment for CSE-435 requires students to explore various aspects of Data Science, including its lifecycle, machine learning types, Python's role, and the importance of Exploratory Data Analysis (EDA). Students must discuss topics such as feature selection, big data challenges, ethical considerations, and the application of data science in healthcare. The assignment also emphasizes the significance of data cleaning, correlation analysis, and feature engineering in the data analytics process.

Uploaded by

aviichal1915.11c

Assignment – I

CSE-435

Due Date: 7 October 2024 (Hard Copy in Self Handwriting)

1. What is Data Science? Explain the Data Science lifecycle and its importance in
modern industries.

2. Describe the key differences between Supervised, Unsupervised, and
Reinforcement Learning. Provide examples of each.

3. Explain the role of Python in Data Science. Discuss the various Python libraries
used for data manipulation, visualization, and machine learning.

4. Discuss the importance of Exploratory Data Analysis (EDA) in the Data Science
process. How do data visualization and statistical techniques help in EDA?

5. What is Machine Learning, and how is it applied in real-world scenarios? Discuss
the types of machine learning algorithms commonly used in industries.

6. Explain the process of feature selection and feature engineering in machine
learning. Why are these steps crucial for model performance?

7. Describe the concept of big data and its challenges. What technologies and tools
are used to handle big data in Data Science?

8. What are overfitting and underfitting in machine learning models? How can they
be prevented or corrected?

9. Discuss the ethical considerations in data science and machine learning. What
are the challenges of bias, privacy, and fairness in AI systems?

10. Explain the role of data science in healthcare. How has data science been used to
improve healthcare outcomes? Provide examples of its applications.

11. Explain the Data Analytics Process in detail. Discuss each step, from data
collection to decision-making, and illustrate how these steps interconnect in a
real-world project.

12. What is Exploratory Data Analysis (EDA), and why is it a crucial step in data
analytics? Discuss both quantitative and graphical techniques used in EDA,
providing examples of when and how they are used.

13. Compare and contrast quantitative and graphical techniques in Exploratory Data
Analysis (EDA). How do these techniques complement each other in providing a
complete understanding of the data?

14. Describe the role of data cleaning in the data analytics process. Why is it critical
to the success of data analysis, and what common techniques are used to clean
data?

15. How is correlation analysis performed in EDA? Explain the significance of the
Pearson and Spearman correlation coefficients and how they are interpreted.
Provide examples of how correlation is used in real-world data analysis.

16. Graphical techniques in EDA help uncover hidden patterns in data. Discuss how
visualizations such as histograms, box plots, scatter plots, and heatmaps
contribute to identifying trends, outliers, and relationships between variables.

17. Discuss the concept of feature engineering and its importance in the data
analytics process. How do new features improve the performance of predictive
models? Provide examples of feature engineering techniques.

18. What is the difference between descriptive and inferential statistics in data
analysis? How are both types of analysis used to derive insights from a dataset?
Provide examples of each.

19. What challenges are encountered when handling large datasets in the data
analytics process, especially during EDA? Discuss the techniques and tools used
to overcome these challenges, such as sampling, parallel processing, and using
specialized libraries.

20. How does predictive modeling fit into the data analytics process? Explain the
relationship between EDA and predictive modeling, and discuss how the insights
gathered during EDA influence the choice of models.

21. Explain the process of feature generation in detail. How do domain expertise,
brainstorming, and creativity contribute to generating meaningful features from
raw data? Provide examples from real-world applications.

22. What are the common challenges in feature generation when dealing with time
series data? Discuss techniques such as lag features, rolling statistics, and
seasonality extraction with practical examples.

23. Feature selection plays a critical role in improving the performance of machine
learning models. Compare and contrast different feature selection techniques
(Filter, Wrapper, and Embedded methods) and their applications.

24. Discuss the importance of feature selection in preventing overfitting and
improving model generalization. How do techniques like cross-validation and
regularization help in selecting the right features?

25. In the context of customer retention analysis, how can feature generation be used
to derive new insights from customer behavior data? Discuss how these features
impact predictive modeling.

26. How does L1 regularization (Lasso) aid in feature selection? Explain the
mathematical foundation of Lasso and provide examples of its use in high-
dimensional datasets.

27. What is the role of interaction terms in feature generation? How can interaction
terms enhance the predictive power of a model, and when might they be
unnecessary or harmful?

28. Explain how mutual information can be used as a feature selection criterion. What
are the advantages and limitations of using mutual information in selecting
features for machine learning models?

29. Feature selection often involves dealing with multicollinearity between variables.
Explain how multicollinearity affects models and discuss techniques for detecting
and resolving it.

30. In high-dimensional datasets, how do tree-based algorithms like Random Forest
and Gradient Boosting contribute to feature selection? Discuss how feature
importance scores are derived and used in practice.
