
Final Homework Assignment

Implement and Evaluate a CNN Pipeline with MLOps Part II

Objective:

This homework is designed to help you consolidate and apply the MLOps principles
covered in previous sessions. You will extend the steps from those sessions to implement a
CNN pipeline using tools such as DVC, GitHub Actions, Docker, and MLflow. You will also
document your process and demonstrate your understanding of these tools through
hands-on implementation and theoretical questions.

Tasks:
1. Task 1: CNN Practical Implementation in ML Workflow
1.1 Data Versioning Using DVC
● Description: Set up DVC to manage and version your dataset (e.g., sea and forest
data). Add your training dataset to DVC and ensure it's tracked properly. Integrate
GitHub Actions to automate DVC processes (e.g., pulling datasets and ensuring the
pipeline is up-to-date in CI/CD workflows).
● Steps:
○ Initialise DVC.
○ Add your dataset to DVC and configure a remote (e.g., Google Drive).
○ Track the dataset changes using Git and DVC.
○ Set up a GitHub Actions workflow to automate the integration of DVC in your
project, including pulling data and verifying pipeline integrity during CI runs
(see the example workflow sketch at the end of this subsection).
● Deliverable:
○ A brief description of how you set up DVC and integrated GitHub Actions.
○ Screenshots of your terminal commands (e.g., dvc init, dvc add, dvc pull).
○ Output showing tracked files.
○ GitHub Actions YAML configuration file and relevant logs/screenshots.
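
For orientation, one possible shape of the requested workflow file is sketched below. It is a sketch only, not a model answer: it assumes a Google Drive remote configured as in the steps above and a repository secret named GDRIVE_CREDENTIALS_DATA holding the service-account credentials JSON; adapt names and versions to your own setup.

```yaml
# .github/workflows/dvc-ci.yml -- illustrative sketch only
name: dvc-ci
on: [push, pull_request]

jobs:
  data-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install DVC with Google Drive support
        run: pip install "dvc[gdrive]"
      - name: Pull the versioned dataset
        run: dvc pull
        env:
          # Service-account JSON stored as a repository secret (assumed name)
          GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
      - name: Verify pipeline integrity
        run: dvc status
```

Running dvc status at the end is one simple way to confirm that the pipeline definition and the pulled data are consistent during CI runs.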

1.2 CNN Model Setup and Initial Training


● Description: Train a CNN model on the initial dataset, limited to 20 epochs. Analyse
the confusion matrix and the training/validation loss curves.
● Steps:
○ Use a basic CNN architecture (an illustrative sketch is given at the end of this subsection).
○ Train for 20 epochs and evaluate the model.
○ Visualise the confusion matrix and loss curves.
● Deliverable:
○ A description of the initial training results.
○ Screenshots of the confusion matrix and loss curves.
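
As a starting point, the sketch below shows one minimal Keras implementation consistent with these steps. The directory layout (class subfolders under data/train and data/val), image size, and layer widths are illustrative assumptions, not requirements of the brief.

```python
# Minimal illustrative CNN for the sea/forest classification task.
# Paths, image size, and layer widths are assumptions, not part of the brief.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

IMG_SIZE = (128, 128)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=8)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/val", image_size=IMG_SIZE, batch_size=8, shuffle=False)

model = models.Sequential([
    layers.Input(shape=IMG_SIZE + (3,)),
    layers.Rescaling(1.0 / 255),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(train_ds, validation_data=val_ds, epochs=20)

# Confusion matrix on the (unshuffled) validation set.
y_true = np.concatenate([y.numpy() for _, y in val_ds])
y_pred = np.argmax(model.predict(val_ds), axis=1)
ConfusionMatrixDisplay(confusion_matrix(y_true, y_pred)).plot()
plt.savefig("confusion_matrix.png")

# Training/validation loss curves.
plt.figure()
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="val loss")
plt.legend()
plt.savefig("loss_curves.png")
```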

1.3 Model Fine-Tuning


● Description: Perform hyperparameter tuning by adjusting parameters like batch size
and learning rate. Run three experiments to compare performance:
○ Experiment 1: Epochs = 20, Batch size = 8.
○ Experiment 2: Epochs = 20, Batch size = 16.
○ Experiment 3: Epochs = 25, Batch size = 16.
● Steps:
○ Run the above experiments using MLflow for tracking (a driver-loop sketch is given at the end of this subsection).
○ Compare metrics such as accuracy and loss across the three runs.
● Deliverable:
○ Explanation of changes made for fine-tuning.
○ Screenshots of performance metrics and comparisons.
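
One way to drive the three runs programmatically is sketched below. Here train_and_evaluate is a hypothetical helper wrapping the training code from Task 1.2, assumed to return the Keras history object and a test accuracy; it is not defined in the brief.

```python
# Illustrative driver for the three fine-tuning experiments in the brief.
# train_and_evaluate() is a hypothetical wrapper around the Task 1.2 code.
import mlflow

EXPERIMENTS = [
    {"epochs": 20, "batch_size": 8},
    {"epochs": 20, "batch_size": 16},
    {"epochs": 25, "batch_size": 16},
]

for cfg in EXPERIMENTS:
    run_name = f"e{cfg['epochs']}_b{cfg['batch_size']}"
    with mlflow.start_run(run_name=run_name):
        mlflow.log_params(cfg)
        history, test_accuracy = train_and_evaluate(**cfg)
        mlflow.log_metric("val_loss", history.history["val_loss"][-1])
        mlflow.log_metric("val_accuracy", history.history["val_accuracy"][-1])
        mlflow.log_metric("test_accuracy", test_accuracy)
```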

1.4 Model Monitoring with MLflow


● Description: Integrate MLflow to log experiments, hyperparameters, metrics, and
model artifacts for the three models.
● Steps:
○ Set up MLflow for local tracking (see the setup sketch at the end of this subsection).
○ Log metrics, hyperparameters, and artifacts.
○ Compare results in the MLflow UI.
● Deliverable:
○ Screenshots of MLflow UI with logged experiments.
○ Short description of MLflow setup.
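
A minimal local-tracking setup might look like the sketch below; the experiment name, run name, parameter values, and artifact file name are assumptions for illustration only.

```python
# Minimal local MLflow tracking setup; names and values are illustrative assumptions.
import mlflow

mlflow.set_tracking_uri("file:./mlruns")   # store runs locally under ./mlruns
mlflow.set_experiment("cnn-sea-forest")    # hypothetical experiment name

with mlflow.start_run(run_name="baseline"):
    mlflow.log_params({"epochs": 20, "batch_size": 8, "learning_rate": 1e-3})
    mlflow.log_metric("val_accuracy", 0.0)        # replace with the real value
    mlflow.log_artifact("confusion_matrix.png")   # assumes this plot was saved to disk
```

Launching mlflow ui from the project root then serves the comparison UI (by default at http://127.0.0.1:5000), where the logged runs can be compared side by side for the screenshots.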

2. Task 2: Docker Packaging and MLOps Questions

2.1 Build and Run a Docker Image


● Description:
○ Create a Docker image for your CNN classifier application (a sample Dockerfile
sketch is given at the end of this subsection).
○ Run the image locally in a Docker container to confirm it executes as
expected.
● Deliverable:
○ Dockerfile used to build the image.
○ Screenshot of Docker container execution.
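
As a reference point, one common Dockerfile layout for a Python application is sketched below. The entry-point script name (app.py), the base image, and the exposed port are assumptions; adjust them to how your classifier is actually served.

```dockerfile
# Illustrative Dockerfile sketch; file names, base image, and port are assumptions.
FROM python:3.10-slim
WORKDIR /app

# Install dependencies first so Docker can cache this layer between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (model, inference script, etc.).
COPY . .

EXPOSE 5000
CMD ["python", "app.py"]
```

A typical local check is docker build -t cnn-classifier . followed by docker run --rm -p 5000:5000 cnn-classifier, with a screenshot of the running container as the deliverable.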

2.2 MLOps Questions


MLOps Concepts

1. How does MLOps improve the scalability of machine learning workflows?


2. What challenges do teams face when implementing MLOps in large organisations?
3. Explain the concept of feature stores and their role in the MLOps pipeline.
4. What are some strategies for ensuring data quality in MLOps pipelines?

Tool-Specific Questions

DVC:

1. How does DVC integrate with cloud storage providers, and why is this useful?
2. What role does the dvc.lock file play in maintaining pipeline integrity?
3. Discuss how DVC pipelines can be automated using CI/CD tools.
4. What is the significance of checkpoints in DVC pipelines?

MLflow:

1. How can MLflow's model serving feature simplify deployment?


2. Discuss how MLflow handles experiment reproducibility across environments.
3. What are the advantages of MLflow's integration with platforms like Kubernetes?
4. Explain how MLflow's artifact tracking supports auditability in machine learning
workflows.

General Questions

1. How can teams balance the trade-offs between automation and flexibility in MLOps
workflows?
2. Discuss the importance of explainability in models deployed via an MLOps pipeline.
3. How do MLOps practices align with ethical AI considerations?
4. What future trends do you foresee in the adoption and evolution of MLOps tools and
frameworks?

Submission Requirements:

● Submit a comprehensive report that includes:


○ Screenshots of each step, with detailed descriptions explaining the actions taken.
● Summarise any issues faced and how you resolved them.
● For Task 2, provide concise and clear answers. Where applicable, reference practical
examples from Task 1.

Deadline: Friday, 5th January, 2025 (11:59 PM).

Good luck with your assignment! Be sure to include detailed documentation and clear
interpretations of your results to support the evaluation of your work.
