List of AI Project Problem Statements

Module-to-project mapping (choose any one project per module unless marked AND; the original table's Current Completion Status column was left blank):

1. Applied Data Science with Python: Feature Engineering OR Customer Service Analysis
2. Machine Learning: Healthcare OR Book Rental Recommendation
3. Deep Learning with TensorFlow and Keras: Lending Club Loan Data Analysis
4. Advanced Deep Learning and Computer Vision: Perform Facial Recognition with Deep Learning in Keras Using CNN OR Train and Deploy a CNN Model Using TensorFlow Serving OR Emotion Recognition OR Detection of Lung Infection
5. AI and Machine Learning Capstone Project: Healthcare OR Cyber Security OR Retail
6. ELECTIVE - Natural Language Processing and Speech Recognition: Twitter Hate AND Zomato Rating AND Search Engine (complete all three)
7. ELECTIVE - Reinforcement Learning: Stock Trading Using Deep Q-Learning
8. ELECTIVE - Git and GitHub Training: Branching Development Model
9. ELECTIVE - Data Science with Python: California Housing Price Prediction OR Comcast Telecom Consumer Complaints
10. ELECTIVE - Machine Learning: Mercedes-Benz Greener Manufacturing OR Building a User-Based Recommendation Model for Amazon OR Income Qualification
Project 1- Programming Refresher- Prerequisite

Software Requirements: To perform the steps in this project, you need to install Python in your local system first. You
can refer to Programming Basics - Demo Install Python in the course for more information on installation.
Problem Statement:
(Use parameterized constructors in all classes to initialize default values)
Create a bank class with the following attributes:
o IFSC_Code
o bankname
o branchname
o loc
Create a customer class with the following attributes:
o CustomerID
o custname
o address
o contactdetails
Create an account class that inherits from the bank class with the following attributes (use super() to pass values to the base
class):
● AccountID
● Cust (object of Customer)
● balance
Add the following methods to get account information, deposit, and withdraw: getAccountInfo(), deposit(2000, 'true'),
withdraw(500), getBalance()
Create a SavingsAccount class that inherits from the account class with the following attributes (use super() to pass values to
the base class):
● SMinBalance
Add the following methods to get account information, deposit, and withdraw: getSavingAccountInfo(),
deposit(2000, 'true'), withdraw(500), getBalance()
Validate SMinBalance before allowing withdrawals.
Create a class that runs the program and accepts input from the end user to create the respective class objects and print their
details. Add a method to perform deposit and withdrawal transactions based on the end user's input.
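The hierarchy described above can be sketched as follows. This is a minimal illustration, not a reference solution: the default values, method bodies, and the driver lines at the end are all assumptions.

```python
# Sketch of the Bank -> Account -> SavingsAccount hierarchy from the statement.
class Bank:
    def __init__(self, ifsc_code="IFSC0000", bankname="Default Bank",
                 branchname="Main", loc="City"):
        # parameterized constructor with assumed default values
        self.IFSC_Code = ifsc_code
        self.bankname = bankname
        self.branchname = branchname
        self.loc = loc

class Customer:
    def __init__(self, customer_id=0, custname="Unknown",
                 address="", contactdetails=""):
        self.CustomerID = customer_id
        self.custname = custname
        self.address = address
        self.contactdetails = contactdetails

class Account(Bank):
    def __init__(self, account_id=0, cust=None, balance=0, **bank_kwargs):
        super().__init__(**bank_kwargs)   # pass values to the base class
        self.AccountID = account_id
        self.cust = cust or Customer()
        self.balance = balance

    def getAccountInfo(self):
        return f"Account {self.AccountID} ({self.cust.custname}) at {self.bankname}"

    def deposit(self, amount, confirmed='true'):
        if confirmed == 'true':
            self.balance += amount

    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount

    def getBalance(self):
        return self.balance

class SavingsAccount(Account):
    def __init__(self, min_balance=500, **account_kwargs):
        super().__init__(**account_kwargs)
        self.SMinBalance = min_balance

    def withdraw(self, amount):
        # validate the minimum balance before allowing the withdrawal
        if self.balance - amount >= self.SMinBalance:
            self.balance -= amount

sa = SavingsAccount(account_id=1, cust=Customer(1, "Asha"), balance=2000)
sa.deposit(2000, 'true')
sa.withdraw(500)
print(sa.getBalance())  # 3500
```

The driver class that reads end-user input is omitted here; it would simply collect values with input() and call these methods.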
Project 2- Programming Refresher- Prerequisite
Course-End Project: MLB Digital Platform Enhancement
Problem Statement:
Major League Baseball (MLB) is a famous league with one of the highest viewerships. The league plans to update its
digital platform for faster load times and a superior user experience. As part of the development team, you have to
support the backend development. You are required to create modules to manage player statistics, match schedules,
ticket bookings, and other activities.
Objectives:
• To design and implement a backend system for MLB's digital platform
• To create and manage player statistics, match schedules, ticket bookings, and team management
• To implement a multi-threaded report generation system for performance efficiency
Steps to Perform:
1. Player Management System:
• Design the Player class with specified attributes and methods
• Implement methods to update and retrieve player statistics
2. Match Schedule Management:
• Design the Schedule class with specified attributes and methods
• Implement methods to update and retrieve match details
3. Ticket Booking System:
• Design the Ticket class with specified attributes and methods
• Implement methods to book, cancel, and retrieve ticket details
4. Team Management System:
• Design the Team class with specified attributes and methods
• Implement methods to manage team rosters
5. Booking Management System:
• Design the Booking class with specified attributes and methods
• Implement methods to manage ticket bookings
6. Multi-Threaded Report Generation:
• Design the MLB Backend class with specified attributes and methods
• Implement a multi-threaded report generation system for player statistics
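Step 6 can be sketched with Python's threading module. The player records and report format below are assumptions based on the outline; a real implementation would pull statistics from the Player objects defined in step 1.

```python
# Sketch of multi-threaded report generation: one thread per player,
# with a lock serializing writes to the shared report list.
import threading

players = [
    {"name": "Player A", "avg": 0.301},
    {"name": "Player B", "avg": 0.275},
]

report_lines = []
lock = threading.Lock()

def generate_report(player):
    line = f"{player['name']}: batting avg {player['avg']:.3f}"
    with lock:  # prevent interleaved writes from concurrent threads
        report_lines.append(line)

threads = [threading.Thread(target=generate_report, args=(p,)) for p in players]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every report thread to finish

print(len(report_lines))  # 2
```

The lock matters because list mutation from multiple threads without synchronization can interleave in surprising ways once the per-thread work becomes non-trivial.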
Project 3- Programming Refresher- Prerequisite
Course-End Project: EdTech Backend System
Problem Statement: SL Tech is an edtech company that provides training programs on various technical and functional
skills. They are planning to update their learner interface to enhance the learning experience. As part of the development
team, you have to support the backend development by creating modules to manage the credentials, courses, and other
activities of learners.
Objectives:
1. To design and implement a backend system for SL Tech's learner interface
2. To create and manage user credentials, course enrollments, and assignments
3. To integrate all modules into a comprehensive backend system
Steps to Perform:
1. Create a User Class:
• Design the User class with specified attributes and methods
• Implement methods to update the email and password and validate credentials
2. Create a Learner Class:
• Design the Learner class that inherits from the User class
• Implement methods to enroll in and drop courses
3. Define an Instructor Class:
• Design the Instructor class that inherits from the User class
• Implement methods to add and remove courses taught by the instructor
4. Define a Course Class:
• Design the Course class with specified attributes and methods
• Implement methods to add, remove, and list learners
5. Create an Enrollment Class:
• Design the Enrollment class with specified attributes and methods
• Implement methods to manage course enrollments
6. Integrate All Modules into the Backend:
• Design the SLTech Backend class with specified attributes and methods
• Implement methods to manage users, courses, and enrollments
• Implement methods to retrieve enrolled learners and courses
7. Add a User Input Method:
• Implement interactive functions to handle user input for adding users, courses, and enrollments
• Print the output
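Steps 1 and 2 above can be sketched as below. Attribute and method names beyond those listed in the outline are assumptions, not the graded solution.

```python
# Sketch of the User base class and the Learner subclass.
class User:
    def __init__(self, user_id, email, password):
        self.user_id = user_id
        self.email = email
        self.password = password

    def update_email(self, new_email):
        self.email = new_email

    def update_password(self, new_password):
        self.password = new_password

    def validate_credentials(self, email, password):
        return self.email == email and self.password == password

class Learner(User):
    def __init__(self, user_id, email, password):
        super().__init__(user_id, email, password)
        self.courses = []  # course names this learner is enrolled in

    def enroll(self, course_name):
        if course_name not in self.courses:
            self.courses.append(course_name)

    def drop(self, course_name):
        if course_name in self.courses:
            self.courses.remove(course_name)

learner = Learner(1, "a@sltech.com", "secret")
learner.enroll("Python Basics")
learner.drop("Python Basics")
print(learner.courses)  # []
```

The Instructor class in step 3 follows the same inheritance pattern, with a taught-courses list instead of an enrollment list.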
Project 4- Feature Engineering- Course 2- Applied Data Science with Python

Project Statement:
While searching for a dream house, a buyer considers many factors, not just the height of the basement ceiling or
the proximity to an east-west railroad.
Using the dataset, find the factors that influence price negotiations when buying a house.
There are 79 explanatory variables describing every aspect of residential homes in Ames, Iowa.

Dataset Description:

Variable Description
SalePrice The property's sale price in dollars (this is the target variable you are trying to predict)
MSSubClass The building class
MSZoning The general zoning classification
LotFrontage Linear feet of street connected to property
LotArea Lot size in square feet
Street Type of road access
Alley Type of alley access
LotShape General shape of property
LandContour Flatness of the property
Utilities Type of utilities available
LotConfig Lot configuration
LandSlope Slope of property
Neighborhood Physical locations within Ames city limits
Condition1 Proximity to main road or railroad
Condition2 Proximity to main road or railroad (if a second is present)
BldgType Type of dwelling
HouseStyle Style of dwelling
OverallQual Overall material and finish quality
OverallCond Overall condition rating
YearBuilt Original construction date
YearRemodAdd Remodel date
RoofStyle Type of roof
RoofMatl Roof material
Exterior1st Exterior covering on house
Exterior2nd Exterior covering on house (if more than one material)
MasVnrType Masonry veneer type
MasVnrArea Masonry veneer area in square feet
ExterQual Exterior material quality
ExterCond Present condition of the material on the exterior
Foundation Type of foundation
BsmtQual Height of the basement
BsmtCond General condition of the basement
BsmtExposure Walkout or garden level basement walls
BsmtFinType1 Quality of the basement finished area
BsmtFinSF1 Type 1 finished square feet
BsmtFinType2 Quality of second finished area (if present)
BsmtFinSF2 Type 2 finished square feet
BsmtUnfSF Unfinished square feet of basement area
TotalBsmtSF Total square feet of basement area
Heating Type of heating
HeatingQC Heating quality and condition
CentralAir Central air conditioning
Electrical Electrical system
1stFlrSF First Floor square feet
2ndFlrSF Second floor square feet
LowQualFinSF Low quality finished square feet (all floors)
GrLivArea Above grade (ground) living area square feet
BsmtFullBath Basement full bathrooms
BsmtHalfBath Basement half bathrooms
FullBath Full bathrooms above grade
HalfBath Half bathrooms above grade
Bedroom Number of bedrooms above basement level
Kitchen Number of kitchens
KitchenQual Kitchen quality
TotRmsAbvGrd Total rooms above grade (does not include bathrooms)
Functional Home functionality rating
Fireplaces Number of fireplaces
FireplaceQu Fireplace quality
GarageType Garage location
GarageYrBlt Year garage was built
GarageFinish Interior finish of the garage
GarageCars Size of the garage in car capacity
GarageArea Size of the garage in square feet
GarageQual Garage quality
GarageCond Garage condition
PavedDrive Paved driveway
WoodDeckSF Wood deck area in square feet
OpenPorchSF Open porch area in square feet
EnclosedPorch Enclosed porch area in square feet
3SsnPorch Three season porch area in square feet
ScreenPorch Screen porch area in square feet
PoolArea Pool area in square feet
PoolQC Pool quality
Fence Fence quality
MiscFeature Miscellaneous feature not covered in other categories
MiscVal $Value of miscellaneous feature
MoSold Month Sold
YrSold Year Sold
SaleType Type of sale
SaleCondition Condition of sale

Note:
1) Download the “PEP1.csv” using the link given in the Feature Engineering project problem statement
2) For a detailed description of the dataset, you can download and refer to data_description.txt using the link
given in the Feature Engineering project problem statement
Perform the following steps:

1. Understand the dataset:
a. Identify the shape of the dataset
b. Identify variables with null values
c. Identify variables with unique values

2. Generate separate datasets for the numerical and categorical variables

3. EDA of numerical variables:
a. Missing value treatment
b. Identify the skewness and distribution
c. Identify significant variables using a correlation matrix
d. Pair plot for distribution and density

4. EDA of categorical variables:
a. Missing value treatment
b. Count plot for bivariate analysis
c. Identify significant variables using p-values and Chi-Square values

5. Combine all the significant categorical and numerical variables

6. Plot a box plot for the new dataset to find the variables with outliers

Note: The last two steps are performed to make the new dataset ready for training and prediction.
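Steps 1 and 2 can be sketched with pandas on a toy frame standing in for PEP1.csv; the three columns and their values here are illustrative only.

```python
# Sketch: dataset shape, null columns, and a numerical/categorical split.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500],
    "LotArea": [8450, 9600, np.nan],
    "MSZoning": ["RL", "RL", None],
})

print(df.shape)  # (3, 3)
print(df.columns[df.isnull().any()].tolist())  # columns containing nulls

num_df = df.select_dtypes(include=[np.number])   # numerical variables
cat_df = df.select_dtypes(exclude=[np.number])   # categorical variables
print(num_df.columns.tolist(), cat_df.columns.tolist())
```

On the real dataset the same three calls report the 79 explanatory variables split by dtype, which is the starting point for the separate numerical and categorical EDA tracks.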
Project 5- Customer Service Analysis- Course 2- Applied Data Science with Python

1) Download the dataset using the link given in the project problem statement

2) Refer to the lab walkthrough video in the LMS and the lab guides in order to upload the datasets to the labs

1. Understand the dataset:

1.1 Import the dataset

1.2 Visualize the dataset

1.3 Print the columns of the DataFrame

1.4 Identify the shape of the dataset

1.5 Identify the variables with null values

2. Perform basic data exploratory analysis:

2.1 Draw a frequency plot to show the number of null values in each column of the DataFrame

2.2 Missing value treatment

2.2.1 Remove the records whose Closed Date values are null

2.3 Analyze the date columns, and remove entries that have an incorrect timeline

2.3.1 Calculate the time elapsed between the creation and closed dates

2.3.2 Convert the calculated date to seconds to get a better representation

2.3.3 View the descriptive statistics for the newly created column

2.3.4 Check the number of null values in the Complaint_Type and City
columns

2.3.5 Impute the NA value with Unknown City

2.3.6 Draw a frequency plot for the complaints in each city

2.3.7 Create a scatter and hexbin plot of the concentration of complaints across
Brooklyn

3. Find major types of complaints:

3.1 Plot a bar graph to show the types of complaints

3.2 Check the frequency of various types of complaints for New York City
3.3 Find the top 10 complaint types

3.4 Display the various types of complaints in each city

3.5 Create a DataFrame, df_new, which contains cities as columns and complaint types in rows

4. Visualize the major types of complaints in each city

4.1 Draw another chart that shows the types of complaints in each city in a single chart, where
different colors show the different types of complaints

4.2 Sort the complaint types based on the average Request_Closing_Time, grouping them by
different locations

5. See whether the average response time across different complaint types is similar (overall)

5.1 Visualize the average of Request_Closing_Time


6. Identify the significant variables by performing statistical analysis using p-values

7. Perform a Kruskal-Wallis H test

7.1 Fail to reject H0: All sample distributions are equal

7.2 Reject H0: One or more sample distributions are not equal

8. Present your observations
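Steps 2.3.1 and 2.3.2 can be sketched with pandas datetime arithmetic on toy data; the column names follow the NYC 311 export but may differ slightly in the provided file.

```python
# Sketch: elapsed time between creation and closing, converted to seconds.
import pandas as pd

df = pd.DataFrame({
    "Created Date": pd.to_datetime(["2015-01-01 10:00", "2015-01-02 09:00"]),
    "Closed Date": pd.to_datetime(["2015-01-01 12:30", "2015-01-02 09:45"]),
})

# timedelta column, then a numeric seconds column for descriptive statistics
df["Request_Closing_Time"] = df["Closed Date"] - df["Created Date"]
df["Request_Closing_Time_sec"] = df["Request_Closing_Time"].dt.total_seconds()
print(df["Request_Closing_Time_sec"].tolist())  # [9000.0, 2700.0]
```

With the seconds column in place, `df["Request_Closing_Time_sec"].describe()` gives the descriptive statistics asked for in step 2.3.3.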


• The project aims to help you work with the dataset and perform analysis.
• In this project, you will assess the data and prepare a fresh dataset for training and prediction.
• You will plot a bar graph to identify the relationship between two variables.
• You will also visualize the major types of complaints in each city.
1. Complete the project in the Simplilearn Lab
2. Complete each task listed in the problem statement
3. Take screenshots of the results for each question and the corresponding code
4. Save them as a document, and submit it using the assessment tab
5. Tap the Submit button (this will present you with three choices)
6. Attach the three files, and then click on Submit

Note: Be sure to include the screenshots of the output.


Project 6- Healthcare- Course 3- PG AIML- Machine Learning

Problem statement:

Cardiovascular diseases are the leading cause of death globally. It is therefore necessary to identify the causes and develop a
system to predict heart attacks effectively. The data below contains information about the factors that might have an
impact on cardiovascular health.

Dataset description:

Variable Description

Age Age in years

Sex 1 = male; 0 = female

cp Chest pain type

trestbps Resting blood pressure (in mm Hg on admission to the hospital)

chol Serum cholesterol in mg/dl

fbs Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)

restecg Resting electrocardiographic results

thalach Maximum heart rate achieved

exang Exercise induced angina (1 = yes; 0 = no)

oldpeak ST depression induced by exercise relative to rest


slope Slope of the peak exercise ST segment

ca Number of major vessels (0-3) colored by fluoroscopy

thal 3 = normal; 6 = fixed defect; 7 = reversible defect

Target 1 or 0

Note:

Download CEP 1_ Dataset.xlsx using the link given in the Healthcare project problem statement

Task to be performed:

1. Preliminary analysis:
a. Perform preliminary data inspection and report the findings on the structure of the data, missing values,
duplicates, etc.
b. Based on these findings, remove duplicates (if any) and treat missing values using an appropriate strategy

2. Prepare a report about the data explaining the distribution of the disease and the related factors using the steps listed
below:
a. Get a preliminary statistical summary of the data and explore the measures of central tendencies and spread
of the data
b. Identify the data variables which are categorical and describe and explore these variables using the
appropriate tools, such as count plot
c. Study the occurrence of CVD across the Age category
d. Study the composition of all patients with respect to the Sex category
e. Study if one can detect heart attacks based on anomalies in the resting blood pressure (trestbps) of a patient
f. Describe the relationship between cholesterol levels and a target variable
g. State what relationship exists between peak exercising and the occurrence of a heart attack
h. Check if thalassemia is a major cause of CVD
i. List how the other factors determine the occurrence of CVD
j. Use a pair plot to understand the relationship between all the given variables
3. Build a baseline model to predict the risk of a heart attack using logistic regression and a random forest, and explore the
results using correlation analysis and logistic regression (leveraging the standard errors and p-values from
statsmodels) for feature selection
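The baseline in step 3 can be sketched as below on synthetic data; in the project, X and y come from CEP 1_ Dataset.xlsx instead, and the feature stand-ins and synthetic target rule here are assumptions.

```python
# Sketch: logistic regression and random forest baselines on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # stand-ins for age, trestbps, chol, ...
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic 0/1 target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

log_reg = LogisticRegression().fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("logistic accuracy:", log_reg.score(X_te, y_te))
print("forest accuracy:", forest.score(X_te, y_te))
```

For the statsmodels-based feature selection mentioned above, `statsmodels.api.Logit(y, sm.add_constant(X)).fit().summary()` exposes the per-feature standard errors and p-values.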
Project 7-Course 3- PG AIML- Machine Learning- Book Rental Recommendation.

Course-end Project 2

Description
Book Rent is the largest online and offline book rental chain in India. They provide books of various genres, such as thrillers,
mysteries, romances, and science fiction. The company charges a fixed rental fee for a book per month. Lately, the company has
been losing its user base. The main reason for this is that users are not able to choose the right books for themselves. The
company wants to solve this problem and increase its revenue and profit.
Project Objective:
You, as an ML expert, should focus on improving the user experience by personalizing it to the user's needs. You have to model a
recommendation engine so that users get recommendations for books based on the behavior of similar users. This will ensure that
users are renting the books based on their tastes and traits.
Note: You have to perform user-based collaborative filtering and item-based collaborative filtering.
Dataset description:
BX-Users: Contains the information of users.

 user_id - These have been anonymized and mapped to integers

 Location - Demographic data, provided if available; otherwise NULL

 Age - Demographic data, provided if available; otherwise NULL

BX-Books:

 isbn - Books are identified by their respective ISBNs. Invalid ISBNs have already been removed from the dataset.
 book_title
 book_author
 year_of_publication
 publisher

BX-Book-Ratings: Contains the book rating information.

 user_id
 isbn
 rating - Ratings (`Book-Rating`) are either explicit, expressed on a scale from 1–10 (higher values denoting higher appreciation),
or implicit, expressed by 0.
Note: Download the “BX-Book-Ratings.csv”, “BX-Books.csv”, “BX-Users.csv”, and “Recommend.csv” using the link given in
the Book Rental Recommendation project problem statement.

Following operations should be performed:

 Read the books dataset and explore it
 Clean up NaN values
 Read the data where ratings are given by users
 Take a quick look at the number of unique users and books
 Convert ISBN variables to numeric numbers in the correct order
 Convert the user_id variable to numeric numbers in the correct order
 Convert both user_id and ISBN to the ordered list, i.e., from 0...n-1
 Re-index the columns to build a matrix
 Split your data into two sets (training and testing)
 Make predictions based on user and item variables
 Use RMSE to evaluate the predictions
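The id-conversion, re-indexing, and RMSE steps above can be sketched on toy ratings; the four sample rows and the prediction vector are made up for illustration, and the real files come from the course resources.

```python
# Sketch: map ids to 0..n-1, build a user x book matrix, and score with RMSE.
import numpy as np
import pandas as pd

ratings = pd.DataFrame({
    "user_id": [276725, 276726, 276725, 276729],
    "isbn": ["034545104X", "0155061224", "0155061224", "052165615X"],
    "rating": [0, 5, 3, 8],
})

# convert both ids to ordered integers 0..n-1 via category codes
ratings["user_idx"] = ratings["user_id"].astype("category").cat.codes
ratings["book_idx"] = ratings["isbn"].astype("category").cat.codes

# re-index into a user-by-book rating matrix (missing ratings filled with 0)
matrix = ratings.pivot_table(index="user_idx", columns="book_idx",
                             values="rating", fill_value=0)

# RMSE between some hypothetical predictions and the true ratings
pred = np.array([1.0, 4.0, 3.0, 7.0])
rmse = np.sqrt(np.mean((pred - ratings["rating"].to_numpy()) ** 2))
print(round(rmse, 3))  # 0.866
```

In the project, `pred` would come from the user-based and item-based collaborative filtering models evaluated on the held-out split.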

To download the dataset click here


Project 8- Lending Club Loan Data Analysis- Course 4- PG AIML- Deep Learning with TensorFlow and Keras

Description
Create a model that predicts whether or not a loan will default, using the historical data.

Problem Statement:
For companies like Lending Club, correctly predicting whether or not a loan will default is very important. In this project,
using the historical data from 2007 to 2015, you have to build a deep learning model to predict the chance of default for future
loans. As you will see, this dataset is highly imbalanced and includes a lot of features, which makes the problem more
challenging.
Domain: Finance
Analysis to be done: Perform data preprocessing and build a deep learning prediction model.
Content:
Dataset columns and definition:

 credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.
 purpose: The purpose of the loan (takes values "credit_card", "debt_consolidation", "educational", "major_purchase",
"small_business", and "all_other").
 int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by
LendingClub.com to be more risky are assigned higher interest rates.
 installment: The monthly installments owed by the borrower if the loan is funded.
 log.annual.inc: The natural log of the self-reported annual income of the borrower.
 dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).
 fico: The FICO credit score of the borrower.
 days.with.cr.line: The number of days the borrower has had a credit line.
 revol.bal: The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).
 revol.util: The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).
 inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 months.
 delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
 pub.rec: The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).

Steps to perform:
Perform exploratory data analysis and feature engineering, then build a deep learning
model to predict whether or not a loan will default, using the historical data.

Tasks:
1. Feature Transformation
 Transform categorical values into numerical (discrete) values
2. Exploratory data analysis of the different factors of the dataset
3. Additional Feature Engineering
 Check the correlation between features and drop those features that have a strong correlation
 This will help reduce the number of features and leave you with the most relevant ones
4. Modeling
 After applying EDA and feature engineering, you are now ready to build the predictive models
 In this part, you will create a deep learning model using Keras with the TensorFlow backend
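The correlation-dropping step in task 3 can be sketched as below on toy data; the 0.9 threshold is an assumption, not part of the problem statement, and the near-duplicate column is synthetic.

```python
# Sketch: drop one of each pair of strongly correlated features.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=100)
df = pd.DataFrame({
    "int.rate": base,
    "installment": base * 1.01 + rng.normal(scale=0.01, size=100),  # near-duplicate
    "fico": rng.normal(size=100),
})

corr = df.corr().abs()
# keep only the upper triangle so each pair is inspected once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
reduced = df.drop(columns=to_drop)
print(to_drop)  # ['installment']
```

Dropping only one member of each correlated pair keeps the information while removing the redundancy that can destabilize model training.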

To download the datasets click here -

Project 9- Perform Facial Recognition with Deep Learning in Keras Using CNN- Course 5-PG AIML - Advanced Deep
Learning and Computer Vision

Course-end Project 1

Description
Problem Statement:
Facial recognition is a biometric alternative that measures the unique characteristics of a human
face. Applications available today include flight check-in, tagging friends and family members in
photos, and “tailored” advertising. You are a computer vision engineer who needs to develop a
face recognition program with deep convolutional neural networks.
Objective: Use a deep convolutional neural network to perform facial recognition using Keras.
Dataset Details:
The ORL face database is composed of 400 images of size 112 x 92: 40 people, with 10 images
per person. The images were taken at different times, with varying lighting and facial expressions.
The faces are in an upright position in frontal view, with a slight left-right rotation.
Link to the Dataset: https://www.dropbox.com/s/i7uzp5yxk7wruva/ORL_faces.npz?dl=0
Prerequisites:
Keras
Scikit Learn
Steps to be followed:
1. Import the required libraries
2. Load the dataset; after loading, normalize every image
3. Split the dataset
4. Transform the images to equal sizes to feed into the CNN
5. Build a CNN model with 3 main layer types:
i. Convolutional layer
ii. Pooling layer
iii. Fully connected layer
6. Train the model
7. Plot the results
8. Iterate on the model until the accuracy is above 90%
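Steps 2-4 can be sketched with NumPy on random arrays standing in for ORL_faces.npz; the archive's actual key names may differ, and the 80/20 split ratio is an assumption.

```python
# Sketch: normalize the images, add a channel axis, and split train/test.
import numpy as np

# stand-in for 400 grayscale images of size 112 x 92
images = np.random.randint(0, 256, size=(400, 112, 92)).astype("float32")
labels = np.repeat(np.arange(40), 10)  # 40 people, 10 images each

images /= 255.0                         # normalize every image to [0, 1]
images = images.reshape(-1, 112, 92, 1)  # single channel axis for the CNN

# simple shuffled 80/20 split; a per-person stratified split is also reasonable
idx = np.random.permutation(len(images))
split = int(0.8 * len(images))
x_train, x_test = images[idx[:split]], images[idx[split:]]
y_train, y_test = labels[idx[:split]], labels[idx[split:]]
print(x_train.shape, x_test.shape)  # (320, 112, 92, 1) (80, 112, 92, 1)
```

These arrays feed directly into the convolutional, pooling, and fully connected layers built in step 5.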

Project 10- Train and Deploy a CNN Model Using TensorFlow Serving - Course 5-PG AIML - Advanced Deep Learning
and Computer Vision

Course-end Project 2

Description
Problem Statement:
You’re a Computer Vision Engineer at health.ai. Your company is developing a deep learning application to automate the
detection of diabetic retinopathy. The company is sourcing high-resolution retina image data from various clinical partners but
the dataset is expected to be huge and cannot be stored on a central system. You’re asked to build a proof of concept using the
Kaggle retinopathy dataset to train a CNN model with the Mirrored Strategy and deploy it with TensorFlow Serving.

Objective: To build a CNN model using distributed training that can detect diabetic retinopathy and deploy it using TensorFlow
Serving.

Dataset Details:
The dataset contains a large set of high-resolution retina images taken under a variety of imaging conditions. A left and right field
is provided for every subject. Images are labeled with a subject id as well as either left or right. A clinician has rated the presence
of diabetic retinopathy in each image on a scale of 0 to 4. Like any real-world dataset, you will encounter noise in both the
images and labels. Images may contain artifacts, be out of focus, underexposed, or overexposed.

Link to the Dataset: https://www.dropbox.com/sh/7z7xq2lq3ogspcv/AACF_50dOtFaVYoII80abNPLa?dl=0

Prerequisites:
TensorFlow
Keras
TensorFlow Serving

Steps to be followed:
1. Download and preprocess the dataset to correct for noise and under- and overexposure
2. Augment the dataset and split it into training and test sets
3. Define the distributed training strategy
4. Define the number of shared instances
5. Define a CNN architecture to extract features from the model data
6. Define parameters like the loss, optimizer, epochs, learning rate, and evaluation metric
7. Define checkpoints
8. Train the model until an accuracy of at least 80% is obtained
9. Save the model
10. Deploy the saved model using TensorFlow Serving

Project 11- Emotion Recognition.- Course 5-PG AIML - Advanced Deep Learning and Computer Vision

Course-end Project 3

Description
Future customizations, such as understanding human emotions, could lead to a range of advancements, such as determining
whether a person likes a specific statement, item, product, or food, or how they are feeling in a particular circumstance.

Objective:
To build a model using a convolutional neural network that can classify a person's emotion

Dataset description:
The dataset contains two folders named Train and Test. These folders have approximately 35,000 images of seven different
human emotions, such as anger, disgust, fear, happiness, neutral, sadness, and surprise.

 Train folder: This folder has images for training the model, which is divided into subfolders having the same name as the class.

 Test folder: This folder has images for testing the model, which is divided into subfolders having the same name as the class.

The following operations should be performed using Keras, PyTorch, or Torchvision:


1. Import the necessary libraries
2. Plot sample images for all the classes
3. Plot the bar graph for the number of images in each class for both training and testing data
4. Build data augmentation for the train data to create new data with translation, rescaling, flipping, and rotation
transformations. Rescale the images to 48x48
5. Build a data augmentation for test data to create new data and rescale the image at 48x48
6. Read images directly from the train folder and test folder using the appropriate function

Build 3 CNN models:

1. CNN Architecture:
1. Add convolutional layers, max pool layers, dropout layers, and batch normalization layers
2. Use ReLU as the activation function
3. Take categorical cross-entropy as the loss function
4. Take Adam as the optimizer
5. Use early stopping with a patience of two epochs, monitoring validation loss
6. Try ten epochs
7. Train the model using the generator and test the accuracy on the test data at every epoch
8. Plot the training and validation accuracy and loss
9. Observe the precision, recall, and F1-score for all classes for both grayscale and color models, and determine
whether the model performs well for each class

2. Customized CNN Architecture:
1. Add convolutional layers, max pool layers, dropout layers, and batch normalization layers on top of the first
model architecture to improve the accuracy
2. Change the batch size and activation function, use rmsprop as the optimizer, and observe whether the accuracy increases
3. Take categorical cross-entropy as the loss function
4. Use early stopping with a patience of two epochs, monitoring validation loss
5. Try ten epochs
6. Train the model using the generator and test the accuracy on the test data at every epoch
7. Plot the training and validation accuracy and loss
8. Observe the precision, recall, and F1-score for all classes for both grayscale and color models, and determine
whether the model performs well for each class

3. Transfer Learning:
1. Prepare the data for the transfer learning algorithm
2. Freeze the top layers of the pre-trained model
3. Add a dense layer at the end of the pre-trained model followed by a dropout layer
4. Add the final output layer with the softmax activation function
5. Take categorical cross-entropy as the loss function
6. Take Adam as the optimizer
7. Use early stopping with a patience of two epochs, monitoring validation loss in minimum
mode
8. Try fifteen epochs
9. Train the model using the generator and test the accuracy on the test data at every epoch
10. Plot the training and validation accuracy and loss
11. Observe the precision, recall, and F1-score for all classes for both grayscale and color models, and determine
whether the model performs well for each class

Final Steps:
1. Compare all the models on the basis of accuracy, precision, recall, and F1-score
2. Suggest at least three more ways to improve the model's performance
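The final comparison step can be sketched as below using toy predictions; in the project, the three prediction vectors come from the trained models' outputs on the test set, and the labels here are made up.

```python
# Sketch: compare models on accuracy and macro-averaged precision/recall/F1.
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [0, 1, 2, 2, 1, 0, 1, 2]  # hypothetical true class labels
models = {
    "baseline_cnn":   [0, 1, 2, 1, 1, 0, 1, 2],
    "custom_cnn":     [0, 1, 2, 2, 1, 0, 0, 2],
    "transfer_model": [0, 1, 2, 2, 1, 0, 1, 2],
}

for name, y_pred in models.items():
    print(name,
          round(accuracy_score(y_true, y_pred), 3),
          round(precision_score(y_true, y_pred, average="macro"), 3),
          round(recall_score(y_true, y_pred, average="macro"), 3),
          round(f1_score(y_true, y_pred, average="macro"), 3))
```

Macro averaging weights the seven emotion classes equally, which matters because the class counts in the training folders are not balanced.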

You can download the datasets from the Course Resources (Self-learning tab)

Project 12- Detection of Lung Infection- Course 5- PG AIML - Advanced Deep Learning and Computer Vision

Course-end Project 4

Description
Artificial intelligence has evolved considerably and can now solve problems that are very complex and require human
specialization. A lot of research happens every day on using deep learning for the betterment of humanity, and one such
area is healthcare.

Objective:
To build a model using a convolutional neural network that can classify lung infection in a person using medical imagery

Dataset Description:
The dataset contains three different classes, including healthy, type 1 disease, and type 2 disease.

 Train folder: This folder has images for training the model, which is divided into subfolders having the same name as the class.

 Test folder: This folder has images for testing the model, which is divided into subfolders having the same name as the class.

The following operations should be performed using Keras, PyTorch, or Torchvision:


1. Import the necessary libraries
2. Plot the sample images for all the classes
3. Plot the distribution of images across the classes
4. Build data augmentation for the train data to create new data with translation, rescaling, flipping, and rotation
transformations. Rescale the images to 48x48
5. Build a data augmentation for test data to create new data and rescale the image at 48x48
6. Read images directly from the train folder and test folder using the appropriate function

Build 3 CNN models:


1. CNN Architecture:

1. Add convolutional layers with different filters, max pool layers, dropout layers, and batch normalization layers
2. Use Relu as an activation function
3. Take the loss function as categorical cross-entropy
4. Take rmsprop as an optimizer
5. Use early stopping with a patience of two epochs, monitoring the validation loss or accuracy
6. Train for ten epochs
7. Train the model using a generator and test the accuracy of the test data at every epoch
8. Plot the training and validation accuracy, and the loss
9. Observe the precision, recall, and F1-score for all classes for both the grayscale and color models, and determine
whether the per-class performance is good
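The architecture above can be sketched as follows; the number of filters and layer widths are my assumptions to tune, and the training call is left commented because it needs the real data generators.

```python
# Minimal sketch of the CNN described above (filter counts are assumptions).
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

model = models.Sequential([
    layers.Input(shape=(48, 48, 3)),
    layers.Conv2D(32, 3, activation="relu"),     # conv + batch norm + pooling
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),       # one unit per class
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop",
              metrics=["accuracy"])
# Step 5: stop when validation loss has not improved for two epochs.
early_stop = EarlyStopping(monitor="val_loss", patience=2)
# model.fit(train_flow, validation_data=test_flow, epochs=10,
#           callbacks=[early_stop])
print(model.output_shape)
```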

2. Transfer learning using MobileNet:

1. Prepare the data for the pre-trained MobileNet model, with color mode set to RGB
2. Create an instance of the pre-trained MobileNet model
3. Add a dense layer, dropout layer, and batch normalization layer on top of the pre-trained model
4. Create a final output layer with a SoftMax activation function
5. Change the batch size and activation function, use rmsprop as the optimizer, and observe whether the accuracy increases
6. Take the loss function as categorical cross-entropy
7. Use early stopping with a patience of two epochs as a callback to prevent overfitting
8. Train for ten epochs
9. Train the model using a generator and test the accuracy of the test data at every epoch
10. Plot the training and validation accuracy, and the loss
11. Observe the precision, recall, and F1-score for all classes for both the grayscale and color models, and determine
whether the per-class performance is good
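Steps 2–4 of the MobileNet variant can be sketched like this. In practice you would pass `weights="imagenet"`; `weights=None` is used here only so the sketch runs without downloading pre-trained weights, and the head's layer sizes are assumptions.

```python
# Sketch: MobileNet backbone plus a small custom head with a SoftMax output.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNet

base = MobileNet(include_top=False, weights=None,        # weights="imagenet" in practice
                 input_shape=(128, 128, 3), pooling="avg")
model = models.Sequential([
    base,
    layers.Dense(128, activation="relu"),    # step 3: dense + batch norm + dropout
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),   # step 4: final SoftMax layer
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop",
              metrics=["accuracy"])
print(model.output_shape)
```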

3. Transfer Learning using Densenet121:

1. Prepare the dataset for the transfer learning algorithm using Densenet121 with the image size as 224x224x3
2. Freeze the top layers of the pre-trained model
3. Add a dense layer at the end of the pre-trained model followed by a dropout layer, and try various combinations to improve
the accuracy
4. Add the final output layer with a SoftMax activation function
5. Take loss function as categorical cross-entropy
6. Take Adam as an optimizer
7. Use early stopping to prevent overfitting
8. Train for 15 epochs with a batch size of seven; also try other values to see their impact on the results
9. Train the model using the generator and test the accuracy of the test data at every epoch
10. Plot the training and validation accuracy, and the loss
11. Observe the precision, recall, and F1-score for all classes for both the grayscale and color models, and determine
whether the per-class performance is good
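The DenseNet121 variant differs mainly in the frozen backbone, larger input size, and Adam optimizer; a minimal sketch follows. As before, `weights=None` keeps the sketch offline (use `weights="imagenet"` with the real data), and the head sizes are assumptions.

```python
# Sketch of steps 1-6: frozen DenseNet121 backbone, custom head, Adam optimizer.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

base = DenseNet121(include_top=False, weights=None,      # weights="imagenet" in practice
                   input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                      # step 2: freeze the pre-trained layers

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),   # step 3: dense + dropout head
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # step 4: final SoftMax layer
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
# model.fit(train_flow, epochs=15, callbacks=[early_stopping])  # steps 7-9
print(model.output_shape)
```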

Final step:
1. Compare all the models on the basis of accuracy, precision, recall, and f1-score

You can download the datasets from the Reference Materials Section (Self-learning tab)

Project 13- Healthcare -PG AIML - AI and Machine Learning Capstone Project

Healthcare

Course-end Project 1

Description
Problem Statement:
ICMR wants to analyze different types of cancer, such as breast, renal, colon, lung, and prostate cancer, which have become a
cause of worry in recent years. They would like to identify the probable cause of these cancers in terms of the genes responsible
for each cancer type. This would lead to early identification of each cancer type, reducing the fatality rate.

Dataset Details:
The input dataset contains 802 samples for the corresponding 802 people who have been detected with different types of cancer.
Each sample contains expression values of more than 20K genes. Samples have one of the types of tumors: BRCA, KIRC,
COAD, LUAD, and PRAD.

Project Task: Week 1


Exploratory Data Analysis:
1. Merge both the datasets.
2. Plot the merged dataset as a hierarchically-clustered heatmap.
3. Perform Null-hypothesis testing.
Dimensionality Reduction:
4. Each sample has expression values for around 20K genes. However, it may not be necessary to include all 20K gene
expression values to analyze each cancer type. Therefore, we will identify a smaller set of attributes which will then be
used to fit multiclass classification models. So, the first task targets dimensionality reduction using various
techniques such as PCA, LDA, and t-SNE.
5. Input: Complete dataset including all genes (20531)
6. Output: Selected Genes from each dimensionality reduction method
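The dimensionality-reduction task can be sketched with scikit-learn on random stand-in data; the real matrix is 802 samples by 20531 genes, but the shapes below are shrunk so the sketch runs quickly.

```python
# Sketch of PCA, LDA, and t-SNE on a stand-in expression matrix.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 500))   # stand-in for the 802 x 20531 matrix
y = np.arange(80) % 5            # five tumor types: BRCA, KIRC, COAD, LUAD, PRAD

X_pca = PCA(n_components=50).fit_transform(X)
# LDA can keep at most (n_classes - 1) = 4 components.
X_lda = LinearDiscriminantAnalysis(n_components=4).fit_transform(X, y)
# t-SNE is usually run on the PCA output rather than the raw 20K genes.
X_tsne = TSNE(n_components=2, perplexity=10).fit_transform(X_pca)
print(X_pca.shape, X_lda.shape, X_tsne.shape)
```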

Project Task: Week 2


Clustering Genes and Samples:
1. Our next goal is to identify groups of genes that behave similarly across samples and identify the distribution of
samples corresponding to each cancer type. Therefore, this task focuses on applying various clustering techniques, e.g.,
k-means, hierarchical and mean shift clustering, on genes and samples.

 First, apply the given clustering technique on all genes to identify:

 Genes whose expression values are similar across all samples

 Genes whose expression values are similar across samples of each cancer type
Next, apply the given clustering technique on all samples to identify:

 Samples of the same class (cancer type) which also correspond to the same cluster

 Samples that belong to the same class (cancer type) but fall into a different cluster
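The two clustering directions (genes as rows, then samples as rows) can be sketched on stand-in data; the shapes and cluster counts below are assumptions.

```python
# Sketch: k-means on genes and hierarchical clustering on samples.
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 40))   # stand-in: 200 genes x 40 samples

# Cluster genes: each row is one gene's expression profile across samples.
gene_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(expr)

# Cluster samples: transpose so each row is one sample across all genes.
sample_labels = fcluster(linkage(expr.T, method="ward"), t=5, criterion="maxclust")
print(len(set(gene_labels)), len(set(sample_labels)))
```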
Building Classification Model(s) with Feature Selection:
2. Our final task is to build robust classification model(s) for identifying each type of cancer. It also aims to perform
feature selection in order to identify the genes that help in classifying each cancer type.

Sub-tasks:
1. Build a classification model(s) using multiclass SVM, Random Forest, and Deep Neural Network to classify the input
data into five cancer types
2. Apply the feature selection algorithms, forward selection and backward elimination to refine selected attributes
(selected in Task-2) using the classification model from the previous step
3. Validate the genes selected from the last step using statistical significance testing (t-test for one vs. all and F-test)
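The sub-tasks can be sketched end-to-end on stand-in data: a multiclass SVM, forward selection with a Random Forest (scikit-learn's `SequentialFeatureSelector`, available from version 0.24), and a one-vs-all t-test. All shapes and parameter values are assumptions.

```python
# Sketch of classification, forward feature selection, and significance testing.
import numpy as np
from scipy import stats
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # stand-in for the reduced gene matrix
y = np.arange(100) % 5           # five cancer types

# Multiclass SVM scored with cross-validation.
svm_acc = cross_val_score(SVC(), X, y, cv=3).mean()

# Forward selection of 5 attributes using a Random Forest.
sfs = SequentialFeatureSelector(
    RandomForestClassifier(n_estimators=20, random_state=0),
    n_features_to_select=5, direction="forward", cv=3)
sfs.fit(X, y)
selected = np.flatnonzero(sfs.get_support())

# One-vs-all t-test on a selected gene: class 0 versus the rest.
t, p = stats.ttest_ind(X[y == 0, selected[0]], X[y != 0, selected[0]])
print(selected, round(float(p), 3))
```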

You can download the datasets from the reference materials section

Project 14- PG AIML - AI and Machine Learning Capstone Project


Cyber Security

Course-end Project 2
Description
Problem Statement:
Book-My-Show wants to enable ads on its website, but it is also very cautious about the privacy and information of the users who
visit the site. An ad URL could contain a malicious link that tricks a recipient into a malware installation, freezes the system as
part of a ransomware attack, or reveals sensitive information. Book-My-Show now wants to analyze whether a particular URL is
prone to phishing (malicious) or not.

Dataset Details:
The input dataset contains 11k samples corresponding to 11k URLs. Each sample contains 32 features that give a
different and unique description of the URL, with values in {-1, 0, 1}:
-1: Phishing
0: Suspicious
1: Legitimate
Each sample is labeled as either legitimate or phishing.

Project Task: Week 1


Exploratory Data Analysis:
1. Each sample has 32 features with values in {-1, 0, 1}. Explore the data using histograms and heatmaps.
2. Determine the number of samples present in the data and the unique elements in each feature.
3. Check whether there are any null values in any of the features.
Correlation of features and feature selection:
4. Next, we have to find whether there are any correlated features present in the data. Remove features whose correlation
with another feature exceeds some threshold.
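A common way to carry out this step is to compute the absolute correlation matrix, look at its upper triangle, and drop one feature from every highly correlated pair. The 0.9 threshold and the toy frame below are assumptions; `f5` is a deliberately duplicated column so the drop is visible.

```python
# Sketch: drop one feature from each pair whose |correlation| exceeds 0.9.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(-1, 2, size=(100, 5)),   # values in {-1, 0, 1}
                  columns=[f"f{i}" for i in range(5)])
df["f5"] = df["f0"]          # perfectly correlated feature, for demonstration

corr = df.corr().abs()
# Keep only the upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
reduced = df.drop(columns=to_drop)
print(to_drop)  # ['f5']
```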

Project Task: Week 2


Building Classification Model
1. Finally, build a robust classification system that classifies whether a URL sample is a phishing site or not.

 Build classification models using a binary classifier to detect malicious or phishing URLs.

 Illustrate the diagnostic ability of this binary classifier by plotting the ROC curve.

 Validate the accuracy of data by the K-Fold cross-validation technique.

 The final output consists of the model, which will give maximum accuracy on the validation dataset with selected
attributes.
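The three bullets above can be sketched with scikit-learn on synthetic data shaped like the real dataset (32 features in {-1, 0, 1}); the logistic-regression classifier and the synthetic label rule are assumptions, not the course's prescribed model.

```python
# Sketch: binary classifier, ROC curve points, and 5-fold cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(500, 32)).astype(float)
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)  # synthetic label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

scores = clf.predict_proba(X_te)[:, 1]
fpr, tpr, _ = roc_curve(y_te, scores)      # points to plot the ROC curve
auc = roc_auc_score(y_te, scores)
cv_acc = cross_val_score(clf, X, y, cv=5)  # K-Fold cross-validation (K=5)
print(round(auc, 2), round(cv_acc.mean(), 2))
```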

You can download the datasets from the reference materials section

Project 15- PG AIML - AI and Machine Learning Capstone Project
Retail

Course-end Project 3

Description
Problem Statement

 Demand forecasting is one of the key tasks in the supply chain and retail domains in general. It is key to the effective operation
and optimization of a retail supply chain. Solving this problem effectively requires knowledge of a wide range of data science
techniques and a good understanding of ensemble methods.
 You are required to predict sales at the Store-Day level for one month. All the features will be provided, and the actual sales that
occurred during that month will also be provided for model evaluation.
Dataset Snapshot
Training Data Description: Historic sales at the Store-Day level for about two years for a retail giant with more than 1,000 stores.
Other sale influencers, such as whether a store was fully open or closed for renovation on a particular day, along with holiday and
special-event details, are also provided.

Project Task: Week 1


Exploratory Data Analysis (EDA) and Linear Regression:
1. Transform the variables by using data manipulation techniques like One-Hot Encoding
2. Perform an EDA (Exploratory Data Analysis) to see the impact of the variables on Sales.
3. Apply Linear Regression to predict the forecast and evaluate accuracy metrics like RMSE (Root Mean
Squared Error) and MAE (Mean Absolute Error); determine which metric makes more sense. Can there be a better
accuracy metric?

 Train a single model for all stores, using storeId as a feature.

 Train separate model for each store.

 Which performs better and why? (In the first case, parameters are shared across stores and are less free; in the second case, each store's model is independent.)

 Try an ensemble of the two approaches above. What are the findings?

 Use Regularized Regression. It should perform better on an unseen test set. Any insights?

 Open-ended modeling to get possible predictions.
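Steps 1–3 can be sketched end-to-end on synthetic data; the column names (`storeId`, `open`, `sales`) are illustrative stand-ins for the real dataset, and Ridge stands in for the regularized-regression bullet.

```python
# Sketch: one-hot encode storeId, fit plain and regularized linear regression,
# and compare RMSE / MAE on a held-out split.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
df = pd.DataFrame({"storeId": rng.integers(1, 6, 400),
                   "open": rng.integers(0, 2, 400)})
# Synthetic sales: zero when closed, store-dependent level when open.
df["sales"] = df["open"] * (100 + 10 * df["storeId"]) + rng.normal(scale=5, size=400)

X = pd.get_dummies(df[["storeId", "open"]], columns=["storeId"])  # One-Hot Encoding
X_tr, X_te, y_tr, y_te = train_test_split(X, df["sales"], random_state=0)

for model in (LinearRegression(), Ridge(alpha=1.0)):
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
    mae = float(mean_absolute_error(y_te, pred))
    print(type(model).__name__, round(rmse, 1), round(mae, 1))
```

RMSE penalizes large errors more than MAE, which is one angle on the "which metric makes more sense" question.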


Other Regression Techniques:
4. When a store is closed, sales = 0. Can this insight be used for data cleaning? Perform this and retrain the model. Are
there any benefits to this step?
5. Use Non-Linear Regressors like Random Forest or other Tree-based Regressors.

 Train a single model for all stores, where storeId can be a feature.
 Train separate models for each store.
Note: Dimensionality reduction techniques like PCA and tree hyperparameter tuning will be required. Cross-validate to find
the best parameters. Infer the performance of both models.
6. Compare the performance of Linear Model and Non-Linear Model from the previous observations. Which performs
better and why?
7. Train a Time-series model on the data taking time as the only feature. This will be a store-level training.

 Identify yearly trends and seasonal months

Project Task: Week 2


Implementing Neural Networks:
1. Train an LSTM on the same set of features and compare the result with the traditional time-series model.
2. Comment on the behavior of all the models you have built so far.
3. Cluster stores using sales and customer visits as features. Find out how many clusters or groups are possible. Also
visualize the results.
4. Is it possible to have separate prediction models for each cluster? Compare results with the previous models.
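Step 3 can be sketched with k-means and the elbow method; the synthetic store groups and feature scales below are assumptions.

```python
# Sketch: cluster stores on (sales, customer visits) and inspect the inertia
# curve to choose the number of clusters (elbow method).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Three synthetic store groups: (mean sales, mean customer visits) per group.
centers = np.array([[5000, 500], [12000, 900], [20000, 1500]])
stores = np.vstack([c + rng.normal(scale=[300, 40], size=(50, 2)) for c in centers])

X = StandardScaler().fit_transform(stores)   # put both features on one scale
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 7)}
# Inertia drops sharply up to the true group count, then flattens.
print(inertias)
```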
Applying ANN:
5. Use ANN (Artificial Neural Network) to predict Store Sales.

 Fine-tune the number of layers.

 Fine-tune the number of neurons in each layer.

 Experiment with the batch size.

 Experiment with the number of epochs. Carefully observe the loss and accuracy. What are the observations?

 Play with different learning-rate variants of gradient descent, like Adam, SGD, and RMSprop.

 Which activation performs best for this use case and why?

 Check how the model performs on the dataset; calculate the RMSE.


6. Use Dropout for ANN and find the optimum number of clusters (clusters formed considering the features: sales and
customer visits). Compare model performance with traditional ML based prediction models.
7. Find the best setting of neural net that minimizes the loss and can predict the sales best. Use techniques like Grid
search, cross-validation and Random search.
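A minimal ANN for sales regression with dropout, matching steps 5–6, can be sketched as below; the layer widths, dropout rate, epoch count, and synthetic target are all assumptions to tune via the grid/random search the task describes.

```python
# Sketch: small feed-forward ANN with dropout for store-sales regression.
import numpy as np
from tensorflow.keras import layers, models

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10)).astype("float32")
y = (X @ rng.normal(size=10)).astype("float32")   # synthetic sales target

model = models.Sequential([
    layers.Input(shape=(10,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),                 # step 6: dropout regularization
    layers.Dense(32, activation="relu"),
    layers.Dense(1),                     # single output: predicted sales
])
model.compile(optimizer="adam", loss="mse")
history = model.fit(X, y, epochs=3, batch_size=32, verbose=0)
rmse = float(np.sqrt(model.evaluate(X, y, verbose=0)))
print(rmse)
```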

Download the data sets from here

You might also like