Machinelearning

This document outlines a final task to develop methods for predicting the stage of disease in patients using medical data, with an emphasis on estimating uncertainty in predictions. Two datasets are provided: 1) tabular data on 418 patients with cirrhosis to predict disease stage, and 2) 317 chest X-ray images to predict normal vs. COVID-19/pneumonia cases. The goal is to analyze and develop classification methods for prediction while considering computational complexity, precision in different feature spaces, and producing informative uncertainty estimates. Results must be summarized in a report explaining the analysis and proposed strategies, with citations of sources and a link to an implementation script shared on Google Colab.


1. INTRODUCTION
An accurate estimate of the uncertainty in machine learning predictions is
paramount to building reliable models. It allows us to make better-informed
decisions, to identify outliers, to detect anomalies in the data, and to
interpret the results more easily. A major challenge for the deployment of
these systems in critical applications (such as medical diagnostics or
self-driving vehicles) is to identify when, and to what extent, the system may
fail in a prediction. After all, an evaluation of uncertainty is built into our
own behaviour. For example, a human driver will slow down when faced with a
significant amount of uncertainty.
When we turn to regression problems, in certain families of models the task
can be accomplished easily through theoretical results. The most obvious
example in this respect is linear regression. In general, however, models
are much more complicated, and the complex interactions within the
algorithms make it almost hopeless, even in regression problems, to derive a
reasonable theoretical analysis. In classification problems, the output of
an algorithm is often a probability distribution over all possible classes,
assessing the likelihood of each class. The problem then appears to be
completely solved, but in fact the uncertainty has merely been shifted onto
the values of the output probabilities.
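
As an illustration of the last point, the following minimal sketch computes two
common summaries of a class-probability vector: the Shannon entropy and the gap
between the two most likely classes. The function names and the two example
vectors are hypothetical and are not part of the assignment. Note that these
summaries only describe the shape of the output distribution; they do not tell
us whether the probabilities themselves can be trusted.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def normalized_entropy(probs):
    """Entropy rescaled to [0, 1]; 1 means a uniform (maximally uncertain) output."""
    k = len(probs)
    if k < 2:
        return 0.0
    return predictive_entropy(probs) / math.log(k)

def top_two_margin(probs):
    """Gap between the two largest probabilities; a small gap signals ambiguity."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second

# Hypothetical classifier outputs for two patients (four classes each)
confident = [0.94, 0.03, 0.02, 0.01]
ambiguous = [0.30, 0.28, 0.22, 0.20]
for name, p in [("confident", confident), ("ambiguous", ambiguous)]:
    print(name, round(normalized_entropy(p), 3), round(top_two_margin(p), 3))
```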

2. GOALS
Your task is to explore possible ways of producing a measure or an estimate
of uncertainty in classification predictions. You are not restricted to using
a single classification method, and you can, in principle, develop different
assessments of uncertainty for different algorithms. When developing your
method(s), try to consider:
the computational complexity of your method,
the precision in different regions of the feature space,
that the results should be as informative as possible.
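
One possible approach among many, sketched here only as an example and not as
the expected solution, is split conformal prediction. Instead of a single
label, it returns a set of classes that contains the true class with
probability at least 1 - alpha on average, assuming the calibration and test
points are exchangeable. It is cheap (one sort of the calibration scores), and
the size of the returned set is an informative signal. Its guarantee is only
an average over the whole feature space, so it does not by itself address the
precision in different regions. The function names and the toy numbers below
are hypothetical.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Threshold on the score 1 - p(true class), from a held-out calibration set."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample corrected rank
    if rank > n:
        return float("inf")  # too few calibration points: every class is kept
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All classes whose score 1 - p(class) does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

# Toy calibration data: predicted probabilities and true labels (made up)
cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6], [0.95, 0.05],
             [0.15, 0.85], [0.75, 0.25], [0.35, 0.65], [0.55, 0.45], [0.5, 0.5]]
cal_labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 0]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(prediction_set([0.7, 0.3], threshold))    # set for a fairly clear case
print(prediction_set([0.56, 0.44], threshold))  # set for a borderline case
```

A larger set (or, with this simple score, an empty one) flags inputs on which
the classifier should not be trusted to give a single answer.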

3. DATA
The data are of medical interest, because this is the kind of setting
in which it is of greatest importance to have a clear and reliable assessment of
the uncertainty in the prediction.
Date: June 15, 2023.
FINAL TASK FOR ANALISI DEI DATI

The first dataset is purely tabular and is taken from:
https://www.kaggle.com/datasets/fedesoriano/cirrhosis-prediction-dataset

It contains data about 418 patients with biliary cirrhosis of the liver. The
preliminary goal is to analyse extensively methods that provide accurate
predictions of the histologic stage of the disease. The main step is then to
develop, for one or more of the methods analysed, an assessment of the
uncertainty of the prediction.
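
Before fitting any model, it may help to check how the patients are distributed
across stages and how many labels are missing, since rare classes are where
predictions tend to be least reliable. The sketch below uses only the Python
standard library. The target column name "Stage" and the missing-value marker
"NA" are assumptions about the downloaded file and should be checked against
it; the small in-memory sample is made up and stands in for the real data.

```python
import csv
import io
from collections import Counter

def stage_distribution(csv_file, target="Stage", missing=("", "NA")):
    """Count rows per value of the target column; missing labels are counted apart."""
    counts, n_missing = Counter(), 0
    for row in csv.DictReader(csv_file):
        value = (row.get(target) or "").strip()
        if value in missing:
            n_missing += 1
        else:
            counts[value] += 1
    return counts, n_missing

# Made-up sample with a hypothetical feature column, standing in for the real file
sample = io.StringIO("ID,Feature,Stage\n1,0.5,4\n2,1.2,3\n3,0.9,4\n4,2.1,NA\n")
counts, n_missing = stage_distribution(sample)
print(dict(counts), "missing:", n_missing)
```

With the real file, the same function can be called on an open file handle,
for example `with open(path, newline="") as f: stage_distribution(f)`, where
`path` is wherever the downloaded CSV was saved.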
A second dataset contains images, and as such is more computationally
demanding. It can be found here:
https://www.kaggle.com/datasets/pranavraikokte/covid19-image-dataset

It contains 317 chest X-ray images of normal patients and of patients
with Covid-19 or viral pneumonia. The goal is the same as for the previous
dataset, namely to develop methods for prediction and an estimate of the
uncertainty in the prediction. This task is computationally more intensive, and
for this reason the analysis of this dataset is optional, for those who want to
have fun with a more computationally challenging problem. The dataset is
divided into training and test sets, but you can consider the whole data. Notice
that while convolutional neural networks can be considered a standard tool
in image classification, they require some computational power [1],
and they might not be the only possibility available.
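
To reduce the computational cost, the images can be downscaled before any model
is trained. The sketch below shows the idea with plain block averaging on a
grayscale image stored as a list of rows; this representation and the function
name are assumptions made for illustration. In practice an image library would
normally do the loading and resizing.

```python
def downscale(image, factor):
    """Shrink a grayscale image (list of rows) by averaging factor x factor blocks.

    Rows and columns that do not fill a complete block are dropped.
    """
    height = len(image) // factor
    width = len(image[0]) // factor
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            block = [image[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# Made-up 4 x 4 image with pixel intensities in 0..255
image = [[0, 0, 255, 255],
         [0, 0, 255, 255],
         [10, 20, 30, 40],
         [30, 40, 50, 60]]
small = downscale(image, 2)  # 2 x 2 result, one value per 2 x 2 block
print(small)
```

Halving each side divides the number of pixels by four, which reduces memory
use and training time for any pixel-based classifier accordingly.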

4. SUBMISSION
The results of your analysis and your ideas must be summarised in a report
that explains how you planned to tackle the problem and which strategies you
tried in order to solve it. The emphasis is not on the performance of the
final method(s) proposed, but on the way you have dealt with the problem.
You are not only allowed but actually encouraged to read up on the subject.
In order to be complete and fair, you are required to cite all sources of research
material you have used (books, scientific papers, etc.).
This final assignment is a personal piece of work and must not be done in
groups. Discussions with colleagues or experts, although discouraged, should
be reported for fairness.
Your report can be uploaded to the e-learning website. The deadline is
August 5, 2023.
You should add, at the end of your report, the link to a script (R or Python)
containing the implementation of the final method(s) proposed, based on the
analysis developed. The script must be shared via a notebook on Google Colab.
Obviously, the script must not contain any errors. Please add a link to the
notebook in your report.

It is not necessary (and in fact not useful) for the script to contain the
entire analysis. The output of your script should be a detailed account of
your conclusions: numbers without any explanation of their meaning are not
really helpful.

[1] If needed, you may consider downscaling the images.
