
Chapter 1 Tumor

This document discusses exploratory data analysis and preprocessing techniques for a multiple sclerosis dataset containing brain MRI images. It explores features of the MS images like dimensions and modalities. Statistics like mean, standard deviation, and correlations between modalities are calculated. Preprocessing steps explored include histogram stretching, equalization, and filtering to reduce noise while preserving features. Histogram analysis can identify bimodal distributions that may indicate MS lesions. The goal is segmentation of MS lesions from brain images.

Uploaded by

Vivek
Copyright
© All Rights Reserved


CHAPTER 1: Multiple Sclerosis (MS) Dataset


I. Exploratory Data Analysis (EDA):

a) Loading data:
Multiple Sclerosis (MS): This dataset contains brain MRI images of patients with multiple
sclerosis. The goal of segmentation is to identify multiple sclerosis lesions in the images.
Source: the MS (SEP) dataset on ScienceDirect.

b) Exploring Image Features:

1-LesionSeg-T1.nii: Lesion segmentation image based on T1 data. Its size is (512, 512, 19):
a three-dimensional volume with 512 pixels along the first dimension, 512 pixels along the
second dimension, and 19 slices along the third dimension.
1-LesionSeg-Flair.nii: Lesion segmentation image based on FLAIR data, with a size of
(256, 256, 23).
1-Flair.nii: Raw FLAIR image with no segmentation. Its size is (256, 256, 23), the same as
the FLAIR-based segmentation image.
1-T1.nii: Raw T1 image with a size of (512, 512, 19).
1-T2.nii: Raw T2 image with a size of (256, 256, 19).
1-LesionSeg-T2.nii: Lesion segmentation image based on T2 data, also with a size of
(256, 256, 19).
The first two numbers of each size are the in-plane pixel counts, and the third is the
number of slices in the image stack. Because the modalities do not share one grid, they
have to be resampled to a common size before any voxel-by-voxel comparison.
These data are intended for the analysis and segmentation of lesions in multiple sclerosis,
a neurological disease. The different imaging modalities (FLAIR, T1, T2) provide
complementary information about the structure and pathology of the brain.
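The sizes listed above can be checked programmatically. The following is a minimal sketch, not the original code: it assumes the files are read with the nibabel library (the report does not say which loader was used), and `load_volume`, `describe_shapes` and `same_grid` are hypothetical helper names.

```python
import numpy as np


def load_volume(path):
    """Read a NIfTI file into a NumPy array (assumes nibabel is installed)."""
    import nibabel as nib  # imported lazily so the helpers below run without it
    return np.asarray(nib.load(path).get_fdata())


def describe_shapes(volumes):
    """Return {name: (rows, cols, slices)} for a dict of 3-D arrays."""
    return {name: tuple(vol.shape) for name, vol in volumes.items()}


def same_grid(a, b):
    """True when two volumes can be compared voxel by voxel."""
    return a.shape == b.shape
```

With the sizes reported here, `same_grid` would return False for the T1 and FLAIR volumes, which is why a resampling step is needed before comparing them.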
c) Image Viewing:

View a representative sample of slices from each modality to get a visual idea of the
lesion variations in the dataset.
d) Data Statistics:

These statistics describe the pixel intensities of the FLAIR images of two different patients.

Mean: the average pixel intensity in the image. For Patient-1-Flair, the mean is about
119.92, and for Patient-2-Flair, it is about 52.90.
Std Dev: the standard deviation measures the dispersion of pixel intensities around the mean. The
higher the standard deviation, the more dispersed the values are. For Patient-1-Flair, the standard
deviation is approximately 183.16, and for Patient-2-Flair, it is approximately 111.13.

Median: the middle value of the sorted pixel intensities. It is less sensitive to extreme values
than the mean. For Patient-1-Flair, the median is 22.0, and for Patient-2-Flair, it is 4.0. Both
medians lie far below the corresponding means, which indicates a right-skewed distribution
dominated by dark pixels.

Min: the smallest pixel intensity in the image. For Patient-1-Flair, the minimum value is 0.0,
and for Patient-2-Flair, it is also 0.0.

Max: the highest pixel intensity in the image. For Patient-1-Flair, the maximum value is
1396.0, and for Patient-2-Flair, it is 748.0.
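The five statistics above can be computed in one pass over a volume. This is a minimal sketch assuming the image is already available as a NumPy array; `intensity_stats` is a hypothetical helper name, not a function from the original code.

```python
import numpy as np


def intensity_stats(volume):
    """Summary statistics over all voxels of one image volume."""
    v = np.asarray(volume, dtype=np.float64)
    return {
        "mean": float(v.mean()),
        "std": float(v.std()),          # population standard deviation
        "median": float(np.median(v)),
        "min": float(v.min()),
        "max": float(v.max()),
    }
```

Comparing the mean with the median is a quick way to spot a skewed intensity distribution before choosing a contrast transformation.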

Correlation:
These correlation values measure the linear relationship between the pixel intensities of
different imaging modalities for two different patients.
For Patient-1:
The correlation between the FLAIR and T1 images is 0.0715, a very weak positive
correlation.
The correlation between the FLAIR and T2 images is 0.0784, also a very weak
positive correlation.
The correlation between the T1 and T2 images is -0.0288, a negligible negative correlation.
For Patient-2 (only the T1-T2 value is reported):
The correlation between the T1 and T2 images is 0.1743, a weak positive
correlation.
In summary, the correlations are generally weak, suggesting that the pixel intensities
of the different image modalities are not strongly linearly related. This matters in the
medical context, because it means that the modalities provide complementary
information about anatomical structures or pathologies.
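A correlation of this kind can be sketched as a Pearson coefficient over flattened volumes. The sketch assumes both volumes have already been resampled to the same shape (the report does not describe how this was done); `modality_correlation` is a hypothetical name.

```python
import numpy as np


def modality_correlation(a, b):
    """Pearson correlation between two volumes of identical shape."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("volumes must be resampled to the same grid first")
    # corrcoef returns a 2x2 matrix; the off-diagonal entry is r(a, b)
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
```

Values near 0, such as those reported above, mean that a linear function of one modality explains almost none of the variation in the other.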
II. Methodology:
i. Image pre-processing:
a) Contrast Enhancement:
Histogram Stretching Techniques
This code adapts the function to process the images of 5 patients and displays the original
images alongside the images after histogram stretching.

The histogram in the image shows the distribution of MS image intensities before and after
stretching. The horizontal axis gives the intensity value and the vertical axis gives the
number of pixels with that value.
Before stretching, the values are unevenly distributed: the most common values lie
between 0 and 10.
After stretching, the values are rescaled to the range 0 to 1, and the most common values
now lie between 0.2 and 0.8.
Stretching therefore spreads the intensities over the full available range, which highlights
the differences between values. This can be useful for data visualization or as a
normalization step before machine learning.
These ranges describe pixel intensities, not patients, so they do not by themselves
indicate a higher risk of developing the disease. The benefit of stretching is that
structures such as lesions become easier to distinguish.
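Histogram stretching as described above amounts to a linear rescaling of the intensities to [0, 1]. A minimal sketch follows; the optional percentile limits are an assumption added for robustness against outliers and are not mentioned in the report, and `stretch_histogram` is a hypothetical name.

```python
import numpy as np


def stretch_histogram(image, low_pct=0.0, high_pct=100.0):
    """Linearly map [low percentile, high percentile] to [0, 1]."""
    img = np.asarray(image, dtype=np.float64)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:                      # constant image: nothing to stretch
        return np.zeros_like(img)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```

With the default limits this is plain min-max scaling. Narrower limits such as (1, 99) keep a few very bright pixels from compressing the rest of the range.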
Logarithmic and exponential transformations

The logarithmic transformation expands the low end of the intensity range and compresses the
high end, which reveals differences between dark values. The most common values are now
between 0.01 and 0.1.
This can help bring out faint structures in dark regions of the image. As with stretching, the
range 0.01 to 0.1 is a property of the transformed pixel intensities and is not an indicator of
a patient's risk of developing MS.
The exponential transformation does the opposite: it highlights the most extreme values of
the data, so the highest values are now clearly visible.
This can be useful for isolating the brightest regions, which on FLAIR images may correspond
to lesions. High transformed values (for example between 10 and 100) mark bright pixels;
they do not by themselves indicate severe MS or a higher risk of complications.
b) Histogram Equalization:
Classic Histogram Equalization

The original histogram (in blue) shows that the data are unevenly distributed, with a
concentration of values in the range of 0 to 10. The histogram after equalization
(in purple) shows that the data are now more evenly distributed, with values ranging from 0
to 1.
Histogram equalization is an image processing technique that aims to make the distribution of
intensity values more uniform by remapping each value through the cumulative histogram.
This can improve data visualization or facilitate the application of machine learning techniques.
In the specific case of the MS dataset, histogram equalization can make lesions and tissue
boundaries easier to see. The most common values after equalization (for example between
0.2 and 0.8) describe the remapped pixel intensities and do not indicate that a patient has a
higher risk of developing MS.
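Classic equalization maps each intensity to its value on the cumulative distribution of the image. A minimal NumPy sketch, assuming 256 histogram bins (the bin count is not stated in the report) and a hypothetical function name:

```python
import numpy as np


def equalize_histogram(image, n_bins=256):
    """Remap intensities through the normalized cumulative histogram."""
    img = np.asarray(image, dtype=np.float64)
    hist, edges = np.histogram(img.ravel(), bins=n_bins)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]                              # normalize to [0, 1]
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Interpolate so each pixel gets the CDF value at its intensity
    return np.interp(img.ravel(), centers, cdf).reshape(img.shape)
```

The output lies in [0, 1] and keeps the ordering of the original intensities, which matches the 0 to 1 range seen in the equalized histogram.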
Adaptive EQ

Adaptive equalization may be more effective than conventional histogram equalization at
bringing out lesions. This is because adaptive equalization takes local variations in the
distribution of intensity values into account: contrast is enhanced region by region rather
than with a single global mapping.
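The idea of local equalization can be illustrated by equalizing each tile of the image separately. This is a deliberately simplified sketch: it omits the clip limit and the interpolation between tiles that CLAHE-style methods (for example `skimage.exposure.equalize_adapthist`) use, and the report does not say which implementation was applied. The tile grid and function names are assumptions.

```python
import numpy as np


def _equalize(block, n_bins=64):
    """Equalize one tile through its own cumulative histogram."""
    hist, edges = np.histogram(block.ravel(), bins=n_bins)
    cdf = np.cumsum(hist) / block.size
    centers = 0.5 * (edges[:-1] + edges[1:])
    return np.interp(block.ravel(), centers, cdf).reshape(block.shape)


def tilewise_equalize(image, tiles=(4, 4)):
    """Equalize each tile independently (no clip limit, no blending)."""
    img = np.asarray(image, dtype=np.float64)
    out = np.empty_like(img)
    row_edges = np.linspace(0, img.shape[0], tiles[0] + 1, dtype=int)
    col_edges = np.linspace(0, img.shape[1], tiles[1] + 1, dtype=int)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            out[r0:r1, c0:c1] = _equalize(img[r0:r1, c0:c1])
    return out
```

Because every tile is stretched over the full range, a low-contrast dark area and a low-contrast bright area both gain local contrast. The price of this simplification is visible seams at tile borders, which is what interpolation in full adaptive methods removes.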
c) Noise Elimination:
Application of spatial filters such as the mean filter, median filter, and Gaussian filter to
reduce noise in medical images without significantly altering the relevant
features.

The results displayed show the effect of three spatial filters:


Mean Filtered:
This image represents the average of the pixel intensities across the slices of the image.
Averaging is intended to reduce noise and highlight important structures. However, it can
also lead to loss of detail.
Median Filtered:
This image results from applying a median filter with a 3x3 window. The median filter is
effective at reducing impulsive (salt-and-pepper) noise while preserving contours. It is often
used to eliminate outlier pixels.
Gaussian Filtered:
This image is obtained by applying a Gaussian filter with a standard deviation (sigma) of 1.
The Gaussian filter smooths the image and reduces noise. With a small sigma it keeps most
contours recognizable, but it also softens fine detail.

• In patient image 1, the mean filter reduces noise in the gray matter and white matter
areas. This makes the two regions more homogeneous and the boundary between them
easier to identify.
• In patient image 2, the median filter reduces noise in a focal lesion. This makes the
lesion more visible and easier to measure.
• In patient image 3, the Gaussian filter smooths out both noise and fine detail. This
gives a more uniform appearance to the image, but can also make it more difficult to
identify lesions.
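The three spatial filters can be applied to a single 2-D slice with `scipy.ndimage`. This is a sketch under the parameters quoted in the text (3x3 window, sigma of 1). Using a 3x3 neighborhood for the mean filter is an assumption, and `apply_spatial_filters` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage


def apply_spatial_filters(slice_2d, size=3, sigma=1.0):
    """Return mean-, median- and Gaussian-filtered copies of one slice."""
    img = np.asarray(slice_2d, dtype=np.float64)
    return {
        "mean": ndimage.uniform_filter(img, size=size),      # local average
        "median": ndimage.median_filter(img, size=size),     # local median
        "gaussian": ndimage.gaussian_filter(img, sigma=sigma),
    }
```

On an isolated outlier pixel the median filter restores the surrounding value exactly, while the mean and Gaussian filters only spread the outlier over its neighbors. This is why the median filter is preferred for impulsive noise.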

Use of non-linear filtering techniques such as bilateral filtering to attenuate noise while
preserving important edges and contours.

In patient image 1, the bilateral filter reduces noise in the gray matter and white matter areas.
This makes the edges between these two regions sharper and easier to identify.

In patient image 2, the bilateral filter reduces noise in a focal lesion. This makes the lesion
more visible and easier to measure.
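The edge-preserving behavior of this filter (commonly called a bilateral filter) comes from weighting each neighbor by both its spatial distance and its intensity difference. The following is a small, unoptimized NumPy sketch. The radius and the two sigma values are illustrative assumptions, not parameters taken from the report.

```python
import numpy as np


def bilateral_filter(image, radius=2, sigma_space=2.0, sigma_range=20.0):
    """Weighted local average; weights drop with distance and intensity gap."""
    img = np.asarray(image, dtype=np.float64)
    padded = np.pad(img, radius, mode="reflect")
    acc = np.zeros_like(img)
    weight_sum = np.zeros_like(img)
    h, w = img.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor image shifted by (dy, dx)
            shifted = padded[radius + dy: radius + dy + h,
                             radius + dx: radius + dx + w]
            w_space = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
            w_range = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_range ** 2))
            weight = w_space * w_range
            acc += weight * shifted
            weight_sum += weight
    return acc / weight_sum   # center pixel always contributes weight 1
```

Pixels on the other side of a strong edge receive almost no weight, so noise is averaged out within each tissue region while the boundary stays sharp.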
d) Histogram Analysis:
Study of peaks and valleys

For a normal brain, the histogram of a FLAIR MRI image is usually unimodal, with a peak at
around 50-70. That is because most pixels inside the brain belong to normal white and gray
matter, whose FLAIR intensities fall in a similar range.

In MS patients, the histogram of a FLAIR MRI image may be bimodal, with a second peak at a
higher signal intensity. This is because MS lesions appear brighter (hyperintense) on FLAIR
images than normal brain tissue.

Statistical Modeling of the Histogram

• Patient 1: Patient 1's histogram is unimodal, with a peak at approximately 60. This
suggests that no MS lesions are visible in this image.
• Patient 2: Patient 2's histogram is bimodal, with a second peak at about 100. This
suggests that the image contains MS lesions. The amplitude of the second peak is
relatively small, suggesting a moderate lesion load.
• Patient 3: Patient 3's histogram is bimodal, with a second peak at approximately 120.
This suggests that the image contains MS lesions. The amplitude of the second peak is
very high, suggesting a heavy lesion load.

The Gaussian model is a parametric function fitted to the histogram; it describes the
distribution of pixel intensity values in the image.
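One way to model a bimodal histogram is a mixture of two Gaussians fitted with expectation-maximization. The report does not say how its Gaussian model was fitted, so the sketch below is an assumption. The percentile-based initialization and the function name are also illustrative.

```python
import numpy as np


def fit_two_gaussians(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    x = np.asarray(values, dtype=np.float64).ravel()
    mu = np.percentile(x, [5.0, 95.0])          # start near the two tails
    sigma = np.full(2, x.std() + 1e-6)
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        dens = (weight / (sigma * np.sqrt(2 * np.pi))
                * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2))
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: update weights, means and standard deviations
        n_k = resp.sum(axis=0) + 1e-12
        mu = (resp * x[:, None]).sum(axis=0) / n_k
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k) + 1e-6
        weight = n_k / x.size
    order = np.argsort(mu)
    return mu[order], sigma[order], weight[order]
```

The fitted means give the peak positions, and the weight of the brighter component estimates the share of pixels in the second peak. This is a more objective measure than judging peak amplitude by eye.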

Optimizing thresholds for contrast and brightness

The function adjusts the contrast and brightness of the image, then clips the values to the
interval [0, 255]. The arguments of the objective function are the initial brightness factor and
the image, and the bounds for the parameters are (0.5, 1.5) for the contrast factor and (-50, 50)
for the brightness factor.
The original image is dark and hard to see. The adjusted image is brighter and more visible.
The contrast factor used to adjust the image is 1.25 and the brightness factor is 20.
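A bounded optimization of this kind can be sketched with `scipy.optimize.minimize`. The bounds are the ones quoted in the text, but the criterion being optimized is not given in the report. The objective below (bringing the mean brightness close to a target of 128) is a purely hypothetical stand-in, as are the function names.

```python
import numpy as np
from scipy.optimize import minimize


def adjust(image, contrast, brightness):
    """Apply contrast * image + brightness and clip to [0, 255]."""
    img = np.asarray(image, dtype=np.float64)
    return np.clip(contrast * img + brightness, 0, 255)


def objective(params, image, target_mean=128.0):
    """Hypothetical criterion: squared gap between mean brightness and target."""
    contrast, brightness = params
    return (adjust(image, contrast, brightness).mean() - target_mean) ** 2


def optimize_adjustment(image, start=(1.0, 0.0)):
    """Search contrast in (0.5, 1.5) and brightness in (-50, 50)."""
    bounds = [(0.5, 1.5), (-50.0, 50.0)]        # bounds quoted in the text
    result = minimize(objective, start, args=(image,),
                      bounds=bounds, method="L-BFGS-B")
    contrast, brightness = result.x
    return float(contrast), float(brightness)
```

With a different objective the optimum will differ, so this sketch is not expected to reproduce the reported factors of 1.25 and 20.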
ii. Segmentation:

The original FLAIR MRI image shows several areas of abnormal signal intensity that suggest MS
lesions. The regions identified by the Felzenszwalb algorithm correspond to these areas
of abnormal signal intensity. The contours of the segmented regions provide additional
information about the shape and size of MS lesions.
a) Intensity-Based Segmentation (Thresholding)

The diagram illustrates the intensity-based segmentation process for a particular patient. The
original FLAIR MRI image (top left) shows several areas of abnormal signal intensity, suggesting
the presence of MS lesions.
The grayscale panel (top center) shows a bimodal intensity distribution, with one peak
corresponding to MS lesion intensity values and another peak corresponding to normal brain
tissue intensity values.
The Otsu threshold (top right) lies between the two peaks, indicating that intensity-based
segmentation can separate MS lesions from normal brain tissue.
The binary image (bottom left) shows MS lesions as white areas on a black background.
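Otsu's method picks the threshold that maximizes the variance between the two resulting classes. A minimal NumPy sketch of that computation follows. The bin count and the function name are assumptions, and the report's own implementation is not shown.

```python
import numpy as np


def otsu_threshold(image, n_bins=256):
    """Threshold maximizing the between-class variance of the histogram."""
    values = np.asarray(image, dtype=np.float64).ravel()
    hist, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    prob = hist / hist.sum()
    w0 = np.cumsum(prob)                  # weight of the darker class
    w1 = 1.0 - w0                         # weight of the brighter class
    cum_mean = np.cumsum(prob * centers)
    total_mean = cum_mean[-1]
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    between = np.zeros(n_bins)
    between[valid] = ((total_mean * w0[valid] - cum_mean[valid]) ** 2
                      / (w0[valid] * w1[valid]))
    # Upper edge of the best bin: pixels above it form the bright class
    return float(edges[1:][np.argmax(between)])
```

A binary lesion candidate mask is then `image > otsu_threshold(image)`. On a clearly bimodal histogram the threshold falls in the valley between the two peaks.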
b) Region Growing Segmentation

The seed point is selected at the center of the image. The region grows from the seed
point and gradually expands to include neighboring pixels that have similar intensity
values.
The segmented image shows MS lesions as distinct regions.
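Region growing can be sketched as a breadth-first search from the seed. The similarity rule used here (absolute difference to the seed intensity within a fixed tolerance, 4-connected neighbors) is an assumption. The report does not state its criterion, and `region_grow` is a hypothetical name.

```python
from collections import deque

import numpy as np


def region_grow(image, seed, tolerance=10.0):
    """Grow a 4-connected region of pixels close in intensity to the seed."""
    img = np.asarray(image, dtype=np.float64)
    mask = np.zeros(img.shape, dtype=bool)
    seed_value = img[seed]
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
            if (inside and not mask[nr, nc]
                    and abs(img[nr, nc] - seed_value) <= tolerance):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask
```

Only pixels connected to the seed are included, so a lesion that does not touch the seeded region is missed. In practice this means a seed at the image center segments whatever tissue lies there, and lesion seeds usually need to be placed inside bright areas.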

c) Segmentation by Clustering:
Segmentation by clustering (k-means) is applied to the FLAIR image. MS lesions are grouped
into a separate cluster, represented by the color green in the segmented image.
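K-means on pixel intensities can be sketched in a few lines of NumPy. The number of clusters, the evenly spaced initial centers, and the function name are assumptions. The report does not give its settings.

```python
import numpy as np


def kmeans_intensity(image, k=3, n_iter=50):
    """Cluster pixel intensities; labels are ordered from dark (0) to bright."""
    img = np.asarray(image, dtype=np.float64)
    x = img.ravel()
    centers = np.linspace(x.min(), x.max(), k)     # deterministic start
    for _ in range(n_iter):
        # Assign each pixel to its nearest center
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    order = np.argsort(centers)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)                     # old label -> sorted label
    return rank[labels].reshape(img.shape), centers[order]
```

Since lesions are bright on FLAIR images, the cluster with the highest center (label k - 1 here) is the natural lesion candidate. Bright non-lesion structures can land in the same cluster, because the clustering uses intensity alone.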
d) Color-Based Segmentation:

Color-based segmentation is applied to the FLAIR image. MS lesions are represented by a
distinct color in the segmented image. In this case, MS lesions are well identified and
separated from normal brain tissue. However, it is possible that the segmentation may not
be perfect, especially in areas where MS lesions are small or poorly defined.

iii. Evaluation:

Accuracy
Accuracy measures the overall correctness of a model. It is defined as the number of
correct predictions divided by the total number of predictions. In this case, the accuracy is
0.4975, which means that the model correctly predicted the class of 49.75% of the pixels.

Jaccard Index

The Jaccard Index measures the similarity between two sets, here the predicted and the
actual lesion pixels. It is defined as the size of their intersection divided by the size of their
union: the number of pixels correctly classified as lesion divided by the number of pixels that
are lesion in the prediction, in the ground truth, or in both. In this case, the Jaccard Index is
0.3294, which means that the predicted and actual lesion regions overlap on 32.94% of their
union.

Precision

Precision is the proportion of the pixels assigned to a class that truly belong to it. It
is defined as the number of correctly classified pixels in the predicted class divided by the
number of predicted pixels in that class. In this case, the precision is 0.4954, which means
that 49.54% of the pixels predicted as MS lesion are actually lesion pixels.

Recall

Recall is the proportion of the pixels of a class that the model finds. It is
defined as the number of correctly classified pixels in the actual class divided by the number
of actual pixels in that class. In this case, the recall is 0.4957, which means that the model
detected 49.57% of the actual lesion pixels.

F1 Score

The F1 Score combines precision and recall. It is defined as the harmonic
mean of precision and recall. In this case, the F1 Score is 0.4956, the harmonic mean of a
precision of 0.4954 and a recall of 0.4957.

Confusion Matrix

The confusion matrix is a table that shows the model's predictions relative to actual classes.
In this case, the confusion matrix is as follows:

[[16430 16476]
[16454 16176]]

The confusion matrix, read with rows as actual classes and columns as predicted classes
(the layout consistent with the metrics above), shows that the model correctly predicted
16,430 pixels of normal brain tissue and 16,176 pixels of MS lesions. It mispredicted 16,476
normal-tissue pixels as MS lesions and missed 16,454 lesion pixels, which it classified as
normal brain tissue.
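All five metrics can be recomputed from the confusion matrix. The sketch below assumes that rows are actual classes, columns are predicted classes, class 0 is normal tissue and class 1 is MS lesion. Under that layout the quoted values are reproduced to four decimals.

```python
import numpy as np

# Confusion matrix quoted in the text (assumed layout: rows = actual class,
# columns = predicted class; class 0 = normal tissue, class 1 = MS lesion).
cm = np.array([[16430, 16476],
               [16454, 16176]])
tn, fp = cm[0]
fn, tp = cm[1]

accuracy = float((tp + tn) / cm.sum())
precision = float(tp / (tp + fp))
recall = float(tp / (tp + fn))
f1 = 2 * precision * recall / (precision + recall)
jaccard = float(tp / (tp + fp + fn))      # intersection over union

for name, value in [("accuracy", accuracy), ("jaccard", jaccard),
                    ("precision", precision), ("recall", recall), ("f1", f1)]:
    print(f"{name}: {value:.4f}")
```

The matrix sums to 65,536 pixels, which corresponds to a single 256 x 256 slice.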

General interpretation of the results

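The results on the MS dataset show an accuracy of 49.75% and an F1 score of 49.56%, with
precision and recall both close to 50%. Since the two classes are almost equally represented
in the confusion matrix, this is about the level expected from random assignment, so the
segmentation does not yet separate MS lesions from normal brain tissue reliably.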
