Utilizing Microsoft Excel to model data in treating cancer - Nguyễn Đàm Xuân Nguyên
Applied Mathematics
14/10/2024
Abstract
Microsoft Excel is a good tool for basic data analysis in cancer research, especially given the
growing need for advanced healthcare data modeling. In this research, the half-maximal
inhibitory concentration (IC50) of cancer cell activity in response to α-mangostin and
a similar drug is modeled using Excel. The methodology comprised applying varying amounts of
the chemicals to cancer cells in cytotoxicity experiments and using Excel capabilities like
regression analysis and ANOVA hypothesis testing to analyze the collected data. The findings
demonstrated that α-mangostin was more effective than the other chemical tested at preventing
the proliferation of cancer cells, with statistically significant differences established (F-ratio =
113.5, p-value = 0.0004399). Excel was shown to be a useful tool for preliminary data modeling,
but it may be inadequate for analysis beyond that. Notwithstanding these limitations, Excel can
be a helpful tool for medical practitioners who don't have access to specialized software. Future
updates to Excel might enhance its use in medical research and make it a more potent instrument
in oncology.
Introduction
There is a growing demand for an extensive range of healthcare data models as healthcare
these healthcare data models, the health system may operate at peak efficiency and discover new
avenues for improving patient outcomes (Maria, 2023). One of the most effective programs for
data modelling and analysis is Microsoft Excel. This software allows data to be swiftly and
simply stored, arranged, altered, and analyzed by users. Its adaptability is useful for a variety of
intricate computations, including predicting trends and seasonal fluctuations. Additionally, Excel
has sophisticated visuals like graphs and charts that make it possible for users to comprehend
enormous volumes of data faster and more precisely than previously. Hence, it is a viable tool in
oncology, where it can support data modelling for drug development, improvements to radiation
therapy, and similar tasks.
This research paper demonstrates how Microsoft Excel can be used to estimate the
half-maximal inhibitory concentration (commonly denoted IC50), which
Literature Review
SPSS, to model intricate datasets. Treatment outcome prediction has also been achieved with the
use of machine learning techniques. Nevertheless, not all medical practitioners possess the
Numerous studies have been carried out to investigate the usefulness of Microsoft Excel
in general data analysis. One such study, by Dr. M. Dinesh Kumar (2023), examined statistical
functions required for research and concentrated on approaches to carry out such analysis. There
is, however, a dearth of studies on the application of Excel data assessment in medical research,
apart from a guide by Russell, G. (2019) on the topic titled "Microsoft Excel as a data extraction
tool for audit of cancer minimum datasets." This study further assesses Excel's ability to model
cancer data.
Methodology
CellTiter 96® AQueous One Solution Cell Proliferation Assay (MTS) was used to find
the potency of α-mangostin. This was carried out using a complete medium: cancer cells were
cultivated at a density of 5000 cells/well on a 96-well plate. After a day of incubation, the cells
were treated with α-mangostin at varying doses (0, 0.5, 5, 10, 15, and 25 μg/mL) and incubated
at 37°C for three days. The MTS cytotoxicity test kit [3-(4,5-dimethylthiazol-2-yl)-5-(3-
growth of cells. The treated C6 cell wells were filled with the MTS reagent, and they were left to
develop for three hours at 37°C. The cells in the reagent break down the PMS (phenazine
methosulfate) to create a formazan product that is soluble in cell culture medium and has
spectrophotometer). This was repeated twice to ensure reliability and improve data precision.
The second substance is a randomly selected compound capable of fighting cancer cells.
To collect data, identical methods were applied; however, the concentrations of the added
Data Analysis
Before conducting any analysis of the results, outliers must be removed to ensure
accuracy. Additionally, Excel functions must be used with care to avoid arithmetic
miscalculations, which would compromise data integrity. The independent variable, dependent
- Regression analysis with built-in “Trendline” function for scatter plot diagrams.
testing.
With these functions, the IC50 value of each substance can be calculated.
First, take the results obtained for one sample, which contain the cell activity at each
concentration of the added substance. To find the value corresponding to 50% cancer cell
activity, calculate the arithmetic mean of the control index (no cancer cells) and the cell
activity with no substance added. Then select all the cell activity values together with the
corresponding concentrations in that sample and create a scatter plot. To add a
regression line, select the newly constructed graph, click the “+” button in
the top-right corner, enable “Trendline”, and select “More Options” to display the equation.
Finally, substitute the 50% cancer cell activity index into the regression equation
and solve for the concentration. The result is the IC50 of that sample. Repeat for the other
samples and find the mean value of IC50 for each substance.
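The procedure above can also be sketched outside Excel as a cross-check. The following Python snippet is a minimal illustration, not the workflow used in this study: it assumes a linear trendline (ordinary least squares, which is what Excel's linear Trendline fits), and the function names and the readings in the example are hypothetical placeholders rather than the measured data.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_ic50(concentrations, activities, blank):
    """Estimate IC50 for one sample.

    concentrations: tested concentrations, must include 0 (no substance added)
    activities:     measured cell activity at each concentration
    blank:          control reading with no cancer cells
    """
    untreated = activities[concentrations.index(0)]
    # 50% activity is the mean of the blank control and the untreated reading.
    half_activity = (blank + untreated) / 2
    slope, intercept = linear_fit(concentrations, activities)
    # Solve half_activity = slope * c + intercept for the concentration c.
    return (half_activity - intercept) / slope


def mean_ic50(samples, blanks):
    """Average the per-sample IC50 values for one substance."""
    values = [
        estimate_ic50(conc, act, blank)
        for (conc, act), blank in zip(samples, blanks)
    ]
    return sum(values) / len(values)


# Hypothetical, perfectly linear readings used only to illustrate the call.
example_conc = [0, 10, 20, 30, 40]
example_act = [1.40, 1.15, 0.90, 0.65, 0.40]
print(estimate_ic50(example_conc, example_act, blank=0.40))
```

Outliers are assumed to have been removed before the fit, as described in the data analysis step; the sketch does not attempt to detect them.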
To test whether α-mangostin has a greater potency than the other selected compound, a
one-way analysis of variance (ANOVA) hypothesis test is used. Assuming that the potency of
each substance follows a normal distribution, that both distributions have the same
variance, and that the data are independent, the null hypothesis and alternative hypothesis are defined
as follows:
After calculating the necessary values for One-way ANOVA, calculate the p-value using
group)”. If the p-value is less than 0.05, reject H0: at the 5% significance level, there is evidence that
the cytotoxic activity of the two substances differs. Otherwise, fail to reject H0.
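For readers who want to check the ANOVA arithmetic independently of a spreadsheet, the calculation can be sketched in Python. This is an illustrative sketch under stated assumptions, not the exact Excel steps used in this study: the function names are hypothetical, and the p-value is obtained from the right tail of the F distribution through the regularized incomplete beta function, approximated here with a simple midpoint rule rather than a library routine.

```python
import math


def one_way_anova(groups):
    """Return (F-ratio, df_between, df_within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group and within-group sums of squares.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


def f_right_tail(f_ratio, df1, df2, steps=20000):
    """Approximate P(F > f_ratio) for an F(df1, df2) distribution.

    Uses P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f),
    where I_x is the regularized incomplete beta function, integrated
    numerically with the midpoint rule (a sketch, not a production routine).
    """
    a, b = df2 / 2, df1 / 2
    x = df2 / (df2 + df1 * f_ratio)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1 - t) ** (b - 1)
    beta = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return total * h / beta


# Hypothetical IC50 values for two substances, three samples each.
f_ratio, df1, df2 = one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
p_value = f_right_tail(f_ratio, df1, df2)
print(f_ratio, df1, df2, p_value)
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The midpoint rule is chosen only because it needs nothing beyond the standard library; a statistics package would normally be preferred for the tail probability.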
Results
Refer to Appendices for the obtained dataset, generated regressions, and the observed
The one-way ANOVA indicated a statistically significant difference between the “base”
substance and α-mangostin, with F-ratio = 113.5 and p-value = 0.0004399 < 0.05 (both values
are rounded to 4 significant figures). This gave evidence to conclude that α-mangostin showed a
stronger potency than the other drug tested. As a result, using this substance is a more
However, the recorded results showed large uncertainties; in particular, the measured
values for all three tested samples of the unknown mixture at 5% concentration were abnormally
out of proportion with the rest of the series. Microsoft Excel also reflected this high variance in the regressions, indicated
With regard to the modeling, Excel is restricted when it comes to sophisticated modeling
approaches like machine learning. Nonetheless, the outcomes aligned with the findings of other
Discussion
The evaluated results provided strong evidence of a substantial difference
between the potencies of the two substances; specifically, α-mangostin is shown to be more effective
in fighting cancer cells than the other tested alternative. If such experiments are carried out in the
future, the data obtained would be valuable for finding the optimal treatment in oncology.
Through the experiment and analysis, Excel proved to be a useful tool for preliminary
data analysis. However, despite the versatility and convenience it provides, Microsoft
Excel is still inferior to other popular statistical analysis software such as SPSS or MATLAB.
Kalluri (2023) has indicated that Excel has limitations in handling large datasets and complex
models, lacks specialized functions or algorithms for scientific data evaluation, and lacks
automation features, which may cause errors and require a lot of time for manual input.
Conclusion
For oncology and for scientific research in general, Microsoft Excel is a helpful tool
when working with moderately sized datasets. Beyond this, however, it is best to
use more advanced algorithms and software to ensure accuracy and validity. Future
References
1. Kalluri, S.K (2023) “Using Excel for data analysis has its advantages and disadvantages.”
2024)
2. Kumar, M.D (2023) “A Study on Importance of Microsoft Excel Data Analysis Statistical
https://www.researchgate.net/publication/381412852_A_Study_on_Importance_of_Microsoft_Excel_Data_Analysis_Statistical_Tools_in_Research_Works (Accessed 12
October 2024)
3. Maria (2023) “Must-have Healthcare Data Models for Your Health System” [Online].
4. Russell G. (2019) “Microsoft Excel as a data extraction tool for audit of cancer minimum
https://www.rcpath.org/static/daaeea41-bdbc-4c87-815950f8628a7842/Excel-as-a-data-
Appendices
Measurement results
Concentration (%)            40      20      10      5       1       0       NO CC
Unknown mixture, sample 1    0.3992  0.3746  0.8516  1.3814  0.3729  0.7281  0.3815
Unknown mixture, sample 2    0.3807  0.4863  1.0422  1.6724  1.0407  1.3530  0.3490
Unknown mixture, sample 3    0.4030  0.5723  0.9071  1.9188  1.5197  1.3981  0.3651
Note: “NO CC” indicates the measured value of a test tube containing no cancer cells.
Note: The linear regressions were obtained after removing all outliers from the data set.
Note: The first 3 rows are observed results related to the unknown mixture; The 3 rows below