Researcher: Nguyễn Đàm Xuân Nguyên

Instructor: Dr. Dương

Applied Mathematics

14/10/2024

Utilizing Microsoft Excel to model data in treating cancer

Abstract

Microsoft Excel is a practical tool for basic data analysis in cancer research, particularly given the growing need for healthcare data modeling. In this study, Excel is used to model the half-maximal inhibitory concentration (IC50) of α-mangostin and of a comparison drug against cancer cells. The methodology comprised applying varying concentrations of each chemical to cancer cells in cytotoxicity experiments and analyzing the collected data with Excel capabilities such as regression analysis and one-way ANOVA hypothesis testing. The findings indicated that α-mangostin was more effective than the other chemical tested at inhibiting the proliferation of cancer cells, with a statistically significant difference (F-ratio = 113.5, p-value = 0.0004399). Excel proved to be a useful tool for preliminary data modeling, but it may be inadequate for analysis beyond that. Notwithstanding these limitations, Excel can be a helpful tool for medical practitioners who do not have access to specialized software. Future updates to Excel might enhance its use in medical research and make it a more capable instrument in oncology.

Introduction

Demand for a wide range of healthcare data models is growing as healthcare companies aspire to become technology-driven, data-rich enterprises. By making efficient use of these data models, a health system can operate at peak efficiency and discover new avenues for improving patient outcomes (Maria, 2023). One of the most effective programs for data modeling and analysis is Microsoft Excel. It allows users to store, arrange, edit, and analyze data quickly and simply. Its adaptability is useful for a variety of intricate computations, including the prediction of trends and seasonal fluctuations. Excel also offers visual tools such as graphs and charts that help users comprehend large volumes of data faster and more precisely. Hence, it is a viable tool in oncology, where it can support data modeling for tasks such as drug development and the improvement of radiation therapy.

This paper demonstrates how Microsoft Excel can be used to estimate the half-maximal inhibitory concentration (IC50) of a substance, which is then used to compare the effectiveness of different substances and draw conclusions about them.

Literature Review

Cancer researchers frequently employ sophisticated statistical software, such as R and SPSS, to model intricate datasets. Machine learning techniques have also been used to predict treatment outcomes. Nevertheless, not all medical practitioners possess the expertise needed to use these technologies.


Numerous studies have investigated the usefulness of Microsoft Excel in general data analysis. One such study, by Kumar (2023), examined the statistical functions required for research and concentrated on approaches to carrying out such analysis. There is, however, a dearth of studies on the application of Excel-based data analysis in medical research, apart from a guide by Russell (2019) titled "Microsoft Excel as a data extraction tool for audit of cancer minimum datasets." The present study further assesses Excel's ability to model cancer data.

Methodology

The dataset used in this study consists of cancer cell activity measurements.

The CellTiter 96® AQueous One Solution Cell Proliferation Assay (MTS) was used to determine the potency of α-mangostin. Cancer cells were cultivated in complete medium at a density of 5000 cells/well on a 96-well plate. After one day of incubation, the cells were treated with α-mangostin at varying doses (0, 0.5, 5, 10, 15, and 25 μg/mL) and incubated at 37°C for three days. The MTS cytotoxicity test kit [3-(4,5-dimethylthiazol-2-yl)-5-(3-carboxymethoxyphenyl)-2-(4-sulfophenyl)-2H-tetrazolium] was used to measure the activity and growth of the cells. The MTS reagent was added to the wells of treated C6 cells, which were then left to develop for three hours at 37°C. Viable cells reduce the MTS tetrazolium compound, aided by the electron-coupling reagent PMS (phenazine methosulfate), to a formazan product that is soluble in cell culture medium and absorbs maximally at 490 nm; absorbance was measured with a Thermo Scientific™ Multiskan™ GO spectrophotometer. The procedure was repeated twice, giving three samples in total, to ensure reliability and improve data precision.

The second substance is a randomly selected compound with activity against cancer cells, listed in the Appendices as the unknown mixture. Data were collected using identical methods, except that the concentrations of the added substance were 0, 1, 5, 10, 20, and 40 μg/mL.

Data Analysis

Before the results are analyzed, outliers must be removed to ensure accuracy. Excel functions must also be applied with care, since arithmetic miscalculations would compromise data integrity. The independent variable, dependent variable, and control value are defined as follows:

Independent variable: Concentration of each substance;

Dependent variable: Activity of cancer cells;

Control value: Activity of cancer cells when no substance is added.

Excel functions were employed for statistical analysis, such as:

- MODE, MEDIAN, and AVERAGE to describe central tendency.

- STDEV.P and VAR.P to measure variability.

- Pivot tables to summarize data.

- Regression analysis with the built-in “Trendline” chart feature for scatter plots.

- F.DIST.RT to determine the p-value required for one-way ANOVA hypothesis testing.
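For readers who want to cross-check the spreadsheet output, the descriptive functions listed above have close counterparts in Python's standard `statistics` module. The following is a minimal sketch, not part of the original workflow; it uses the three unknown-mixture readings at the highest concentration (40) from the measurement table in the Appendices, and the variable names are illustrative.

```python
import statistics

# Readings for the unknown mixture at the highest concentration (40),
# samples 1 to 3, copied from the measurement table in the Appendices.
readings = [0.3992, 0.3807, 0.4030]

mean_value = statistics.mean(readings)            # Excel: AVERAGE
median_value = statistics.median(readings)        # Excel: MEDIAN
std_population = statistics.pstdev(readings)      # Excel: STDEV.P
var_population = statistics.pvariance(readings)   # Excel: VAR.P

print(f"mean     = {mean_value:.4f}")       # 0.3943, as in the "Mean value" row
print(f"median   = {median_value:.4f}")
print(f"stdev.p  = {std_population:.4f}")   # 0.0097, as in the "Standard deviation" row
print(f"var.p    = {var_population:.6f}")
```

The population forms (`pstdev`, `pvariance`) are used because they correspond to STDEV.P and VAR.P and reproduce the tabulated standard deviation; the sample forms (`stdev`, `variance`) would give a larger value for three readings.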

With these functions, the IC50 value of each substance can be calculated. First, take the results of one sample, which contain the cell activity at each concentration of the added substance. To find the value corresponding to 50% cancer cell activity, calculate the arithmetic mean of the control index (the reading with no cancer cells) and the cell activity with no substance added. Next, select the cell activity values together with the corresponding concentrations for that sample and create a scatter plot. To add a regression line, select the chart, click the “+” button at its top-right corner, enable “Trendline”, and choose “More Options” to display the equation on the chart. Finally, substitute the 50% cancer cell activity value into the regression equation and solve for the concentration. The result is the IC50 of that sample. Repeat for the other samples and take the mean IC50 for each substance.
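The same procedure can be sketched outside Excel. The example below is a minimal illustration in Python, assuming a straight-line (linear Trendline) fit; it uses the α-mangostin sample 1 readings from the Appendices as they are tabulated, without the outlier removal applied in the study, so its output should not be read as the IC50 reported here. The function names `linear_fit` and `estimate_ic50` are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_ic50(concentrations, activities, blank):
    """Concentration at which the fitted line reaches 50% cell activity."""
    # Activity with no substance added (concentration 0).
    control = activities[concentrations.index(0)]
    # 50% activity: midpoint between the no-cell reading and the control.
    half_activity = (blank + control) / 2
    slope, intercept = linear_fit(concentrations, activities)
    # Solve half_activity = slope * c + intercept for c.
    return (half_activity - intercept) / slope


# alpha-mangostin, sample 1 (Appendices); raw values, no outliers removed.
concentrations = [0, 0.5, 5, 10, 15, 25]                        # ug/mL
activities = [2.0108, 2.2412, 1.0335, 0.3395, 0.3499, 0.4150]
blank = 0.3572                                                  # "NO CC" reading

ic50 = estimate_ic50(concentrations, activities, blank)
print(f"Estimated IC50 for this sample: {ic50:.2f} ug/mL")
```

Repeating the call for each sample and averaging the results mirrors the final step of the spreadsheet procedure.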

To test whether α-mangostin has a greater potency than the other selected compound, a one-way analysis of variance (ANOVA) hypothesis test is used. Assuming that the potency of each substance follows a normal distribution, that the two distributions have the same variance, and that the data are independent, the null and alternative hypotheses are defined as follows:

H0: the cytotoxic activities of the two substances are the same;

H1: there is a significant difference in cytotoxic activity between the two substances.

After calculating the necessary values for the one-way ANOVA, calculate the p-value using the Excel function “=F.DIST.RT(F-ratio, degrees of freedom between groups, degrees of freedom within groups)”. If the p-value is less than 5%, reject H0: at the 5% significance level, there is evidence that the cytotoxic activities of the two substances differ. Otherwise, do not reject H0.
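The test can also be reproduced without Excel. The sketch below is an illustration in standard-library Python: `one_way_anova` computes the F-ratio and its degrees of freedom from groups of values, and `f_dist_rt` evaluates the right-tail F probability through the regularized incomplete beta function, which is intended to play the role of F.DIST.RT. Both function names are hypothetical. The degrees of freedom used in the example (1 between groups, 4 within groups) are an assumption based on two substances with three samples each; they are not stated explicitly in the text.

```python
import math


def _beta_cf(a, b, x, max_iter=200, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    fix = lambda v: tiny if abs(v) < tiny else v
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / fix(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / fix(1.0 + aa * d)
        c = fix(1.0 + aa / c)
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / fix(1.0 + aa * d)
        c = fix(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_dist_rt(f_ratio, df1, df2):
    """Right-tail probability P(F > f_ratio) for an F(df1, df2) distribution."""
    x = df2 / (df2 + df1 * f_ratio)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, x)


def one_way_anova(groups):
    """Return (F-ratio, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Reported F-ratio with ASSUMED degrees of freedom (2 groups, 3 samples each).
p_value = f_dist_rt(113.5, 1, 4)
print(f"p-value = {p_value:.7f}")   # roughly 0.00044
```

Under the assumed degrees of freedom, the result is in line with the reported p-value of 0.0004399; the small difference in the last digit is expected because the F-ratio is rounded to four significant figures. Passing the per-sample IC50 values of each substance to `one_way_anova` as two lists would give the F-ratio and degrees of freedom directly.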

Results

Refer to the Appendices for the dataset, the generated regressions, and the results observed in the experiment.

The one-way ANOVA indicated a statistically significant difference between the “base” substance and α-mangostin, with F-ratio = 113.5 and p-value = 0.0004399 < 0.05 (both values rounded to 4 significant figures). This provides evidence that α-mangostin is more potent than the other drug tested. Consequently, α-mangostin is the more effective choice for inhibiting the growth of cancer cells.

However, the recorded results showed large uncertainties. In particular, all three samples of the unknown mixture gave disproportionately high readings at a concentration of 5 μg/mL. This high variance was also reflected in the Excel regressions, which had an average R² value of 0.6947. Methodological errors must therefore be identified and corrected in order to improve data precision.
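The R² value that Excel displays next to a linear Trendline is the coefficient of determination of the least-squares fit. A minimal sketch of that calculation is given below; the function name `r_squared` is hypothetical, and the example uses the α-mangostin sample 1 readings from the Appendices without outlier removal, so its result is not expected to equal the average reported above.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares around the mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# alpha-mangostin, sample 1 (Appendices); raw values, no outliers removed.
concentrations = [0, 0.5, 5, 10, 15, 25]
activities = [2.0108, 2.2412, 1.0335, 0.3395, 0.3499, 0.4150]

r2 = r_squared(concentrations, activities)
print(f"R^2 = {r2:.4f}")
```

A value well below 1 indicates that a straight line captures only part of the variation in cell activity, which is consistent with the uncertainty noted in this section.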

With regard to modeling, Excel is limited when it comes to sophisticated approaches such as machine learning. Nonetheless, the outcomes aligned with the findings of other investigations that used more advanced tools.


Discussion

The results provide strong evidence of a substantial difference between the potencies of the two substances; specifically, α-mangostin was shown to be more effective against cancer cells than the alternative tested. If similar experiments are carried out in the future, the data obtained would be valuable in finding the optimal treatment in oncology.

Through the experiment and analysis, Excel proved to be a useful tool for preliminary data analysis. However, despite its versatility and convenience, Microsoft Excel remains inferior to other popular statistical analysis software such as SPSS or MATLAB. Kalluri (2023) notes that Excel has limitations in handling large datasets and complex models, lacks specialized functions and algorithms for scientific data evaluation, and lacks automation features, so that manual input can introduce errors and consume a great deal of time.

Conclusion

For oncology, and for scientific research in general, Microsoft Excel is a helpful tool when working with datasets of moderate size. Beyond that, it is best to use more advanced algorithms and software to ensure accuracy and validity. Future advancements in Excel's functionality may improve its application in medical research.


References

1. Kalluri, S.K. (2023) “Using Excel for data analysis has its advantages and disadvantages” [Online]. Available at: https://medium.com/@kalyankalluri2207/using-excel-for-data-analysis-has-its-advantages-and-disadvantages-80b71b1dc295 (Accessed 10 October 2024)

2. Kumar, M.D. (2023) “A Study on Importance of Microsoft Excel Data Analysis Statistical Tools in Research Works” [Online]. Available at: https://www.researchgate.net/publication/381412852_A_Study_on_Importance_of_Microsoft_Excel_Data_Analysis_Statistical_Tools_in_Research_Works (Accessed 12 October 2024)

3. Maria (2023) “Must-have Healthcare Data Models for Your Health System” [Online]. Available at: https://www.314e.com/healthcare-data-analytics/blog/must-have-healthcare-data-models-for-your-health-system/ (Accessed 22 September 2024)

4. Russell, G. (2019) “Microsoft Excel as a data extraction tool for audit of cancer minimum datasets: a brief how-to guide” [Online]. Available at: https://www.rcpath.org/static/daaeea41-bdbc-4c87-815950f8628a7842/Excel-as-a-data-extraction-tool-how-to.pdf (Accessed 12 October 2024)


Appendices

Measurement results

Concentration (μg/mL)        40      20      10      5       1       0       NO CC
Unknown mixture, sample 1    0.3992  0.3746  0.8516  1.3814  0.3729  0.7281  0.3815
Unknown mixture, sample 2    0.3807  0.4863  1.0422  1.6724  1.0407  1.3530  0.3490
Unknown mixture, sample 3    0.4030  0.5723  0.9071  1.9188  1.5197  1.3981  0.3651
Mean value                   0.3943  0.4777  0.9336  1.6575  0.9778  1.1597  0.3652
Standard deviation           0.0097  0.0809  0.0800  0.2196  0.4703  0.3058  0.0133

Concentration (μg/mL)        25      15      10      5       0.5     0       NO CC
α-mangostin, sample 1        0.4150  0.3499  0.3395  1.0335  2.2412  2.0108  0.3572
α-mangostin, sample 2        0.3888  0.3688  0.3428  0.9895  2.3938  1.7429  0.3798
α-mangostin, sample 3        0.4046  0.3204  0.4814  0.8075  2.1307  1.6337  0.3846
Mean value                   0.4028  0.3464  0.3879  0.9435  2.2552  1.7958  0.3739
Standard deviation           0.0108  0.0199  0.0661  0.0978  0.1079  0.1584  0.0119

Figure 01. Table results representing obtained data

Note: “NO CC” indicates the measured value of a test tube containing no cancer cells.

Figure 02: Linear regressions of data

Note: The linear regressions were obtained after removing all outliers from the data set.

Figure 03: Observed results from practical experiment

Note: The first three rows are observed results for the unknown mixture; the three rows below are observed results for α-mangostin.
