
Covid_Research_Paper

The document presents a research study on predicting COVID-19 cases in India using machine learning techniques such as Support Vector Regressor (SVR) and Long Short-Term Memory (LSTM). It analyzes various factors affecting the spread of the virus and aims to provide insights on future case trends, hospital resource needs, and the impact of interventions like lockdowns. The study utilizes data from multiple sources and employs statistical methods to enhance prediction accuracy and inform public health decisions.

Uploaded by

Ashutosh Gupta


PREDICTION AND ANALYSIS OF COVID19 IN INDIA USING SVR AND LSTM

Rajini Jindal, Head of Department, CSE, Delhi Technological University
Ashutosh Gupta, B.Tech, Software Engineering, Delhi Technological University
Anish Shahi, B.Tech, Software Engineering, Delhi Technological University

Abstract: According to WHO, more than 1.2 lakh (120,000) people in India have been affected by COVID-19, so analyzing the COVID-19 data is very helpful. We used machine learning techniques, namely SVR and LSTM, for the analysis. The aim of this research paper is to predict the impact of COVID-19 in the coming days and to answer some common research questions related to it. First, we analyzed the data; then we added our own parameters, such as a growth factor based on Total Cases, to assess the situation in different countries and states. We used correlation to find the effect of other parameters, such as beds and hospitals, on the growth of coronavirus cases, and used VIF for multicollinearity analysis. Then we used a Support Vector Regressor with the 'rbf' kernel to predict Total Confirmed Cases, Total Active Cases, Daily Confirmed Cases, Daily Cured Cases, Daily Deaths and Growth Factor in order to answer the research questions. We further used LSTM to predict the Confirmed Cases for the next 10 days.

Keywords: COVID19, LSTM, SVR, Prediction, Correlation

I. INTRODUCTION

The World Health Organization has declared COVID-19 a pandemic. The signs of COVID-19 occur within two to 14 days of exposure and may include nausea, cough, a runny nose and breathing difficulties. In this scenario, a reliable estimate of new COVID-19 cases is needed, which can assist the medical and administrative authorities. India is yet to reach the third phase of the COVID-19 epidemic, i.e. the population epidemic seen in different countries around the world, but the cases have been growing steadily. We used machine learning techniques to analyze the data and answer several research questions using the data available.

RQ1.) Effect of lockdown on total number of cases in India?
RQ2.) What factors can reduce the growth of the total number of cases in India?
RQ3.) What would be the total number of COVID19 cases in India?
RQ4.) When would there be no new COVID19 case in India?
RQ5.) When and what would be the maximum number of COVID19 cases reported in one day?
RQ6.) When would all the cases be cured?
RQ7.) When and what would be the maximum number of Active cases reported in one day?
RQ8.) When would there be no new Active case of COVID19 in India?
RQ9.) When and what would be the maximum number of deaths due to COVID19 in one day?
RQ10.) When would there be no new death due to COVID19 in India?

II. LITERATURE REVIEW

2.1 Traditional model for predicting infectious diseases [1]

Traditional infectious disease prediction models mainly include differential equation prediction models and time series prediction models based on statistics and random processes. The predictive differential equation models are designed to create a differential equation that approximates the dynamic features of infectious diseases according to population growth characteristics and the laws of disease initiation and transmission within the population. Through qualitative and quantitative analysis and numerical simulation of the model dynamics, the incidence mechanism of diseases is revealed, the transmission laws are identified, the patterns of change and growth are anticipated, the causes and main factors of disease transmission are analyzed, the optimum prevention and control strategies are obtained, and people are provided with a theoretical and quantitative basis for making prevention and control decisions. Popular differential equation models of infectious disease dynamics use systems of ordinary differential equations, which directly represent the relationship between the instantaneous rate of change of the individuals in each compartment and the state of all compartments at the corresponding time.

2.2 Early Prediction Model Machine learning for Infectious Disease [2]

In short, machine learning is about learning valuable knowledge from a vast amount of data on its own and adapting algorithms to specific problems. Machine learning draws on a number of areas, including medicine, informatics, statistics, electronics and psychology. A neural network, for example, is a relatively mature machine learning algorithm that can simulate any high-dimensional, nonlinear optimal mapping between input and output by mimicking the processing mechanism of the biological nervous system. Conventional statistical approaches are less successful when dealing with complex data relationships and may not obtain results as accurate as those of a neural network. Because most new infectious diseases that arise in humans are of animal origin (zoonotic infectious diseases), identifying the specific intrinsic characteristics of the species and the environmental factors that contribute to the spillover of new pathogens is an important prerequisite for predicting diseases. The overall objective of the machine learning technique is to extend causal inference theory and machine learning to identify and quantify the most important factors causing zoonotic disease outbreaks, and to generate visual tools to illustrate the complex causal relationships of animal infectious diseases and their correlation with zoonotic diseases. Nevertheless, the
extremely nonlinear and complex problems studied in machine-learning-based early prediction models of infectious diseases typically give rise to both local and global minima, which imposes certain limitations on the machine-learning model.

2.3 Internet-based infectious disease prediction model [3]

Internet-based work on infectious disease surveillance has been increasing since the mid-1990s. It provides information resources for public health administration agencies, medical professionals and the general public, and gives users early warning and situational knowledge of infectious diseases after diagnosis and treatment. Traditional Web page knowledge (such as related news articles, authoritative organizations, etc.) was the primary source of data in early research. With the growth of the Internet, however, work has begun in recent years to extend data sources into social media (such as Twitter, Facebook, microblogs, etc.) and multimedia content. Thanks to the global reach of the Web, people use Internet search engines, social networks and online map resources to track the frequency and location details of query keywords, improve the integration of knowledge on financial, public interest and hot issues, carry out search engine and social media-based disease tracking, and forecast infectious disease incidence.

III. PROPOSED METHODOLOGY

STEP 1.) DATA COLLECTION

The dataset is available on Kaggle. It is updated on a daily basis, and the information is gathered through an API from trusted sources.

The following datasets are available for analysis and prediction.
ageGroupDetails.csv, which contains the percentage of cases falling under different age groups.
covid_19_india.csv, which contains the COVID19 cases at daily level in each state. This dataset is used to extract useful information.
IndividualDetails.csv, which contains individual-level details.
population_india_census2011.csv, which contains the population at state level.
ICMRTestingDetails.csv, which contains the number of COVID-19 tests at daily level.
HospitalBedsIndia.csv, which contains the number of hospital beds in each state.

STEP 2.) DATA-PREPROCESSING:

For statewise grouped data
1. We merged the data with HospitalBedsIndia.csv.
2. We added our own parameter, growth factor, which is the ratio of confirmed cases on the current day to confirmed cases on the previous day.
3. We removed outliers based on growth factor for the analysis related to growth factor.

For daywise grouped data
1. We added parameters derived from the already available parameters, such as growth factor, TotalBeds, Total Hospitals, Cured Percentage, Daily Cured Cases, Daily Active Cases and Daily Deaths.
2. We scaled the data using Z score to make it suitable for SVR and LSTM.
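The two derived quantities in Step 2 can be illustrated with a minimal sketch. The function names and the sample counts below are illustrative assumptions, not taken from the paper's code or dataset, and the population standard deviation is assumed for the Z score.

```python
from statistics import mean, pstdev


def growth_factors(confirmed):
    """Ratio of each day's cumulative confirmed count to the previous day's.

    Returns one value per consecutive pair of days; the ratio is undefined
    when the previous day has no cases, so None is stored in that case.
    """
    factors = []
    for prev, curr in zip(confirmed, confirmed[1:]):
        factors.append(curr / prev if prev > 0 else None)
    return factors


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Illustrative cumulative confirmed counts (hypothetical, not from the dataset).
confirmed = [100, 120, 150, 180]
gf = growth_factors(confirmed)   # one ratio per day-to-day step
scaled = z_scores(confirmed)     # same length as the input
```

A growth factor above 1 means the cumulative count is still rising, and a value of exactly 1 means no new cases were added on that day.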
STEP 3.) DATA ANALYSIS

A. Total number of Cases till given day

Fig 1.14: Total Confirmed, Cured, Deaths till given day.

Interpretation: Total Cases, Cured Cases, Deaths and Active Cases all increase exponentially as the number of days increases.

B. Total Number of Confirmed, Death and Cured Cases in top 20 states (ranked based on total cases)

Fig 1.15: Total, Cured and Death Cases in each state.

Interpretation: 1.) Maharashtra has the maximum number of Total, Cured and Death Cases, followed by Tamil Nadu, Gujarat and Delhi. 2.) A state with more total cases has more cured cases and more deaths, so these quantities are likely to be directly proportional.

C. Average Growth Factor of Top 20 states

Fig 1.16: Statewise mean Growth Factor

Interpretation: Manipur has the maximum growth rate, so total cases are increasing most rapidly in Manipur, followed by Tripura, Puducherry and Goa.

D. Cured Percentage of Top 20 states

Fig 1.17: Statewise Cured Percentage

Interpretation: Punjab has the maximum cured percentage till now, i.e. the maximum percentage of Confirmed Cases cured, followed by Meghalaya, Kerala and Haryana.
E. Death Percentage of Top 20 states

Fig 1.18: Statewise Death Percentage

Interpretation: West Bengal has the maximum death percentage, i.e. the maximum percentage of people who died out of confirmed cases, followed by Meghalaya, Gujarat and Madhya Pradesh.

F. Total Beds and Cured Cases in top 20 states

Fig 1.19: Statewise beds and Cured Cases

Interpretation: Most states show that with more beds they have more cured cases, indicating a positive impact of beds on cured cases. Delhi, for example, has more beds and more cured cases. Some states, such as Karnataka, have more beds but still fewer cured cases, indicating less impact of beds in such states.

G. Percentage Change in Confirmed Cases each Day

Fig 1.24: Daywise Percentage Increase in Confirmed Cases

Interpretation: Although the Confirmed Cases are increasing, the percentage increase is decreasing. Once it reaches zero, there will be no more change in Total Cases and we will reach the stabilization point.

H. Growth Factor on each day

Fig 1.25: Growth Factor per day

Interpretation: It can be observed that the growth factor has a spike on 4th March 2020, because of the very small number of cases and a sudden increase from 6 to 28. Therefore, data after 5th March is taken for analysis to reject noise.
I. Relationship between Cured Percentage and Growth Factor

Fig 1.26: Daywise Cured percentage and growth factor

Interpretation: We can observe that as growth factor decreases with increasing days, cured percentage increases, showing the inverse relationship also found in the correlation analysis.

J. Relationship between number of cured cases and growth factor

Fig 1.27: Cured cases vs Growth Factor per day

Interpretation: We can observe that growth factor roughly decreases as cured cases increase, but it decreases at a slow rate.

STEP 4.) COLLINEARITY ANALYSIS

Correlation matrix on statewise feature set

Fig 1.22: Correlation Matrix of Predictive Variables

Interpretation: 1.) Cured percentage increases as growth factor decreases. 2.) Total beds is negatively correlated with growth factor, suggesting that increasing total beds could help reduce growth factor and thereby the growth of confirmed cases, but not to a very large extent. 3.) As total hospitals increase, growth factor decreases, suggesting that by increasing hospitals we could reduce growth factor, again not to a very large extent, although the effect is stronger than that of total beds. 4.) A small correlation of death percentage with cured percentage, growth factor, total beds and total hospitals is also identified, suggesting that increasing cured percentage, growth factor, beds or hospitals decreases death percentage, but not to a large extent.

Correlation matrix on daywise feature set

Fig 1.28: Correlation matrix on Daywise Distribution of data

Interpretation: We can observe on a daily basis that 1.) Growth factor decreases with increasing days. 2.) Growth factor decreases as cured, death and confirmed cases increase; it also decreases as cured percentage increases. 3.) Cured percentage increases with days, and it also increases as cured and confirmed cases increase. Therefore the rate of increase of cured cases is greater than the rate of increase of confirmed cases. 4.) Confirmed cases, cured cases and deaths are highly correlated with each other.
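As an illustration of how each entry of such a correlation matrix is obtained, the following is a minimal sketch of the Pearson correlation coefficient in plain Python. The feature names and values are hypothetical and chosen only to show an inverse relationship like the one described for growth factor and cured percentage; the paper does not state which library it used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical daywise features (illustrative values only).
features = {
    "growth_factor": [1.30, 1.22, 1.15, 1.10, 1.06],
    "cured_percentage": [5.0, 9.0, 14.0, 20.0, 27.0],
}
matrix = correlation_matrix(features)
```

Here the off-diagonal entry is negative because one series falls while the other rises, which is the pattern the interpretation above describes.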
STEP 5.) MULTICOLLINEARITY ANALYSIS

Variance Inflation Factor (VIF) is used to determine the multicollinearity among independent variables.

For statewise distributed data
We found that Cured, Active and Confirmed are multicollinear for predicting growth factor, and the remaining parameters, such as TotalBeds and Total Hospitals, are not correlated enough to predict growth factor.

For daywise distributed data
Most variables are independent and not related enough to predict the sole effect of one on another.

After the Variance Inflation Factor analysis, it was found that the Days attribute is enough to predict confirmed, cured, deaths, growth factor and cured percentage.

IV. METHODOLOGIES USED

We used 2 methodologies:
1.) SVR for long-term prediction
2.) LSTM for short-term prediction

We used SVR with the 'rbf' kernel based on the assumption that many features follow a normal distribution; many features are estimated to follow a Gaussian curve based on experience and analysis of data from other countries. We apply standard scaling before fitting the model. We use grid search CV for hyperparameter tuning to find the best parameters for the model, with the 10-fold cross-validation built into GridSearchCV. After this, the error is calculated on test data along with the R2 value, which tells how much better our model performed than if we had simply taken the mean as the predicted value.

V. RESULT

1. LOCKDOWN EFFECT ON GROWTH FACTOR

Mean growth factor at each lockdown

Fig 1.32: Lockdown effect on growth factor

Interpretation:
1.) We can easily see that as the lockdown stages progress, the growth factor decreases, thereby decreasing the rate of growth of confirmed cases.
2.) There is a sudden decrease in growth factor between the periods before and after lockdown, so lockdown produced an overall decrease in the number of confirmed cases.

2. PREDICTION OF ACTIVE CASES ON A GIVEN DAY

Predicted Active Cases vs Actual Active Cases on a given day

Fig 5.4: SVR Test Result Prediction of Active Cases vs predicted values on a given day

RMSE value = 1003.7184155158901
R2 value = 0.997471675345555

Predicted Active Cases on a given day

Fig 5.6: Predicted active cases on a given day

Conclusion: We can see that the maximum number of active cases would occur after 159 days, at about 113000 active cases, after which the number would decrease. There would be no Active Cases after around 318 days.
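The RMSE and R2 figures reported with each SVR fit can be computed as in the minimal sketch below. The function names and the sample values are illustrative assumptions; they are not the paper's data, and the paper does not state which implementation it used.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(actual, predicted):
    """Coefficient of determination.

    Compares the model's squared error with that of a baseline that always
    predicts the mean of the actual values: 1.0 is a perfect fit, 0.0 is no
    better than the mean baseline.
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical test-set values (illustrative only).
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
error = rmse(actual, predicted)
score = r2_score(actual, predicted)
```

RMSE is expressed in the units of the predicted quantity, so values for total cases and for growth factor are not directly comparable, whereas R2 is unitless.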
3. PREDICTION OF DAILY CASES PER DAY

Predicted Daily Cases vs Actual Daily Cases on a given day

Fig 5.7: SVR Test Result Prediction of Daily Cases vs predicted values on a given day

RMSE value = 383.3070
R2 score = 0.9498

Predicted Daily Cases on a given day

Fig 5.10: Predicted daily cases on a given day

Conclusion: On the 178th day, daily cases reach a maximum of approximately 21660, after which they start decreasing.

4. PREDICTION OF CONFIRMED CASES PER DAY

Predicted Confirmed Cases vs Actual Confirmed Cases on a given day

Fig 5.12: SVR Test Result Prediction of Total Cases vs predicted values on a given day

RMSE value = 2015.7928
R2 score = 0.99644

Predicted Confirmed Cases on a given day

Fig 5.15: Predicted Total cases on a given day

Conclusion: From the analysis, we can see that Total Cases reach their maximum on the 197th day, when total confirmed cases would be about 439150.
5. PREDICTION OF GROWTH FACTOR PER DAY

Predicted Growth Factor vs Actual Growth Factor on a given day

Fig 5.17: SVR Test Result Prediction of Growth Factor vs predicted values on a given day

RMSE value = 0.06134

Predicted Growth Factor per day

Fig 5.20: Predicted Growth Factor on a given day

Conclusion: From the graph, we can see that growth factor = 1 at around 213 days. Growth factor = confirmed cases on the current day / confirmed cases on the previous day. Therefore, when it equals 1, confirmed cases on the current day = confirmed cases on the previous day, i.e. no new confirmed case would be reported after about 213 days. The situation therefore stabilizes after 213 days.

6. PREDICTION OF DAILY CURED CASES

Predicted Daily Cured Cases vs Actual Cured Cases on a given day

Fig 5.23: SVR Test Result Prediction of Daily Cured Cases vs predicted values on a given day

RMSE value = 136.1731
R2 score = 0.9774

Predicted Daily Cured Cases

Fig 5.23: Predicted Daily Cured cases on a given day

Conclusion: From the graph, we can see that cured cases reach their maximum on the 157th day, i.e. 15000 per day. Therefore, after 314 days all the cases would be cured.
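The way a peak day and a stabilization day are read off a predicted curve can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the tolerance and the sample series are hypothetical, and the paper reads these values from its graphs rather than describing code for it.

```python
def peak_day(series):
    """Return (day, value) for the maximum of a predicted series, with days counted from 1."""
    index = max(range(len(series)), key=series.__getitem__)
    return index + 1, series[index]


def first_stable_day(predicted_growth, tolerance=1e-3):
    """First day (counted from 1) on which the growth factor is within
    `tolerance` of 1 or below it, i.e. the cumulative count has effectively
    stopped growing. Returns None if that never happens in the series."""
    for day, growth in enumerate(predicted_growth, start=1):
        if growth <= 1.0 + tolerance:
            return day
    return None


# Hypothetical predicted curves (illustrative only).
daily_cases = [5, 9, 14, 11, 6]
growth = [1.20, 1.10, 1.04, 1.01, 1.0005, 1.0]

peak = peak_day(daily_cases)          # day of the largest predicted daily count
stable = first_stable_day(growth)     # day on which growth factor reaches about 1
```

A tolerance is used because a regression curve approaches 1 smoothly and rarely hits it exactly; the chosen threshold therefore shifts the reported day.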
7. PREDICTION OF DAILY DEATH CASES

Fig 5.24: SVR Test Result Prediction of Daily Death Cases vs predicted values on a given day

RMSE value = 314.25

Predicted Daily Deaths

Fig 5.27: Predicted Growth Factor on a given day

Conclusion: From the graph, we can see that the maximum number of deaths on a single day is 275, after approximately 179 days. Therefore, there will be no deaths after 358 days.

8. PREDICTION OF TOTAL CONFIRMED CASES FOR NEXT 10 DAYS USING LSTM

1) Predicted Cases of Test Data

RMSE value = 608.62

Prediction of Next 10 Days

Fig 5.34: Future Prediction of next 10 days
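A 10-day forecast from a sequence model is commonly produced by training on sliding windows and then feeding each prediction back as input. The sketch below shows that data handling only. The lookback length, the helper names and the `naive_trend` stand-in are assumptions for illustration: the paper does not report its window size or network configuration, and `naive_trend` is not an LSTM, it merely occupies the place where a trained model's one-step prediction would go.

```python
def make_windows(series, lookback):
    """Split a series into (input window, next value) training pairs."""
    if lookback < 1 or lookback >= len(series):
        raise ValueError("lookback must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        inputs.append(list(series[start:start + lookback]))
        targets.append(series[start + lookback])
    return inputs, targets


def forecast(history, predict_next, lookback, horizon=10):
    """Recursive multi-step forecast: each prediction is appended to the
    window and used as input for the following step."""
    window = list(history[-lookback:])
    predictions = []
    for _ in range(horizon):
        next_value = predict_next(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]
    return predictions


def naive_trend(window):
    """Hypothetical stand-in for a trained model: repeats the last daily increase."""
    return window[-1] + (window[-1] - window[-2])


# Illustrative cumulative counts (not from the dataset).
series = [10.0, 20.0, 30.0, 40.0, 50.0]
X, y = make_windows(series, lookback=3)
next_10 = forecast(series, naive_trend, lookback=3, horizon=10)
```

Because every step after the first depends on earlier predictions, errors accumulate over the horizon, which is one reason to restrict this approach to short-term forecasts.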
VI. CONCLUSION

In this paper, we used SVR and LSTM to answer the most common research questions related to COVID19 in India. Using SVR, based on the assumption of a Gaussian curve, we predicted the Total number of Cases, Active Cases, Daily Confirmed Cases, Daily Deaths and Daily Cured Cases, and using LSTM we predicted the expected Total Confirmed Cases for the next 10 days. We also studied the lockdown effect on growth factor and hence on the total number of cases. The following answers to the research questions were obtained after prediction and analysis.

RQ1.) Effect of lockdown on total number of cases in India? ANS: Growth rate decreases with lockdown, so the rate of growth of Total Cases is decreased due to lockdown.
RQ2.) What factors can reduce the growth of the total number of cases in India? ANS: Total Beds, Total Hospitals
RQ3.) What would be the total number of COVID19 cases in India? ANS: 439150 cases (result 4)
RQ4.) When would there be no new case of COVID19 in India? ANS: After 213th day (result 5)
RQ5.) When and what would be the maximum number of COVID19 cases reported in one day? ANS: 178th day, 21660 (result 3)
RQ6.) When would all the cases be cured? ANS: After 314 days (result 6)
RQ7.) When and what would be the maximum number of Active cases reported in one day? ANS: 159th day, 113000 (result 2)
RQ8.) When would there be no new Active case of COVID19 in India? ANS: 318th day (result 2)
RQ9.) When and what would be the maximum number of deaths due to COVID19 reported in one day? ANS: 274 deaths (result 7)
RQ10.) When would there be no new death due to COVID19 in India? ANS: After 358 days (result 7)

VII. REFERENCES

[1] Role of Machine Learning to Predict the Outbreak of Covid-19 in India. Journal of Xi'an University of Architecture & Technology, Volume XII, Issue IV, 2020. ISSN No: 1006-7930.
[2] World Health Organization (2020). Coronavirus disease (COVID-19) Pandemic, WHO. Accessed from https://www.who.int/emergencies/diseases/novel-coronavirus-2019 on 31st March 2020.
[3] Johns Hopkins University (2020). Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE. Accessed from https://github.com/CSSEGISandData/COVID-19 on 6th April 2020.
[4] Sharma, N. (2020). India's swiftness in dealing with Covid-19 will decide the world's future, says WHO, Quartz India. Accessed from https://qz.com/india/1824041/who-saysindias-action-oncoronavirus-critical-for-the-world/ on 25th March 2020.
[5] Myers, J. (2020). India is now the world's 5th largest economy, World Economic Forum. Accessed from https://www.weforum.org/agenda/2020/02/india-gdp-economy-growthuk-france/ on 15th March 2020.
[6] Gupta, R., & Pal, S. K. (2020). Trend Analysis and Forecasting of COVID-19 outbreak in India. medRxiv. Accessed from https://www.medrxiv.org/content/10.1101/2020.03.26.20044511v1 on 3rd April 2020.
[7] Gupta, R., Pandey, G., Chaudhary, P., & Pal, S. K. (2020). SEIR and Regression Model based COVID-19 outbreak predictions in India. medRxiv. Accessed from https://www.medrxiv.org/content/10.1101/2020.04.01.20049825v1 on 5th April 2020.
