Dutta 2017
in Children
Sushama Rani Dutta¹, Soumyajit Giri², Sujoy Datta³, Monideepa Roy⁴
School of Computer Engineering,
KIIT University,
Bhubaneswar, Odisha, India
In this system we have collected data from multiple sensors and smartphones as the primary investigation input for an autistic patient. We have considered 50 predefined autism symptoms, any of which can be present or absent for the patient. On the basis of the initial symptoms, other additional symptoms can be automatically compiled and a list of additional symptoms can be made to check whether the

Probability of support of a symptom X is:
Support(X) = number of times 'X' was experienced.
We considered the rule with minimal support and the highest confidence condition for selecting the probable additional symptoms with the Association Rule (AR).
We have used the minimum-Redundancy-Maximum-Relevance (mRMR) method to find the most appropriate
mutually independent symptoms which have not yet been examined. The output symptoms of mRMR (a) can identify whether the symptom belongs to the class of disease and (b) must be mutually uncorrelated at the particular time. The method derives the mutual information between the symptoms. mRMR is used along with the Mutual Information Difference (MID) criterion in our proposed work.

B. The minimum Redundancy Maximum Relevance (mRMR) based symptom selection

The theoretic ranking of the non-linear relationships between an experienced and a yet-to-be-experienced symptom was taken into consideration. We found that the minimum Redundancy Maximum Relevance (mRMR) [11] method can be used to choose the optimal symptom for a class. The mutual information between the target Y and each individual feature should be maximized while searching for the maximally relevant symptoms. Let D(S, Y) be the mean of the mutual information between each individual symptom in the set S and the target Y. The equation for the above statement is:

1) $\max D(S,Y), \quad D(S,Y) = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; Y)$

where $I(\cdot\,;\cdot)$ denotes mutual information and $x_i$ are the symptoms in $S$.

If two symptoms $x_i$ and $x_j$ are correlated, they do not necessarily belong to the same class of disease; they may cause two different classes of diseases. So domain experts should not include two correlated symptoms in a symptom set. If two symptoms are highly correlated, we do not include them in the same symptom set. For the minimum redundancy criterion, the symptoms are selected in such a way that they are mutually maximally dissimilar. Let R(S,Y) be the mean of the mutual information between pairs of symptoms in S. The equation for the above statement is:

2) $\min R(S,Y), \quad R(S,Y) = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j)$

The combination of both the above criteria is called minimal-redundancy-Maximal-relevance (mRMR). If we simultaneously minimize R(S,Y) and maximize D(S,Y), the mRMR feature set is obtained. The mRMR technique combines the two criteria into a single criterion function. The equation for the Mutual Information Difference (MID) criterion is as follows:

3) $\max_{S} \left[ D(S,Y) - R(S,Y) \right]$

iteration of the mRMR calculation. For disease prediction, we have predefined 6 different autism cases using classifiers. We have set medium and high thresholds for the chance of the disease: a chance of 40% is identified as medium and a chance of 80% as high. If the chance of the disease is below 40%, the case is not considered a probable case of the disease, and if the chance is above 80%, it is considered a highly probable case. If the chance lies between 40% and 80%, i.e. between the medium and the high threshold, the next step is followed: we try to select at least one more additional symptom to make a more reliable prediction of the disease. This condition is called the 'grey zone'. The prediction of the disease under this condition is not confident, so the Highest Information Gain (HIG) method is applied to choose the additional symptom that leads to the most confident case. According to HIG, the training data set is prepared by recalculating the values and providing a reweighted data set, so that the grey-zone symptoms are assigned higher weights. If a case is still in the grey zone, the procedure terminates and selects the high-probability symptom sets for the diagnosis. This identifies the most probable disease.

The symptom sets stored in the database are predefined by the domain expert or are patterns that have previously occurred in past patients.

For example, let the diseases be abbreviated as d1, d2, ..., dn, and let the symptom sets of the diseases be
d1 = {G,P,C,A,D,W,H}, d2 = {R,M,K,O,F,T}, ..., d7 = {C,R,M,G}, ..., dn = {D,H,R,F,K,I,V}.

According to our proposed technique, we started the diagnosis with 2 preliminary symptoms of a patient, i.e. 'C' and 'R', and finally obtained the symptom set {C,R,M,G} of disease d7, which is already present in the database.

Fig. 2 shows the representation of the highest probability of G given C, R, M. Fig. 3 uses AR, mRMR and MID to pull the symptoms associated with our patient input, i.e. 'C' and 'R'. In our example, the support of 'C' is 13, the support of 'R' is 15 and the support of 'M' is 12.
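The support counts and rule confidences quoted in this worked example can be illustrated with a minimal Python sketch. It assumes that each past patient is stored as a set of symptom letters and that confidence follows the usual association-rule definition, conf(A → c) = Support(A ∪ {c}) / Support(A); the paper does not spell out either detail. The records, the function names `support` and `confidence`, and the resulting numbers are hypothetical and do not reproduce the paper's data.

```python
from typing import FrozenSet, Iterable, List


def support(symptoms: Iterable[str], records: List[FrozenSet[str]]) -> int:
    """Number of records in which all the given symptoms were experienced."""
    wanted = frozenset(symptoms)
    return sum(1 for record in records if wanted <= record)


def confidence(antecedent: Iterable[str], consequent: str,
               records: List[FrozenSet[str]]) -> float:
    """conf(A -> c) = support(A + c) / support(A); 0.0 if A never occurs."""
    antecedent = frozenset(antecedent)
    base = support(antecedent, records)
    if base == 0:
        return 0.0
    return support(antecedent | {consequent}, records) / base


# Hypothetical patient records (illustrative only, not the paper's data).
records = [
    frozenset("CRMG"),
    frozenset("CRM"),
    frozenset("CRP"),
    frozenset("CG"),
    frozenset("RMK"),
]

print(support("C", records))            # records that contain 'C'
print(confidence("CR", "M", records))   # conf(C,R -> M)
print(confidence("CR", "P", records))   # conf(C,R -> P)
```

Under this reading, the candidate with the highest confidence given the current symptoms is the one that gets pulled next.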
Probability of conf(C,R→G) = 0.02
Probability of conf(C,R→P) = 0.04
Hence C, R satisfy the minimum-redundancy maximum-relevance criterion with the symptom 'M' of an as yet unidentified disease. C, R therefore pull the symptom 'M' by following the above technique. The process terminates if the set of symptoms {C,R,M} is found in the database. If it is not found, the mutual information of C, R, M has to be calculated by applying the above three methods, AR, mRMR and MID. We found that:
Probability of conf(C,R,M→G) = 0.6
Probability of conf(C,R,M→A) = 0.04
Hence C, R, M pull the next associated symptom 'G', for which the MID equation is maximized, and the database is checked for a disease with the symptom set {C,R,M,G}. Fig. 2 represents the highest support of G by C, R, M. According to the Association Rule, the disease d7 has the symptom set {C,R,M,G}. Hence we predicted that the disease is d7. The technique helps to identify more symptoms according to the input pattern and the patterns present in the database. If the confidence level of a symptom set is insufficient, the HIG method is followed to obtain a confident symptom set for the disease prediction.
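The confidence check that decides whether HIG is needed can be sketched as a small Python function built on the 40% and 80% thresholds stated in the text. The function name `triage` and the labels it returns are hypothetical. Treating a chance of exactly 40% or exactly 80% as part of the grey zone is an assumption, since the text only specifies the behaviour below 40%, above 80% and in between.

```python
MEDIUM = 0.40  # medium-chance threshold from the text
HIGH = 0.80    # high-chance threshold from the text


def triage(chance: float) -> str:
    """Classify the estimated chance of a disease into one of three zones."""
    if not 0.0 <= chance <= 1.0:
        raise ValueError("chance must be a probability in [0, 1]")
    if chance < MEDIUM:
        return "not probable"
    if chance > HIGH:
        return "highly probable"
    # Assumption: the boundary values 0.40 and 0.80 count as grey zone.
    return "grey zone"


for chance in (0.25, 0.60, 0.90):
    print(chance, triage(chance))
```

Only a "grey zone" result would trigger the selection of a further symptom; the other two outcomes end the procedure directly.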
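The symptom-pulling step based on mRMR with the MID criterion can be sketched in Python as follows. The sketch uses the incremental form of MID from [11]: among the symptoms not yet examined, pick the one whose mutual information with the target, minus its mean mutual information with the already selected symptoms, is largest. Whether the authors' implementation matches this form exactly is an assumption. The function names and the binary toy data (1 = symptom present) are hypothetical and chosen only so that the effect of the redundancy term is visible.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Empirical mutual information (in bits) of two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log2((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in count_ab.items()
    )


def next_symptom_mid(selected: List[str], candidates: List[str],
                     data: Dict[str, Sequence[int]],
                     target: Sequence[int]) -> str:
    """Return the candidate maximizing relevance minus mean redundancy."""
    def score(symptom: str) -> float:
        relevance = mutual_information(data[symptom], target)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(data[symptom], data[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    return max(candidates, key=score)


# Hypothetical observations for 8 patients (illustrative only).
target = [1, 1, 1, 1, 0, 0, 0, 0]      # 1 = disease present
data = {
    "C": [1, 1, 1, 0, 1, 0, 0, 0],     # already experienced
    "P": [1, 1, 1, 0, 1, 0, 0, 0],     # duplicates 'C': fully redundant
    "M": [1, 1, 0, 1, 0, 1, 0, 0],     # as relevant as 'P', independent of 'C'
}

print(next_symptom_mid(["C"], ["P", "M"], data, target))
```

In this toy data 'P' and 'M' are equally relevant to the target, but 'P' carries no information beyond 'C', so the MID score favours 'M'.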
input as the patient stress level while trying to communicate, i.e. with communication disorder, and finally predicted the disease Intellectual disability. The arrow mark (→) indicates that the tested symptom is positive and the × mark indicates that the symptom is negative. So the further pulling of symptoms is based on the positive nature of the symptoms. Fig. 5 represents the graphical presentation of the comparison of the above cases.
TABLE I. Part of database used for Autism Disease diagnosis (Domain expert decision)
IMTC 2005 – Instrumentation and Measurement Technology Conference
Ottawa, Canada, 17-19 May 2005

V. CONCLUSION AND FUTURE WORK

In autism diagnosis, the highest accuracy we observed was 83%, using the machine learning Association Rule with the minimum Redundancy Maximum Relevance (mRMR) method. For Pervasive Developmental Disorder (PDD), the accuracy of the actual diagnosis is 85%, but we obtained 70% with our proposed algorithm. This is due to some missing data and problems in capturing data (e.g. when we took the readings, the child might not have exhibited some of the symptoms that he may exhibit at the time of the actual diagnosis). We have used the Mutual Information Difference (MID) method to select the additional symptoms which can further strengthen the diagnosis decision. This system helps doctors to diagnose the disease easily from any one or two preliminary symptoms. Our future work is to reduce the time needed to find the appropriate symptom set and to predict the disease more accurately.

ACKNOWLEDGEMENT

This work has been carried out with the support and funding of ITRA Media Lab Asia and DeitY through the project “Remote Health: A Framework for Healthcare Services using Mobile and Sensor-Cloud Technologies”.

REFERENCES

[1] Salomon, Joshua A., et al. "Healthy life expectancy for 187 countries, 1990–2010: a systematic analysis for the Global Burden Disease Study 2010." The Lancet 380.9859 (2013): 2144-2162.
[2] Apiletti, Daniele, et al. "Real-time analysis of physiological data to support medical applications." IEEE Transactions on Information Technology in Biomedicine 13.3 (2009): 313-321.
[3] American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (5th ed.). Washington, DC: American Psychiatric Association, 2013.
[4] Helt, M., E. Kelley, M. Kinsbourne, J. Pandey, H. Boorstein, M. Herbert, and D. Fein. "Can children with autism recover? If so, how?" Neuropsychology Review 18 (2008): 339-366.
[5] Cheng, Yi-Ting, et al. "Mining Sequential Risk Patterns From Large-Scale Clinical Databases for Early Assessment of Chronic Diseases: A Case Study on Chronic Obstructive Pulmonary Disease." IEEE Journal of Biomedical and Health Informatics 21.2 (2017): 303-311.
[6] Perego, Paolo, et al. "Reach and throw movement analysis with support vector machines in early diagnosis of autism." Engineering in Medicine and Biology Society, 2009. EMBC 2009. Annual International Conference of the IEEE. IEEE, 2009.
[7] Begg, Rezaul, and Joarder Kamruzzaman. "A machine learning approach for automated recognition of movement patterns using basic, kinetic and kinematic gait data." Journal of Biomechanics 38.3 (2005): 401-408.
[8] Wu, Jianning, Jue Wang, and Li Liu. "Feature extraction via KPCA for classification of gait patterns." Human Movement Science 26.3 (2007): 393-411.
[9] Nehme, B., et al. "Developing a skin conductance device for early Autism Spectrum Disorder diagnosis." Biomedical Engineering (MECBME), 2016 3rd Middle East Conference on. IEEE, 2016.
[10] Yu, Lei, and Huan Liu. "Efficient feature selection via analysis of relevance and redundancy." Journal of Machine Learning Research 5.Oct (2004): 1205-1224.
[11] Peng, Hanchuan, Fuhui Long, and Chris Ding. "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy." IEEE Transactions on Pattern Analysis and Machine Intelligence 27.8 (2005): 1226-1238.