SignExplainer: An Explainable AI-Enabled Framework for Sign Language Recognition With Ensemble Learning
Saudi Arabia
3 Department of Computer Science and Engineering, School of Engineering and Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat 382007,
India
4 Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
ABSTRACT Deep learning has significantly aided current advancements in artificial intelligence. Deep learning techniques have significantly outperformed typical machine learning approaches in various fields such as computer vision, natural language processing (NLP), robotics, and human-computer interaction (HCI). However, deep learning models are ineffective in outlining their fundamental mechanism, which is why they are mainly considered black boxes. To establish confidence and responsibility, deep learning applications need to explain the model's decision in addition to predicting results. Explainable AI (XAI) research has created methods that offer these interpretations for already trained neural networks. Such methods are highly recommended for computer vision tasks relevant to medical science, defense systems, and many more. The proposed study is associated with XAI for sign language recognition. The methodology uses an attention-based ensemble learning approach to make the prediction model more accurate. It uses ResNet50 with a self-attention model to design the ensemble learning architecture. The proposed ensemble learning approach has achieved an accuracy of 98.20%. To interpret the ensemble learning predictions, the authors propose SignExplainer, which explains the relevancy (in percentage) of predicted results. SignExplainer has illustrated excellent results compared to other conventional explainable AI models reported in the state of the art.
INDEX TERMS Deep learning, computer vision, explainable AI, SignExplainer, classification, sign
language, technological development.
the availability of exclusive computing resources and a huge amount of training data, deep learning can generate much more accurate results than before. With the good performance of machine learning and deep learning, artificial intelligence can achieve superhuman abilities. The world's social environment will transform dramatically due to the use of artificial intelligence across different platforms. These changes come with various ethical issues, to which society will need to adjust quickly in order to steer the advances toward positive consequences. The complexity of deep learning models allows artificial intelligence to learn from and react to complex data structures. Computer vision is one of the best approaches for image classification, segmentation, object detection, and many more applications [3].

Deep learning models show excellent performance in sensitive areas like medical science, national defense, autonomous driving, finance, and many more, but these applications also need attention to trust-related problems. A system that has promising results and also a good interpretation is easier to trust [4]. The significant performance of computer vision tasks comes from a huge number of parameters and links with the physical environment, which are extremely hard to explain. This complex learning structure is generally considered a ‘‘Black-Box’’ [5]. With the advancement of deep learning, especially computer vision, in sensitive and critical sectors, addressing the issue of transparency and interpretability is highly recommended. It is necessary to involve explainability in artificial intelligence, generally referred to as Explainable Artificial Intelligence (XAI). A rapidly expanding field of study, XAI is quickly emerging as one of the more important components of Artificial Intelligence (AI) [6]. Research on XAI in the context of computer vision aims to extract or interpret the structure inside the black box. Additionally, it provides trust and interpretability to assist bias-free debugging across different computer vision applications like object detection, classification, and others. Interpretations from XAI models expose potential design flaws or structures [7]. Figure 1 presents a functional comparison of AI and XAI, especially regarding the reaction to results predicted by black-box learning.

For medical domain tasks like sign language recognition, it is necessary to explain and reveal the internal learning pattern. If the internal learning pattern is correct, it will increase trust in sign language recognition models. In addition, this explanation exposes misclassification errors, leading to improvements in the model or the input scenario. Trust values are especially essential for sign language recognition to predict how the model will learn a given gesture-based sign [8]. Interpretability improves the ability of the methodology to predict the actual label: because the production of sign gestures may vary from person to person, there is a high possibility of recognizing a different label. Sign language recognition with Explainable AI helps to improve the recognition model against various expectations and also helps the end user understand how the deep learning model learns to recognize different sign gestures [9].

A sign language recognition system helps physically impaired people communicate with the rest of the world. People with hearing impairment use gesture-based signs to express their emotions and thoughts. Hand gestures contribute the most to generating a sign, but expressing the proper meaning also involves non-manual body parts, such as the orientation of the head, the direction of the eyes, the eyebrows, and lip movement. XAI for sign language recognition helps to understand the predicted result, which may lead to improved model accuracy and help users become familiar with the ideal gesture for a sign. Computer vision-based sign language recognition systems thus improve not only in accuracy but also in user trust [10].

This study makes three main contributions.
• First, attention-based ensemble learning for sign language recognition.
• Second, a novel architecture using XAI for sign language recognition.
• Finally, concrete evidence for the interpretability and decision-driven approach of the proposed methodology with Explainable AI.

The rest of the article is organized as follows. Section II reviews recently published methodologies for sign language recognition and XAI. Section III describes the proposed methodology with deep learning and XAI. Section IV presents the simulation process and demonstrates the explainability and interpretability of the proposed architecture. Section V presents the evaluation and discusses the results.
how much of a zebra prediction is influenced by the presence of stripes. The authors explain how concept activation vectors (CAVs) may be used to evaluate predictions and generate insight for a standard image classification network and a medical application, putting concepts to the test in image classification.

In [12], the authors describe a technique that offers contrastive explanations for the classification of an input by a deep neural network or another black-box classifier. Given an input, the method finds what should be minimally and sufficiently present (viz. important object pixels in an image) to justify its classification and, analogously, what should be minimally and necessarily absent (viz. certain background pixels). The authors contend that such explanations are typical in fields like criminology and health care because they are natural to people. A key aspect of an explanation that, to the authors' knowledge, had not been formally identified by earlier methods for explaining neural network predictions is what is minimally but critically absent. The authors validated the methodology on three datasets obtained from diverse domains: a brain activity strength dataset, a large procurement fraud dataset, and the handwritten digits dataset MNIST. In all three cases, the method proved effective in producing precise explanations that are also simple for specialists to comprehend and evaluate [12].

Akula et al. [13] proposed the CoCoX model to explain the predictions generated by CNN classification. The authors proposed a fault-line model to identify minimal segment-level features. Explanations from the CoCoX model were understandable to both technical and non-technical communities. The authors evaluated qualitative metrics, namely Justified Trust (JT) and Explanation Satisfaction (ES), to make performance understandable. They also compared the fault-line model to other state-of-the-art models like LIME and LRP [13] and achieved a JT of 69.1 with CNN learning and fault-line identification.

Contreras et al. [14] designed Deep Explainer and Rule Extraction (DEXiRE) to make binary neural networks explainable. The methodology uses rule extraction, which improves knowledge extraction from the output of a DL model (CNN). A final (global) rule set describing the general behavior of DL predictors can be created by integrating intermediate rule sets that explain the behavior of each hidden layer. They used the BCWD, Banknote, and Pima diabetes datasets to simulate the proposed DEXiRE model. The number of words in the intermediate and final rule sets can be regulated precisely with DEXiRE. The rule extraction model achieved accuracy and fidelity of 0.94 and 0.95, respectively, in a very small amount of time (around 232 ms).

Patel et al. [15] addressed water potability prediction with a synthetic oversampling technique and Explainable AI. The authors used the Synthetic Minority Oversampling Technique (SMOTE) when classifying water quality on a Kaggle dataset. They also compared the proposed architecture with other standard machine learning models like Decision Tree, Gradient Boost, Support Vector Machine, Random Forest, and AdaBoost. The proposed methodology achieved 81% accuracy. The authors also considered the lack of transparency of machine learning models. To determine the significance of the features behind a predicted result, Local Interpretable Model-agnostic Explanations (LIME) are used. The authors considered different water properties, such as chloramines, turbidity, sulfate, and many more, to justify results with Explainable AI; the LIME model is used to generate a result with the percentage contribution of each property.

Vermeire et al. [16] proposed a model-agnostic method, ‘‘Search for EviDence Counterfactual’’ (SEDC), for image classification. The ‘‘EdC’’ explanation is an irreducible collection of features that, if absent, would change the classification of the image. SEDC additionally supports a single task for image explanation. The methodology uses image segmentation as a core component for interpretation. The authors simulated the model to compare different counterfactual classes and also compared it with standard explainer models like SHAP and LIME. The simulation used pre-trained weights of MobileNet V2 to demonstrate the interpretation of the proposed SEDC model.

Goyal et al. [17] proposed a technique to generate ‘‘counterfactual explanations’’. Generally, such an explanation justifies a prediction through the image regions used by the model that made the prediction. The methodology also addresses the minimum-edit counterfactual problem. It works on an input image processed by a trained computer vision model to interpret the predicted class. Using the MNIST dataset, the CNN model achieved 98.40% accuracy. The trained model has 2 convolutional and 2 fully connected (FC) layers and generates a feature size of 4 × 4 × 40. To generalize counterfactual explanations, the authors also experimented with the Omniglot and Caltech-UCSD Birds datasets. The technique works with greedy sequential exhaustive search. The authors summarized the qualitative and quantitative results of the proposed technique.

Arras et al. [18] proposed a framework that provides a controlled, selective, and realistic testbed for evaluating explanations of deep neural network predictions. The methodology uses the CLEVR-XAI dataset for simulation; the CLEVR-XAI evaluation set contains around 140k questions with 28 possible answers. The prediction problem is posed as a classification task. The authors used ten pooling techniques to visualize the explanation evaluation against a ground truth mask. The experiment section summarizes the evaluation of different XAI methods like Guided Backprop, LRP, SmoothGrad, and 7 other methods [18]. The study concludes that LRP performed much better than the other methods on the proposed CLEVR-XAI benchmark dataset. Table 1 presents a comparative analysis of different explainable models for interpreting results predicted by black-box learning; the analysis also presents a statistical comparison to justify trust and confidence.

TABLE 1. Comparative analysis of state-of-the-art Explainable AI models over confidence and justified trust value.

features. Especially when the task is related to computer vision, proper model training is necessary. The proposed methodology uses ensemble learning with an attention model. Figure 3 presents an ensemble attention-based model for sign language recognition. The methodology uses a bagging-based ensemble model to learn the associated features of sign images. Attention-based ensemble learning mainly divides into two categories, multi-head ensemble and attention-based ensemble [23]. Figure 3 demonstrates the different forms of attention-based ensemble learning. Algorithm 1 presents the architectural structure of the proposed ensemble learning approach with the bagging concept.
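The bagging concept mentioned above can be illustrated with a minimal sketch. It is not the authors' Algorithm 1: the helper names (`bootstrap_indices`, `average_probabilities`, `bagging_predict`) and the three probability vectors are hypothetical, and the base learners (attention-augmented ResNet50 models in the paper) are replaced here by fixed softmax outputs. The sketch only shows the two generic bagging steps, drawing a bootstrap replicate for each base learner and averaging the class probabilities at prediction time.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def average_probabilities(per_model_probs):
    """Average the class-probability vectors predicted by the base learners."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


def bagging_predict(per_model_probs):
    """Return (predicted class index, averaged probabilities)."""
    avg = average_probabilities(per_model_probs)
    return max(range(len(avg)), key=avg.__getitem__), avg


rng = random.Random(0)
# Indices of the training subset that one base learner would be fitted on.
replicate = bootstrap_indices(10, rng)

# Hypothetical softmax outputs of three base learners for one image, 3 classes.
probs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.6, 0.3, 0.1]]
label, avg = bagging_predict(probs)
```

Averaging probabilities (soft voting) is one common aggregation choice; majority voting over predicted labels would be an equally valid assumption here.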
three-dimension blob channel to recognize input images in an RGB channel. The attention features and convolution features are combined to generate the final feature vector, which is forwarded to a fully connected DCNN for classification. Figure 4 presents the conceptual architecture of the proposed ensemble learning with the attention model.

$$G(x) = \sum F(x) \otimes A(x) \tag{5}$$

B. CLASSIFICATION AND PREDICTION
The output from the fully connected layer is further processed for classification and prediction. The authors implemented a multi-layer perceptron (MLP) [25] to classify sign language. The proposed methodology uses a DFNN (Deep Forward Neural Network) to recognize gesture signs from input images. ReLU activation was implemented in the final layer of the deep network for sign recognition, and the output can be calculated as in equation (6), where $(W_1, W_2)$ are weights and $(b_1, b_2)$ are biases.

$$\mathrm{DFNN} = \mathrm{ReLU}(W_1 x + b_1)\, W_2 + b_2 \tag{6}$$

The authors have utilized NumPy and Scikit-learn [26] for evaluation and visualization. The class-wise performance

C. SIGNEXPLAINER
Interpretation and explanation techniques for black-box deep learning models fall under two categories, model-specific or model-agnostic. This section focuses on the design of SignExplainer, a model-agnostic interpretability technique that can be applied to any black-box deep learning model to interpret gesture-based signs. SHAP [29] is among the most utilized interpretability methods for deep learning-based methods. SHAP can construct interpretations for multi-class classifier responses. SignExplainer uses sign-specific Xconcepts to generate a fault-line explanation. Let us assume that $\delta_{pred}$ and $\delta_{alt}$ are the Xconcepts for $E_{pred}$ and $E_{alt}$, respectively, where $E$ stands for the actual class. Based on the Xconcepts, the fault line can be calculated as in equation (11) [30].

$$\Psi(E_{pred}, E_{alt}) \leftarrow \min_{\delta_{pred},\,\delta_{alt}} \; \alpha(\delta_{pred}, \delta_{alt}) + \beta\,\lVert\delta_{pred}\rVert + \lambda\,\lVert\delta_{alt}\rVert \tag{11}$$

The proposed methodology designs DeepExplainer as an additive feature attribution method with accuracy and missingness. DeepExplainer combines the SHAP values computed for smaller components of the ensemble network and calculates them as in equation (12) [31], where $\Delta o = f(x) - f(r)$ and $\Delta x_i = x_i - r_i$, $r$ is the reference input, and $f(x)$ is the model output.

$$\Delta o = \sum_{i=1}^{n} C_{\Delta x_i \Delta o} \tag{12}$$
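The additive property behind equation (12), namely that the per-feature contributions sum to the difference between the model output at the input and at the reference, can be checked on a toy example. The sketch below is an assumption-laden illustration rather than the SignExplainer or DeepExplainer implementation: it computes exact Shapley values by enumerating all feature orderings, which is only feasible for a handful of features, whereas Deep SHAP approximates these values layer by layer. The function `model` and the inputs `x` and `r` are hypothetical.

```python
from itertools import permutations


def shapley_values(f, x, r):
    """Exact Shapley values of f at x relative to the reference input r.

    Features that are absent from a coalition take their reference value.
    The cost grows factorially with the number of features, so this is
    only practical for toy models.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        z = list(r)               # start from the reference input
        prev = f(z)
        for i in order:
            z[i] = x[i]           # switch feature i to its actual value
            cur = f(z)
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    return [p / len(orders) for p in phi]


def model(z):
    # Hypothetical toy model with an interaction between features 1 and 2.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]   # input to explain
r = [0.0, 0.0, 0.0]   # reference input
phi = shapley_values(model, x, r)
delta_o = model(x) - model(r)   # f(x) - f(r)
```

Here the interaction term is shared equally between the two interacting features, and the attributions in `phi` add up to `delta_o`, which is the summation property that additive feature attribution methods are designed to satisfy.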
IV. EXPERIMENTS AND RESULT
A. DATASET
The authors evaluated SignExplainer with ensemble learning on the Indian Sign Language dataset [32]. The dataset used for simulation consists of 36 Indian Sign classes covering digits (0-9) and letters (A-Z). It contains approximately 1200 three-channel images per class. Along with the Indian Sign Language (ISL) dataset, the authors also experimented with other static datasets, namely American Sign Language (ASL) [33] and Bangla Sign Language (BSL) [34]. The properties of the datasets are described in Table 2.

TABLE 2. Statistical representation of different sign language datasets used in the simulation.

B. DATA AUGMENTATION
The simulation uses data augmentation to make the model generalize better during feature learning. Data augmentation is also used to balance the training image samples and to improve robustness to variability across images, making the model more suitable for real-time scenarios. Direct image inference may yield biased findings due to particular transformations and noise associated with equipment and surroundings. Image augmentation is therefore used to achieve more reliable and robust predictions, improve accuracy, and prevent overfitting. The authors implemented i) geometric transformations: random horizontal flip, random rotation between -0.2 and +0.2, and zooming by 1.5% to 2.5%; and ii) color space transformations: random RGB change and brightness change by 0.5%. Figure 5 shows a sample of the augmented training dataset.

The proposed ensemble methodology achieved 98.20% accuracy with features extracted from the attention and ResNet50 models. The data were divided with a 0.2 test split ratio (80:20 train-test) for all experiments, with an image size of (72, 72, 3) and a batch size of 16. The model was simulated with a dropout ratio of 0.3 and a learning rate of 0.001 with the Adam optimizer. Table 3 demonstrates superior performance over other standard convolutional networks; the best performance was observed for the proposed attention-based ensemble model. The proposed methodology achieved significant accuracy over 50 learning epochs, as shown in Figure 6.
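Two of the augmentation steps listed in Section IV-B, the random horizontal flip and the brightness change, can be sketched in plain Python. This is an illustrative sketch, not the authors' pipeline: images are represented as nested lists instead of tensors, the function names are hypothetical, and reading ``brightness by 0.5%'' as a multiplicative factor in [0.995, 1.005] is an assumption. Rotation, zoom, and the random RGB change are omitted because the text does not fix their units precisely.

```python
import random


def horizontal_flip(image):
    """Mirror an H x W x C image (nested lists) left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, factor):
    """Scale every channel value by factor and clip to [0, 255]."""
    return [[[min(255.0, max(0.0, v * factor)) for v in px] for px in row]
            for row in image]


def augment(image, rng, flip_prob=0.5, brightness_delta=0.005):
    """Apply a random flip and a small random brightness change.

    brightness_delta=0.005 encodes the assumed 0.5% brightness range.
    """
    out = horizontal_flip(image) if rng.random() < flip_prob else [row[:] for row in image]
    factor = 1.0 + rng.uniform(-brightness_delta, brightness_delta)
    return adjust_brightness(out, factor)


# Tiny 1 x 2 RGB image standing in for a (72, 72, 3) sign image.
image = [[[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]]
flipped = horizontal_flip(image)
augmented = augment(image, random.Random(0))
```

In a deep learning framework the same steps would normally be expressed with its built-in augmentation layers, so that they run on batches during training only.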
FIGURE 6. Accuracy and loss curve for Indian Sign Language recognition
using Attention-based Ensemble learning.
TABLE 3. Performance analysis with state-of-the-art models for image classification.
TABLE 5. Performance analysis of SignExplainer over different static Sign Language Datasets.
FIGURE 8. Representation of SignExplainer to interpret sign gesture with prediction value and class (class starts from 0-9 in left to right).
VI. CONCLUSION
The era of Explainable AI is growing exponentially to overcome the trust and transparency issues of deep learning models. In particular, tasks relevant to computer vision or NLP in critical sectors require interpretation of predicted results. The review explored different XAI methodologies like LRP, LIME, SHAP, and SmoothGrad across relevant computer vision applications. This study has proposed an explainable artificial intelligence approach to sign language recognition. An ensemble learning-based architecture was proposed to recognize sign gestures from sign images. Ensemble weights were passed to the proposed SignExplainer to generate statistical values like TP-rate and FP-rate to evaluate the correctness of SignExplainer. This study also compared ensemble learning with other deep learning models for image classification. It also evaluated the performance of SignExplainer on other benchmark static sign language datasets, ASL and BSL, where it also achieves remarkable performance. The study additionally simulates machine learning and deep learning models like Decision Tree, Random Forest, VGG16, and EfficientNetV2 and evaluates the performance of SignExplainer with them. Ensemble learning and the other deep learning models performed well with SignExplainer in interpreting predicted signs with proper statistical values. The proposed work can be extended to other static sign languages as well as isolated sign languages. The methodology can be enhanced for real-time or portable sign language recognition with acceptable interpretations.

ACKNOWLEDGMENT
This research was funded by Princess Nourah bint Abdulrahman University and Researchers Supporting Project number (PNURSP2023R346), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would also like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication.

REFERENCES
[1] P. P. Angelov, E. A. Soares, R. Jiang, N. I. Arnold, and P. M. Atkinson, ‘‘Explainable artificial intelligence: An analytical review,’’ WIREs Data Mining Knowl. Discovery, vol. 11, no. 5, p. e1424, 2021.
[2] Y. Yuan and Y. Lo, ‘‘Improving dermoscopic image segmentation with enhanced convolutional-deconvolutional networks,’’ IEEE J. Biomed. Health Informat., vol. 23, no. 2, pp. 519–526, Mar. 2019, doi: 10.1109/jbhi.2017.2787487.
[3] A. Gramegna and P. Giudici, ‘‘SHAP and LIME: An evaluation of discriminative power in credit risk,’’ Frontiers Artif. Intell., vol. 4, Sep. 2021, Art. no. 752558.
[4] F. Afza, M. A. Khan, M. Sharif, S. Kadry, G. Manogaran, T. Saba, I. Ashraf, and R. Damaševičius, ‘‘A framework of human action recognition using length control features fusion and weighted entropy-variances based feature selection,’’ Image Vis. Comput., vol. 106, Feb. 2021, Art. no. 104090.
[5] P. Linardatos, V. Papastefanopoulos, and S. Kotsiantis, ‘‘Explainable AI: A review of machine learning interpretability methods,’’ Entropy, vol. 23, no. 1, p. 18, Dec. 2020, doi: 10.3390/e23010018.
[6] K. V. Dudekula, H. Syed, M. I. M. Basha, S. I. Swamykan, P. P. Kasaraneni, Y. V. P. Kumar, A. Flah, and A. T. Azar, ‘‘Convolutional neural network-based personalized program recommendation system for smart television users,’’ Sustainability, vol. 15, no. 3, p. 2206, Jan. 2023.
[7] M. Baldeon Calisto and S. K. Lai-Yuen, ‘‘AdaEn-net: An ensemble of adaptive 2D–3D fully convolutional networks for medical image segmentation,’’ Neural Netw., vol. 126, pp. 76–94, Jun. 2020, doi: 10.1016/j.neunet.2020.03.007.
[8] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, ‘‘DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, Apr. 2018, doi: 10.1109/TPAMI.2017.2699184.
[9] J. Ganesan, A. T. Azar, S. Alsenan, N. A. Kamal, B. Qureshi, and A. E. Hassanien, ‘‘Deep learning reader for visually impaired,’’ Electronics, vol. 11, no. 20, p. 3335, Oct. 2022.
[10] D. Kothadiya, C. Bhatt, K. Sapariya, K. Patel, A.-B. Gil-González, and J. M. Corchado, ‘‘Deepsign: Sign language detection and recognition using deep learning,’’ Electronics, vol. 11, no. 11, p. 1780, Jun. 2022, doi: 10.3390/electronics11111780.
[11] B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, and R. Sayres, ‘‘Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV),’’ in Proc. Int. Conf. Mach. Learn., Mar. 2023, pp. 2668–2677. [Online]. Available: http://proceedings.mlr.press/v80/kim18d.html
[12] A. Dhurandhar, P.-Y. Chen, R. Luss, C.-C. Tu, P. Ting, K. Shanmugam, and P. Das, ‘‘Explanations based on the missing: Towards contrastive explanations with pertinent negatives,’’ 2018, arXiv:1802.07623.
[13] A. Akula, S. Wang, and S.-C. Zhu, ‘‘CoCoX: Generating conceptual and counterfactual explanations via fault-lines,’’ in Proc. AAAI Conf. Artif. Intell., Apr. 2020, vol. 34, no. 3, pp. 2594–2601, doi: 10.1609/aaai.v34i03.5643.
[14] V. Contreras, N. Marini, L. Fanda, G. Manzo, Y. Mualla, J.-P. Calbimonte, M. Schumacher, and D. Calvaresi, ‘‘A DEXiRE for extracting propositional rules from neural networks via binarization,’’ Electronics, vol. 11, no. 24, p. 4171, Dec. 2022, doi: 10.3390/electronics11244171.
[15] J. Patel, C. Amipara, T. A. Ahanger, K. Ladhva, R. K. Gupta, H. O. Alsaab, Y. S. Althobaiti, and R. Ratna, ‘‘A machine learning-based water potability prediction model by using synthetic minority oversampling technique and explainable AI,’’ Comput. Intell. Neurosci., vol. 2022, pp. 1–15, Sep. 2022, doi: 10.1155/2022/9283293.
[16] T. Vermeire, D. Brughmans, S. Goethals, R. M. B. de Oliveira, and D. Martens, ‘‘Explainable image classification with evidence counterfactual,’’ Pattern Anal. Appl., vol. 25, no. 2, pp. 315–335, Jan. 2022, doi: 10.1007/s10044-021-01055-y.
[17] Y. Goyal, Z. Wu, J. Ernst, D. Batra, D. Parikh, and S. Lee, ‘‘Counterfactual visual explanations,’’ in Proc. 36th Int. Conf. Mach. Learn., May 2019, pp. 2376–2384, Accessed: Mar. 2023. [Online]. Available: https://proceedings.mlr.press/v97/goyal19a.html
[18] L. Arras, A. Osman, and W. Samek, ‘‘CLEVR-XAI: A benchmark dataset for the ground truth evaluation of neural network explanations,’’ Inf. Fusion, vol. 81, pp. 14–40, May 2022, doi: 10.1016/j.inffus.2021.11.008.
[19] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, ‘‘Learning deep features for discriminative localization,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2921–2929, Accessed: Mar. 6, 2023. [Online]. Available: https://openaccess.thecvf.com/content_cvpr_2016/html/Zhou_Learning_Deep_Features_CVPR_2016_paper.html
[20] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, ‘‘Grad-CAM: Visual explanations from deep networks via gradient-based localization,’’ Int. J. Comput. Vis., vol. 128, no. 2, pp. 336–359, Feb. 2020, doi: 10.1007/s11263-019-01228-7.
[21] M. T. Ribeiro, S. Singh, and C. Guestrin, ‘‘Why should i trust you?: Explaining the predictions of any classifier,’’ in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016, pp. 1135–1144.
[22] X. Shen, K. Lu, S. Mehta, J. Zhang, W. Liu, J. Fan, and Z. Zha, ‘‘MKEL: Multiple kernel ensemble learning via unified ensemble loss for image classification,’’ ACM Trans. Intell. Syst. Technol., vol. 12, no. 4, pp. 1–21, Aug. 2021.
[23] W. Kim, B. Goyal, K. Chawla, J. Lee, and K. Kwon, ‘‘Attention-based ensemble for deep metric learning,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 736–751.
[24] B. Chen and W. Deng, ‘‘Deep embedding learning with adaptive large margin N-pair loss for image retrieval and clustering,’’ Pattern Recognit., vol. 93, pp. 353–364, Sep. 2019, doi: 10.1016/j.patcog.2019.05.011.
[25] D. R. Kothadiya, C. M. Bhatt, T. Saba, A. Rehman, and S. A. Bahaj, ‘‘SIGNFORMER: DeepVision transformer for sign language recognition,’’ IEEE Access, vol. 11, pp. 4730–4739, 2023, doi: 10.1109/access.2022.3231130.
[26] J. Mueller and L. Massaron, Python for Data Science. Hoboken, NJ, USA: Wiley, 2019.
[27] J. Huang, W. Zhou, H. Li, and W. Li, ‘‘Sign language recognition using real-sense,’’ in Proc. IEEE China Summit Int. Conf. Signal Inf. Process. (ChinaSIP), Jul. 2015, pp. 166–170.
[28] L. Pigou, S. Dieleman, P.-J. Kindermans, and B. Schrauwen, ‘‘Sign language recognition using convolutional neural networks,’’ in Proc. Eur. Conf. Comput. Vis., 2015, pp. 572–578.
[29] S. Knapič, A. Malhi, R. Saluja, and K. Främling, ‘‘Explainable artificial intelligence for human decision support system in the medical domain,’’ Mach. Learn. Knowl. Extraction, vol. 3, no. 3, pp. 740–770, Sep. 2021, doi: 10.3390/make3030037.
[30] J. van der Waa, E. Nieuwburg, A. Cremers, and M. Neerincx, ‘‘Evaluating XAI: A comparison of rule-based and example-based explanations,’’ Artif. Intell., vol. 291, Feb. 2021, Art. no. 103404.
[31] F. Gabbay, S. Bar-Lev, O. Montano, and N. Hadad, ‘‘A LIME-based explainable machine learning model for predicting the severity level of COVID-19 diagnosed patients,’’ Appl. Sci., vol. 11, no. 21, p. 10417, Nov. 2021.
[32] D. R. Kothadiya. (Oct. 2022). Deepkothadiya/STATIC_ISL: Static Indian Sign Language Dataset Having Sign of Digit and Alphabet. [Online]. Available: https://github.com/DeepKothadiya/Static_ISL
[33] Thakur. (May 2019). American Sign Language Dataset. [Online]. Available: https://www.kaggle.com/datasets/ayuraj/american-sign-language-dataset
[34] S. M. Rayeed. (Aug. 2021). Bangla Sign Language Dataset. [Online]. Available: https://www.kaggle.com/datasets/rayeed045/bangla-sign-language-dataset
[35] T. Saba, M. A. Khan, A. Rehman, and S. L. Marie-Sainte, ‘‘Region extraction and classification of skin cancer: A heterogeneous framework of deep CNN features fusion and reduction,’’ J. Med. Syst., vol. 43, no. 9, Jul. 2019, doi: 10.1007/s10916-019-1413-3.
[36] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for large-scale image recognition,’’ 2014, arXiv:1409.1556.
[37] B. Li, B. Liu, S. Li, and H. Liu, ‘‘An improved EfficientNet for Rice germ integrity classification and recognition,’’ Agriculture, vol. 12, no. 6, p. 863, Jun. 2022, doi: 10.3390/agriculture12060863.
[38] Y. Heffetz, R. Vainshtein, G. Katz, and L. Rokach, ‘‘DeepLine: AutoML tool for pipelines generation using deep reinforcement learning and hierarchical actions filtering,’’ in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2020, pp. 2103–2113.
[39] H. Chen, S. Lundberg, and S.-I. Lee, ‘‘Explaining models by propagating Shapley values of local components,’’ in Explainable AI in Healthcare and Medicine. Cham, Switzerland: Springer, 2020, pp. 261–270. [Online]. Available: https://link.springer.com/book/10.1007/978-3-030-53352-6?page=2#toc, doi: 10.1007/978-3-030-53352-6.
[40] A. Razaque, M. Ben Haj Frej, M. Almi’ani, M. Alotaibi, and B. Alotaibi, ‘‘Improved support vector machine enabled radial basis function and linear variants for remote sensing image classification,’’ Sensors, vol. 21, no. 13, p. 4431, Jun. 2021, doi: 10.3390/s21134431.
[41] Z. Noshad, N. Javaid, T. Saba, Z. Wadud, M. Saleem, M. Alzahrani, and O. Sheta, ‘‘Fault detection in wireless sensor networks through the random forest classifier,’’ Sensors, vol. 19, no. 7, p. 1568, Apr. 2019.
[42] X. Xie, G. Cheng, J. Wang, X. Yao, and J. Han, ‘‘Oriented R-CNN for object detection,’’ 2021, arXiv:2108.05699.
[43] Y. Liu, ‘‘An improved faster R-CNN for object detection,’’ in Proc. 11th Int. Symp. Comput. Intell. Design (ISCID), vol. 2, Dec. 2018, pp. 119–123.
[44] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, ‘‘SSD: Single shot multibox detector,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 21–37.
[45] A. T. Azar, Z. I. Khan, S. U. Amin, and K. M. Fouad, ‘‘Hybrid global optimization algorithm for feature selection,’’ Comput., Mater. Continua, vol. 74, no. 1, pp. 2021–2037, 2023.

DEEP R. KOTHADIYA received the bachelor’s and master’s degrees in computer science and engineering from Gujarat Technological University. He is currently pursuing the Ph.D. degree with the Charotar University of Science and Technology (CHARUSAT). He is also an Assistant Professor with the U & P U Patel Department of Computer Engineering, Chandubhai S. Patel Institute of Technology, CHARUSAT. He is also a Research Scholar with CHARUSAT, and Prince Sultan University, Riyadh, Saudi Arabia. He has already published many research papers, including one SCI-indexed paper. He is also a Technical Reviewer of International Journal of Computing and Digital Systems.

CHINTAN M. BHATT (Member, IEEE) was an Assistant Professor with the CE Department, CSPIT, CHARUSAT, for 11 years. He is currently an Assistant Professor with the Department of Computer Science and Engineering (CSE), School of Technology, Pandit Deendayal Energy University (PDEU). He is the author or coauthor of more than 80 publications in the areas of computer vision, the Internet of Things, and fog computing. He was involved in the successful organization of a few special issues in SCI/Scopus journals. He has won several awards, including the CSI Award and the Best Paper Award for his CSI articles and conference publications.

AMJAD REHMAN (Senior Member, IEEE) received the Ph.D. and postdoctoral degrees (Hons.) from the Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia, with a specialization in forensic documents analysis and security, in 2010 and 2011, respectively. He is currently a Senior Researcher with the Artificial Intelligence and Data Analytics Laboratory, Prince Sultan University, Riyadh, Saudi Arabia. He is also a PI in several funded projects and has also completed projects funded from MOHE Malaysia, Saudi Arabia. His research interests include data mining, health informatics, and pattern recognition.

FATEN S. ALAMRI received the Ph.D. degree in system modeling and analysis in statistics from Virginia Commonwealth University, USA, in 2020. Her Ph.D. research was in Bayesian dose response modeling, experimental design, and nonparametric modeling. She is currently an Assistant Professor with the Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdul Rahman University. Her research interests include spatial area, environmental statistics, and brain imaging.

TANZILA SABA (Senior Member, IEEE) received the Ph.D. degree in document information security and management from the Faculty of Computing, Universiti Teknologi Malaysia (UTM), Malaysia, in 2012. She is currently the Associate Chair of Information Systems Department, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia. Her primary research interests include medical imaging, pattern recognition, data mining, MRI analysis, and soft-computing.