A Requirements-driven methodology aligned with the Model-Driven Architecture (MDA) for Data Analytics over Big Data sources using AI


Jorge García Carrasco, Alejandro Maté, Juan Carlos Trujillo
University of Alicante
Department of Software and Computing Systems
Lucentia Research

Background

With the emergence of Big Data [2], the amount of data available to companies and individuals has increased dramatically. Extracting value from such large and complex volumes of data is not an easy task. This situation has led to an increasing interest in the use of Machine Learning (ML) and Deep Learning (DL) techniques [5]. However, despite the interest of industry and academia in such techniques, there is a lack of methods that ease the capture of requirements and make it possible to address the development of an AI-based project efficiently.

Objectives

The main objective of this thesis is to propose a methodology for developing AI-based solutions over Big Data sources that helps capture requirements and, starting from these, derives an implementation semi-automatically, thus reducing the cost and error rate of AI-based projects.

Preliminary Results

When developing an AI-based project, one of the most crucial stages, and the one where most of the time and effort is spent, is the preprocessing stage. Depending on how the data is preprocessed, the performance of an AI model can change drastically [4]. Therefore, the first part of the thesis will focus on the preprocessing part of the methodology, specifically on the use of Feature Engineering (FE) techniques for the diagnosis and prognosis of mental diseases from recordings of EEG brain signals. ML and DL models have become very popular in this field; however, it is essential to apply FE techniques to the data beforehand, as EEG signals are noisy and non-stationary. In other words, a proper choice of FE techniques could greatly improve performance, depending on the algorithm and the mental disorder.

This motivated us to perform a Systematic Mapping Study (SMS) covering more than 900 articles, with the objective of giving a clear overview of which FE and AI techniques have been applied to each mental disorder. The paper is under revision and will be available in the future, but partial results are shown in Fig. 2 as a bubble plot. This type of plot provides a clear overview of the amount of work that has been performed on each combination of, for example, mental disorder and feature transformation technique.

In addition, a study on image generation via DL models was performed. Specifically, the study focused on two types of techniques, Transfer Learning (TL) [6] and Data Augmentation (DA) [3], and their effect on image generation via Generative Adversarial Networks (GANs) [1] when the GAN is trained with an extremely small dataset. Several examples of images synthesized by the GAN are shown in Fig. 1. The study led to the following conclusions:

- The use of DA enabled us to train GANs with an extremely low number of samples (∼10^3) compared to the number typically required (∼10^5 to 10^6).
- The use of TL allowed the network to converge much faster, as well as slightly improving the quality of the results.

This study shows the potential of GANs when combined with DA and TL for image generation with very small training datasets. Therefore, these techniques can be extremely useful in application areas where the availability of data is limited, such as the medical field. The paper has been submitted to a conference, and the complete results will be available soon.

Figure 1. Four samples synthesized by a GAN trained to generate images of glass façades.

Conclusions and future work

The work done so far served as a preliminary step that allowed us to gain knowledge of different techniques that are essential when developing AI-based projects. Therefore, the next step of the thesis will be to apply this knowledge to implementing the actual methodology, specifically the part related to the processing of data before feeding it to the AI model.

Figure 2. Bubble plot which shows the number of works related to each combination of feature transformation and mental disease. Note that the plot is cropped in order to fit in the poster.
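
As an illustration of the data behind a bubble plot like the one in Fig. 2, the following minimal sketch counts how many works fall on each combination of mental disorder and feature transformation technique. It is not the authors' tooling: the record format, the function names `bubble_counts` and `bubble_rows`, and the placeholder labels are assumptions made for the example, and the records are invented placeholders rather than results of the mapping study.

```python
from collections import Counter

# Hypothetical study records (placeholders, NOT results from the mapping study):
# each entry pairs one mental disorder with one feature transformation technique.
records = [
    ("disorder_A", "technique_X"),
    ("disorder_A", "technique_X"),
    ("disorder_A", "technique_Y"),
    ("disorder_B", "technique_X"),
]


def bubble_counts(pairs):
    """Count how many works fall on each (disorder, technique) combination."""
    return Counter(pairs)


def bubble_rows(counts):
    """Flatten the counts into (disorder, technique, size) rows, sorted for stable output."""
    return sorted(
        (disorder, technique, n) for (disorder, technique), n in counts.items()
    )


counts = bubble_counts(records)
rows = bubble_rows(counts)

# Each row is one bubble: its position is given by the two labels, its area by the count.
for disorder, technique, n in rows:
    print(f"{disorder:12s} {technique:12s} {n}")
```

Each row corresponds to one bubble, so the rows could be passed to any plotting library that supports scatter plots with a variable marker size.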

References
[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
[2] J Hurwitz, Alan Nugent, Fern Halper, and Marcia Kaufman. Big data. New York, 2013.
[3] Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33:12104–12114, 2020.
[4] Andreas Vogelsang and Markus Borg. Requirements engineering for machine learning: Perspectives from data scientists. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW), pages 245–251. IEEE, 2019.
[5] Qingchen Zhang, Laurence T Yang, Zhikui Chen, and Peng Li. A survey on deep learning for big data. Information Fusion, 42:146–157, 2018.
[6] F Zhuang, Z Qi, K Duan, D Xi, Y Zhu, H Zhu, H Xiong, and Q He. A comprehensive survey on transfer learning. arXiv preprint arXiv:1911.02685, 2020.

Acknowledgements

This work has been co-funded by the AETHER-UA project (PID2020-112540RB-C43), a smart data holistic approach for context-aware data analytics: smarter machine learning for business modelling and analytics, funded by the Spanish Ministry of Science and Innovation, and by the BALLADEER project (PROMETEO/2021/088), a Big Data analytical platform for the diagnosis and treatment of Attention Deficit Hyperactivity Disorder (ADHD) featuring extended reality, funded by the Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital (Generalitat Valenciana).

Jornada de Doctorado en Informática (JDI) 2022
