Fuzzy Based Techniques For Handling Missing Values

Abstract—Time series data usually suffers from a high percentage of missing values, which is related to its nature and its collection process. This paper proposes a data imputation technique for imputing the missing values in time series data. The fuzzy Gaussian membership function and the fuzzy triangular membership function are used in a data imputation algorithm in order to identify the best imputation for the missing values, where the membership functions are used to calculate weights for the data values of the nearest neighbours before using them during the imputation process. The evaluation results show that the proposed technique outperforms traditional data imputation techniques, and that the triangular fuzzy membership function achieves higher accuracy than the Gaussian membership function.

Keywords—Time series data; fuzzy logic; membership functions; machine learning; missing values

I. INTRODUCTION

In computer science, the data quality problem began to rise in the 1990s with the arrival of data warehouse systems, where the failure of a database project was often attributed to poor data quality [1]. There are many definitions of the term "data quality", but as mentioned in [2], a well-known definition used by many researchers is "fitness for use". Data quality can be mainly summarized as how well the system fits reality, or how users actually utilize the data in the system [2].

Data quality can be assessed in terms of data quality dimensions. These dimensions include timeliness, to ensure that the value is current; consistency, to ensure that the representation of the data is the same in all cases; completeness, to ensure that the data is complete with no missing values; and accuracy, to ensure that the recorded value is identical to the actual value [1].

Incompleteness of data is a natural phenomenon, as data is usually generated, entered, or collected with missing values. Missing data can be defined as the values that are not stored for a variable in the observation of interest. There are three types of missingness. First, Missing Completely at Random (MCAR): the variable is missing completely at random, and the probability of missingness is the same for all observations. Second, Missing at Random (MAR): the variable is missing at random, and the probability of missingness depends only on available information. This type can also be called missing conditionally, meaning missing under a condition; for example, if the gender is male, the respondent will leave the questions related to women in the survey empty. Third, Not Missing at Random (NMAR): the probability of missingness is not random; it depends on the variable itself and cannot be predicted from another variable in the dataset [3].

Missing data occurs in many types of data sets, but it occurs with a particularly high percentage in time series data. Time series data is a type of data that usually suffers from incompleteness due to its nature. Time series data exist in nearly every scientific field, where data are measured, recorded, and monitored over time. Consequently, it is understandable that missing values may occur. Also, most time series data are collected by sensors and machines, which is another reason for the occurrence of missing values [4].

This paper aims to ensure the data quality of time series data. More specifically, it aims to ensure the completeness dimension of time series data that suffers from missing values. Towards this aim, two novel techniques for imputing the missing values in time series data are proposed and compared with traditional techniques. The two proposed techniques impute a missing value by first finding the k-nearest neighbours of the record containing the missing value, and then calculating a weight for each value in the nearest neighbours using a fuzzy membership function. Two fuzzy membership functions are used: the Gaussian membership function and the triangular membership function. After calculating the weights, the data values and their weights are used in the weighted mean function to calculate the imputed value. The accuracy of the proposed techniques is evaluated using three traditional classifiers: Neural Network, Naïve Bayes, and Decision Tree. The evaluation results show that the two proposed techniques have higher accuracy than the traditional data imputation techniques. In addition, the results show that the triangular membership function yields higher accuracy than the Gaussian membership function.

The rest of this paper is organized as follows: Section 2 presents the related work and some techniques used in imputing missing values. Sections 3 and 4 present the proposed techniques and the results. Finally, the paper is concluded in Section 5.

II. RELATED WORK

Many methods with different techniques have been proposed in the literature to solve the missing data problem. The management of missing data can be divided into three categories: deletion and ignoring methods, imputation methods, and model-based methods. These categories are discussed below in more detail.
A. Deletion and Ignoring Methods

Deletion (or ignoring) of missing values is recognized as the simplest way to handle missing values. The authors in [5] describe the traditional techniques for dealing with missing data. In listwise deletion, an entire record is excluded from the data set if any of its values is missing. In pairwise deletion, the method computes the correlation between missing and complete data to pair the correlated values, and it only deletes the uncorrelated values. Listwise deletion removes more data than the pairwise method. The drawback of this family of methods is that it can be very risky when the missing portion of the data is large, as it may distort the results of the analysis.

B. Imputation Methods

Imputation methods work by substituting each missing value with an estimated value. Hot and cold deck imputation is one of the well-known methods used in missing data imputation. In [6], cold deck imputation is used, where missing values are filled from external sources such as values from a previous survey: the missing values in the recipient records are imputed using similar reported values from the previous survey. Cold deck imputation was performed through probabilistic record linkage techniques in order to find the best matching records from different data sources containing the same set of entities.

Another imputation technique was proposed in [7] to generate an estimate for the missing values. In [7], the authors proposed a technique that considers multiple imputations for imputing missing values. This technique works by imputing the missing values n times to reflect the uncertainty over all the possible values that could be imputed. The imputed values are then analyzed in order to obtain a combined single estimate. For example, one can choose two different techniques and use them together in order to take advantage of both techniques and avoid their individual disadvantages.

C. Model-Based Methods

Model-based methods impute the missing values by using a predictive technique. These methods are mainly machine learning techniques that need a learning phase to be able to estimate the missing values.

In [8], the authors work on weather data for environmental factors and found that this data set contains many missing values. They calculated the percentage of missingness in the data and found that 19% of the weather data for 2017 was missing. This percentage is large for this type of data and can be misleading for any analysis performed on it. Four missing data imputation algorithms were applied to this data set, and the data was divided into training and testing sets to measure the quality of the four imputation algorithms. The k-nearest neighbour (KNN) method gave the best results; its results were very close to the original data with no missing values, and the prediction model's performance remained stable even when the missing data rate increased (a minimal KNN-imputation sketch is given at the end of this section).

In [9], the authors implemented a new approach based on the vector autoregressive (VAR) model, combining the prediction error minimization (PEM) method with the expectation maximization (EM) algorithm. They called this algorithm the vector autoregressive imputation method (VAR-IM). Their proposed system was applied to a real-world data set involving electrocardiogram (ECG) data. They used linear regression substitution and listwise deletion as traditional methods to compare with their proposed method VAR-IM. They concluded that VAR-IM produced a large improvement in the imputation tasks compared to the traditional techniques. This technique has three limitations. First, it only deals with data that is missing completely at random. Second, the validity of the approach requires that the time series be stationary. Third, since the percentage of missing data has a significant impact on most missing data analysis methods, the proposed technique is not the preferred choice if the percentage of missing data is quite low (say, less than 10%). Despite these limitations, the proposed technique provides an important alternative to existing methods for handling missing data in multivariate time series.

In [10], the authors propose a genetic algorithm (GA) based technique to estimate the missing values in datasets. The GA is used to generate optimal sets of missing values, and information gain (IG) is used as the fitness function to measure the performance of an individual solution. Their goal is to impute missing values in a dataset for better classification results. This technique works even better when there is a higher rate of missing values or incomplete information, along with a greater number of distinct values in the attributes/features having missing values. They compared their proposed technique with single imputation techniques and multiple imputation (MI) statistically based approaches on various benchmark classification techniques and different performance measures. They show that the proposed method outperforms other state-of-the-art missing data imputation techniques.

In [11], the authors used gene expression data, which is recognized as a common data source that contains missing expression values. They present a genetic algorithm optimized k-nearest neighbour algorithm (Evolutionary KNN Imputation) for missing data imputation. They focused on the local approach, into which the proposed Evolutionary KNN Imputation algorithm falls. The Evolutionary KNN Imputation algorithm is an extension of the common KNN imputation algorithm, in which the genetic algorithm is used to optimize some parameters of the KNN algorithm. The selection of the similarity measure and the selection of the parameter value k are treated as the optimization problem. They compared the proposed Evolutionary KNN Imputation algorithm with the KNN imputation algorithm and the mean imputation method. The results show that Evolutionary KNN Imputation outperforms KNN imputation and mean imputation, while showing the importance of using a supervised learning algorithm in missing data estimation. Even though mean imputation showed a low mean error for very low missing rates, supervised learning algorithms became more effective at higher missing rates, which is the most common situation among datasets.
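Since several of the works above build on k-nearest-neighbour imputation, the following minimal sketch shows the general idea using scikit-learn's KNNImputer. The library choice, the toy data, and the parameter values are illustrative assumptions, not the setup used by any of the cited works.

```python
# Illustrative sketch of plain KNN imputation (not the cited authors' exact code).
import numpy as np
from sklearn.impute import KNNImputer

# A small numeric table with missing entries (np.nan marks a missing value).
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, np.nan, 6.0],
    [7.0, 8.0, 9.0],
    [np.nan, 10.0, 12.0],
])

# Each missing entry is replaced by the mean of that feature over the
# k nearest rows, measured with a nan-aware Euclidean distance.
imputer = KNNImputer(n_neighbors=2, weights="uniform")
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```

The techniques proposed in this paper refine exactly this last averaging step by replacing the uniform (or distance-based) weights with fuzzy membership weights.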
III. PROPOSED TECHNIQUE

In this paper, two techniques are proposed for imputing missing values in time series data. The two proposed techniques start by finding the K nearest neighbour data points for each data point containing a missing value for a certain feature. Then, the values of the missing feature in the nearest neighbours' data points are weighted using one of two fuzzy membership functions: the triangular fuzzy membership function or the Gaussian membership function. The missing feature value is then obtained as the weighted mean of the feature over the nearest neighbours. Fig. 1 shows the steps of the proposed technique.

Two weighting functions are used to obtain the weight of each of the nearest neighbours' data points for a certain missing feature before using them to impute the missing value: the triangular and the Gaussian membership functions. The triangular membership weighting works by calculating the minimum, the maximum, and the average of the nearest neighbours' values of the missing feature. Then, it calculates the weight of each value using the triangular fuzzy membership function. Finally, the values and their weights are used in the weighted mean function to obtain the value of the missing data. Algorithm 1 shows the exact details of the triangular fuzzy membership imputation. The triangular fuzzy membership function is defined as:
$$
\mu(x) =
\begin{cases}
0, & x \le a \\
\dfrac{x - a}{m - a}, & a < x \le m \\
\dfrac{b - x}{b - m}, & m < x < b \\
0, & x \ge b
\end{cases}
$$
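As a concrete illustration of Algorithm 1, the following minimal sketch implements the triangular membership weighting and the weighted mean imputation described above, assuming (as the text suggests) a = minimum, m = average, and b = maximum of the nearest neighbours' values. The function names and the fallback for degenerate cases are our own illustrative choices, not code from the paper.

```python
# Minimal sketch of the triangular-membership imputation (Algorithm 1).
import numpy as np

def triangular_weight(x, a, m, b):
    """Triangular fuzzy membership value of x for the triangle (a, m, b)."""
    if x <= a or x >= b:
        return 0.0
    if x <= m:
        return (x - a) / (m - a) if m != a else 1.0
    return (b - x) / (b - m) if b != m else 1.0

def impute_triangular(neighbour_values):
    """Weighted mean of the neighbours' values, weighted by triangular membership."""
    v = np.asarray(neighbour_values, dtype=float)
    a, b, m = v.min(), v.max(), v.mean()
    weights = np.array([triangular_weight(x, a, m, b) for x in v])
    if weights.sum() == 0.0:          # degenerate case: fall back to the plain mean
        return float(v.mean())
    return float(np.sum(v * weights) / weights.sum())

# Example: impute a missing value from its 5 nearest neighbours' values.
print(impute_triangular([20.1, 21.4, 19.8, 22.0, 20.6]))
```

In the full technique, `neighbour_values` would be the missing feature's values taken from the K nearest neighbours found in the first step.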
The Gaussian-based variant follows the same structure, but it uses the mean and the standard deviation of the nearest neighbours' values of the missing feature:

2: Standard Deviation = standard deviation of (nearest neighbours' values for the missing feature)

3: Mean = mean value of (nearest neighbours' values for the missing feature)

4: Get the weight of each data value using the Gaussian fuzzy membership function

$$ f(x) = e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

5: Missing feature value = weighted mean of the data values and their weights

$$ \frac{\sum_{a \in A} \text{Nearest Neighbour Value}(a) \cdot \text{Gaussian weight}(a)}{\sum_{a \in A} \text{Gaussian weight}(a)} $$

6: End

IV. PERFORMANCE EVALUATION AND DISCUSSION

The datasets used in the evaluation are summarized below.

| Data set | Name | Records | Features | Classes | Missing values |
|----------|------|---------|----------|---------|----------------|
| Data set 2 | Data for Software Engineering Teamwork Assessment in Education Setting [13] | 74 | 102 | 2 | 15.9% |
| Data set 3 | Hybrid Indoor Positioning Dataset from WiFi RSSI, Bluetooth and Magnetometer Data Set [13] | 1540 | 65 | 2 | 27.3% |
| Data set 4 | India COVID-19 data [14] | 4838 | 7 | 70 | 2% |
| Data set 5 | US COVID-19 data [14] | 8500 | 6 | 31 | 1.60% |
| Data set 6 | HPI master [14] | 4236 | 8 | 3 | 37.1% |
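The paper evaluates imputation quality indirectly, by training three traditional classifiers (Neural Network, Naïve Bayes, and Decision Tree) on the imputed data and comparing their accuracy. The sketch below shows one plausible way to run such a protocol with scikit-learn; the split ratio, the model settings, and the `impute` placeholder are assumptions for illustration, not the paper's exact setup.

```python
# Hedged sketch of the classifier-based evaluation protocol.
# `impute` stands for any imputation technique under comparison
# (triangular, Gaussian, mean, KNN, ...); it is a placeholder here.
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def evaluate_imputation(X_missing, y, impute):
    """Impute X_missing with `impute`, then report accuracy of three classifiers."""
    X_filled = impute(X_missing)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_filled, y, test_size=0.3, random_state=0)
    models = {
        "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
    }
    return {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```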
Higher weights are assigned to the values close to the mean value, and lower weights to the values far from the mean, down to zero weight at the two values farthest from the mean. This results in higher weights for more representative values and, consequently, better imputations for the missing values.
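For completeness, the Gaussian-weighted variant summarized in the steps of Section III can be sketched in the same illustrative style as the triangular one; `impute_gaussian` and its internals are our own naming, not code from the paper.

```python
# Minimal sketch of the Gaussian-membership imputation (the variant that
# uses the neighbours' mean and standard deviation as mu and sigma).
import numpy as np

def impute_gaussian(neighbour_values):
    """Weighted mean of the neighbours' values, weighted by Gaussian membership."""
    v = np.asarray(neighbour_values, dtype=float)
    mu, sigma = v.mean(), v.std()
    if sigma == 0.0:                  # all neighbours identical: nothing to weight
        return float(mu)
    weights = np.exp(-((v - mu) ** 2) / (2.0 * sigma ** 2))
    return float(np.sum(v * weights) / weights.sum())

# Example: same neighbours as in the triangular sketch above.
print(impute_gaussian([20.1, 21.4, 19.8, 22.0, 20.6]))
```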
Fig. 2. Results for the Ozone Level Detection Dataset (compared imputation techniques: Gaussian WM, Triangular Weighted Mean, Average, and KNN/WM).

Fig. 4. Results for the Hybrid Indoor Positioning Dataset from WiFi RSSI, Bluetooth and Magnetometer Dataset (same four imputation techniques).
Fig. 6. Results for the US COVID-19 Dataset (proposed techniques vs. traditional techniques, evaluated with Neural Network, Naïve Bayes, and DT classifiers).

Results for the HPI master Dataset (same comparison of imputation techniques).

REFERENCES

[1] Kumar, P.V., P. Scholar, and M.V. Gopalachari, A review on prediction of missing data in multivariable time series.
[2] Pratama, I., et al., A review of missing values handling methods on time-series data, in 2016 International Conference on Information Technology Systems and Innovation (ICITSI). 2016. IEEE.
[3] Tong, G., F. Li, and A.S. Allen, Missing data. Principles and Practice of Clinical Trials, 2020: p. 1-21.
[4] Rantou, K., Missing Data in Time Series and Imputation Methods. University of the Aegean, Samos, 2017.
[5] Williams, R., Missing data part 1: Overview, traditional methods. University of Notre Dame, 2015: p. 1-11.
[6] Jayamanne, I.T., Cold Deck Imputation for Survey Non-response Through Record Linkage, in International Statistical Conference 2017, IASSL. 2017.
[7] Rubin, D.B., Multiple imputation after 18+ years. Journal of the American Statistical Association, 1996. 91(434): p. 473-489.
[8] Kim, T., W. Ko, and J. Kim, Analysis and impact evaluation of missing data imputation in day-ahead PV generation forecasting. Applied Sciences, 2019. 9(1): p. 204.
[9] Bashir, F. and H.-L. Wei, Handling missing data in multivariate time series using a vector autoregressive model-imputation (VAR-IM) algorithm. Neurocomputing, 2018.