A Review of Missing Values Handling Methods On Time-Series Data
4 authors, including:
Rini Indrayani
Universitas AMIKOM Yogyakarta
All content following this page was uploaded by Irfan Pratama on 05 April 2018.
Abstract — Missing values are a problem that frequently occurs in data observation and data recording processes. Because advanced analysis requires complete observation data, the problem needs to be solved. Conventional methods such as mean and mode imputation, deletion, and others are not good enough for handling missing values because they can introduce bias into the data. Estimating or imputing the missing data with values produced by a procedure or algorithm can be the best available way to minimize the bias introduced by the conventional methods, so that the data are complete and ready for the next step of analysis or data mining. In this paper, we describe several previous studies of methods and approaches for handling missing values in time series data. We also discuss some plausible options for estimating missing values that other researchers in this field may use. The aim of the discussion is to help them identify which methods are commonly used now, along with their advantages and drawbacks.

Keywords — missing values, estimation technique, mean imputation, deletion, time series.

I. INTRODUCTION

A dataset can be used to obtain specific information that yields new knowledge, in the form of a classification, a trend, a pattern, and so on [1]. Observed data can be obtained by several methods, such as sensor recording (automatic and continuous) or periodic observation (surveys, medical records, etc.). Every analysis process consists of several steps, one of which is preprocessing. Preprocessing is the step in which the data are identified and selected and their problems are handled. Missing values handling is part of the preprocessing step. Missing values are the most common problem in data obtained by observation and sensor recording.

Missing data occur because of several problems, such as technical faults or human error (the object of observation did not give sufficient data to the observer). In some real-world cases, such as industrial experiments, missing values occur because of a breakdown of the observation devices. In an opinion survey or an interview, missing values are caused by respondents who decline to answer or to complete the survey, or even by insufficient survey staff [2,3]. Both examples are truly unexpected occurrences. In other words, the data were expected to be observed (as in the sensor failure cases), yet they are missing because of these setbacks. Missing values probably cannot be avoided, since they happen unexpectedly. Time series data are among the kinds of data most likely to contain missing values.

A time series is data observed at regular intervals of time [4]. A time series can be continuous, or discrete because it is observed at regular intervals of time (for example daily, monthly, or annually). A sinusoidal signal is an example of a continuous time series, while daily stock prices and temperature data are examples of discrete time series.

Data measurements are conducted several times under different conditions, and missing data sometimes occur because of several problems, which are known as the "missingness mechanism" [5,6,7]:

1) Missing completely at random (MCAR): a variable is missing completely at random if the probability of missingness is the same for all units; in other words, the probability of missingness does not depend on the variable itself [7,8,9].
2) Missing at random (MAR): a variable is missing at random if the probability of missingness depends only on available information.
3) Not missing at random (NMAR): the probability of missingness depends on the variable itself.

Different ways of measuring data give rise to different mechanism assumptions. For example, a sensor recording failure is treated as missing completely at random, since it does not depend on the data that are missing, whereas in a survey a respondent who refuses to report his income is treated as not missing at random, since the surveyors already expect that such data are unlikely to be obtained easily. Several methods have been introduced to handle missing values according to their missingness mechanism, or even as a general solution for any missingness mechanism, from the conventional ones to the more modern