
A study of handling missing data methods for big data

Imane Ezzine
Ecole Mohammadia d'Ingénieurs
Mohammed V University in Rabat
Rabat, Morocco
[email protected]

Laila Benhlima
Ecole Mohammadia d'Ingénieurs
Mohammed V University in Rabat
Rabat, Morocco
[email protected]

Abstract— Improving data quality is not a recent field, but in the context of big data it is a challenging area, as there is a crucial need for data quality in order, for example, to increase the accuracy of big data analytics or to avoid storing redundant data. Missing data is one of the major problems affecting the quality of data. Several methods and approaches have been used in relational databases to handle missing data, most of which have been adapted to big data. This paper aims to provide an overview of some methods and approaches for handling missing data in big data contexts.

Keywords— data quality, missing data, big data, functional dependency, master data, machine learning

I. INTRODUCTION

In the past, data in small databases had quality issues, but since the size of the database was manageable, it was easy to clean the dataset and obtain accurate results after data processing [14]. Nowadays, with the emergence of big data, data originate from many different sources, and not all these sources are verified. Data scientists therefore often check data for missing values and then perform various operations to fix the data or insert new values. Missing data is problematic because many statistical analyses require complete data to produce good results. Moreover, supervised machine learning methods use the data to train their models. In the context of massive data, finding the missing values is even more challenging. Many methods have been proposed to tackle this problem for big data but, to the best of our knowledge, there is no existing review or overview of these methods. In this paper, we present a study highlighting some of these approaches.

The rest of the paper is organized as follows. Section 2 introduces some data quality metrics, especially those related to missing data. Section 3 presents three methods to handle missing data for big data; section 4 gives a discussion of these methods; and finally we end with a conclusion and some future work.

II. DATA QUALITY METRICS

Data quality can be defined in many different ways. In the most general sense, good data quality exists when data is suitable to serve its purpose in a given context [22].

There is no exact definition of data quality, but there are some popular measures that express the quality of data, such as [1, 2]:

• Accuracy: expresses whether the data represent reality or a reliable source. It is a very expensive criterion, because it requires an external reference frame; otherwise it is necessary to conduct a survey to check the accuracy of the data.

• Coherence: concerns linked data values in different data instances, or consistency with values taken from a known reference data domain. This criterion requires checking that the data satisfies a set of constraints in order to decide that it is consistent.

• Uniqueness: specifies that each real-world element is represented once and only once in the dataset.

• Compliance: expresses whether the data complies with the appropriate conventions and standards. For example, a value may be correct but follow the wrong format or recognized standard.

• Completeness: relates to the fact that the data exists, that is, the value is not null. Incomplete data creates uncertainties during data analysis and must be managed during this process.

Regarding this last measure, information completeness concerns whether the data set has complete enough information to answer queries or to provide efficient models for supervised machine learning algorithms.

To evaluate data completeness in different contexts, we should ask the following questions:

- For transactional systems: given a data set D and a query Q, we want to know whether Q can be correctly answered using only the data in D.

- For ETL: given a data set D and an ETL process X, we want to know the impact on accuracy if the data warehouse fact table has missing values during the ETL process.

- For machine learning model building: given a data set D and a model M deduced from a machine learning algorithm, we want to know whether M can be a trustworthy predictive model using only the data in D.

Several data quality techniques have been proposed to clean messy tuples from data sets; in particular, researchers aim to find critical information missing from data sets. In this paper, we highlight a few of the methods used to deal with missing data in the context of big data.
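As a small illustration of the completeness measure, the following sketch (ours, not taken from the referenced works) computes the share of non-null values per attribute with pandas; the dataset and column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with missing values
df = pd.DataFrame({
    "country": ["France", "Morocco", None, "China"],
    "city": ["Paris", "Rabat", "London", None],
    "population_m": [2.1, np.nan, 8.9, 21.5],
})

# Completeness per attribute: share of non-null values
completeness = df.notna().mean()
print(completeness)  # 0.75 for each column: one of four values is null
```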
III. DATA QUALITY METHODS FOR MISSING DATA



In what follows, we present three classes of techniques: the first is based on a repository of data, the second on functional dependencies (FD), and the last on machine learning algorithms.

A. Repository of data for missing data

A data repository or data dictionary (DD), as it is called in some works [2], is a tool that includes all the possible values of some attributes. It serves as master data for ensuring data completeness.

The data dictionary can be seen as a set of triplets CatDD(Category, Information, Subcategory(Language)). Table I depicts a case of data dictionary (DD) where:

DD = {CatDDi | CatDDi(Cati, Infoij, SubCatik), i = 1..n, j = 1..p, k = 1..6}

where n is the number of categories, p the number of values satisfying one category, and six is the number of sub-categories.

The process consists of recreating the data set by defining the categories and subcategories of its columns, defining the functional dependencies between the new columns, and then finding the corresponding missing information in the data dictionary (a lookup sketch is given after Table I).

TABLE I. EXAMPLE OF A DATA DICTIONARY

idCat  | Category   | Information     | Subcategory (language)
CatDD1 | Continent  | Info11 = Europe | SubCat11 = English
       |            | Info12 = Europe | SubCat12 = French
CatDD2 | Country    | Info21 = Europe | SubCat21 = English
       |            | Info22 = Europe | SubCat22 = French
CatDD3 | City       | Paris           | English
       |            | London          | English
       |            | Beijing         | English
       |            | Paris           | French
       |            | Londres         | French
       |            | Pekin           | French
CatDD4 | First Name | Adam            |
       |            | Rahma           |
       |            | France          |
       |            | Marie           |
       |            | Paris           |
       |            | Aicha           |
CatDD5 | Civility   | Miss            | English
       |            | Mister          | English
       |            | Madame          | French
       |            | Monsieur        | French
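To make this process concrete, here is a minimal sketch (our illustration, not the original implementation) of completing a missing value by lookup in a data dictionary stored as CatDD triplets; the entries mirror Table I and the dataset columns are hypothetical:

```python
import pandas as pd

# Data dictionary as (Category, Information, Subcategory) triplets,
# mirroring the City and Civility entries of Table I.
DD = [
    ("City", "London",  "English"), ("City", "Londres",  "French"),
    ("City", "Beijing", "English"), ("City", "Pekin",    "French"),
    ("Civility", "Miss", "English"), ("Civility", "Madame", "French"),
]
lookup = {(cat, info): sub for cat, info, sub in DD}

# Hypothetical dataset whose 'language' column has a gap
df = pd.DataFrame({
    "city":     ["Londres", "Beijing"],
    "language": [None, "English"],
})

# Complete 'language' through the dictionary: the city value determines
# its Subcategory (language), a dependency between the recreated columns.
df["language"] = df["language"].fillna(
    df["city"].map(lambda v: lookup.get(("City", v)))
)
print(df)  # the first record is completed with 'French'
```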
B. Functional Dependencies based methods

Functional dependencies (FDs) are deduced from management rules that describe relationships among columns; pairs of columns or column sets must then be analyzed. Below, we briefly recall the definition of functional dependencies before summarizing FD-based methods for missing data.

1) Functional Dependencies (FD)

Functional Dependencies (FDs) have recently been introduced in the context of data cleaning, especially for solving the missing data problem. The formal definition of an FD is given below.

Definition 1. Let C be the schema of the data set (DS), that is, its set of columns, and let X and Y be two subsets of C. X functionally determines Y (noted X → Y) iff, for any two rows i and j, xi = xj implies yi = yj, where xi (resp. yi) denotes the value of row i on X (resp. Y). In other words, for every value xi of X there is exactly one corresponding value yi of Y.
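Definition 1 can be checked mechanically: X → Y holds iff no two rows agree on X while disagreeing on Y. A minimal sketch with pandas (our illustration; the sample columns are hypothetical):

```python
import pandas as pd

def fd_holds(df: pd.DataFrame, X: list, Y: list) -> bool:
    """True iff X -> Y holds in df: every distinct X-value
    is associated with at most one Y-value."""
    groups = df.dropna(subset=X + Y).groupby(X)[Y].nunique()
    return bool(groups.le(1).all().all())

# Hypothetical sample: city determines country
df = pd.DataFrame({
    "city":    ["Paris", "Paris", "Rabat"],
    "country": ["France", "France", "Morocco"],
})
print(fd_holds(df, ["city"], ["country"]))  # True
```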
2) FD for missing data

The approaches based on FDs for handling missing data are various [2], [6], [11], [17]. They consist of identifying the FDs, analyzing them, and keeping the probable ones before applying them to complete the data. In [6], the authors propose to extract the FDs using algorithms such as FUN (and CFun for conditional functional dependencies) [23], to extract all the FDs verified on a correct reference table, and then to check unsuitable records against this table and correct them.
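The correction step of [6] can be sketched as follows, under our own simplified assumptions: once an FD such as city → country has been validated on a correct reference table, incomplete records are completed from that table:

```python
import pandas as pd

# Hypothetical reference table on which city -> country was verified
reference = pd.DataFrame({
    "city":    ["Paris", "London", "Rabat"],
    "country": ["France", "UK", "Morocco"],
})

records = pd.DataFrame({
    "city":    ["Paris", "Rabat", "London"],
    "country": ["France", None, None],   # incomplete records
})

# Complete missing dependent values according to the validated FD
fd_map = reference.set_index("city")["country"]
records["country"] = records["country"].fillna(records["city"].map(fd_map))
print(records)  # Rabat -> Morocco, London -> UK
```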

C. Machine Learning based methods

In many data mining applications, different methods and approaches exist to clean missing data by learning from huge data sets. Some of them use techniques such as regression, the Bayesian formalism, decision trees or clustering algorithms, while others are based on data imputation, such as kNN or random forests [7], [8], [10], [13], [15], [16], [18], [21], [23]. In what follows, we focus on two works: the first is based on random forests and the second on kNN.

1) Random Forest

A Random Forest (RF) is a group of individual classification tree predictors (Breiman 2001). For each observation, each individual tree votes for one class, and the forest predicts the class that obtains the highest rate of votes [7].

The RF algorithm can handle missing values by weighting the frequency of the observed values of a variable with the RF proximities, which are by themselves an important source of information [20], after being trained on an initially mean-imputed dataset [19]. However, this approach requires a complete response variable for training the forest. Alternatively, the missing values can be directly predicted using an RF trained on the observed parts of the dataset [13] by applying multidimensional scaling [20].

For ensuring data completeness, the authors of [19] proposed a method based on random forests called missForest. This method requires a first naive imputation (imputation is the process that determines and assigns values for missing data items [25]), by default completion by the mean, in order to obtain a complete learning sample. A series of random forests is then fitted iteratively, starting from this initial imputation and refining it until the imputed values stabilize.
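A minimal sketch of the missForest idea [19] (our simplification, restricted to numeric columns): start from a mean imputation, then repeatedly re-fit a random forest per incomplete column, predicting its missing entries from the other columns, until the imputed values stabilize:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def miss_forest_numeric(X: np.ndarray, n_iter: int = 10) -> np.ndarray:
    """Iterative RF imputation for a numeric matrix with np.nan gaps."""
    X = X.astype(float).copy()
    mask = np.isnan(X)
    # 1) Naive initial imputation by the column mean
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])
    for _ in range(n_iter):
        X_old = X.copy()
        for j in range(X.shape[1]):
            if not mask[:, j].any():
                continue
            obs = ~mask[:, j]
            rf = RandomForestRegressor(n_estimators=100, random_state=0)
            # 2) Fit on rows where column j is observed ...
            rf.fit(np.delete(X[obs], j, axis=1), X[obs, j])
            # 3) ... and predict its originally missing entries
            X[mask[:, j], j] = rf.predict(np.delete(X[mask[:, j]], j, axis=1))
        # 4) Stop when the imputed values stabilize
        if np.abs(X - X_old).sum() < 1e-6:
            break
    return X

X = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [4.0, 8.0]])
print(miss_forest_numeric(X))
```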
2) KNN method

The principle of the kNN algorithm is as follows [18], [22]: given a text to classify, the algorithm looks for the k nearest neighbors among the documents used during the learning phase; the categories of these k nearest neighbors are then used to weight the candidate categories. The degree of similarity between the test document and a neighboring document serves as the weight of that neighbor's category. If several neighbors share the same category, the weight assigned to this category is the sum of the degrees of similarity between the test document and each of the neighbors belonging to it. This method yields a list of weights, one per category, and the test document is assigned to a category if the weight allocated to that category exceeds a threshold set in advance.
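A compact sketch of this weighted vote (our illustration; the similarity function and threshold are hypothetical):

```python
from collections import defaultdict

def classify(test_doc, neighbors, similarity, threshold):
    """neighbors: list of (document, category) pairs, the k nearest
    training documents. Each neighbor votes for its category with a
    weight equal to its similarity to the test document."""
    weights = defaultdict(float)
    for doc, category in neighbors:
        weights[category] += similarity(test_doc, doc)
    # Assign every category whose accumulated weight exceeds the threshold
    return [c for c, w in weights.items() if w > threshold]

# Toy usage with a Jaccard similarity over word sets
sim = lambda a, b: len(set(a) & set(b)) / max(len(set(a) | set(b)), 1)
print(classify(["data", "quality"],
               [(["data", "cleaning"], "DB"), (["data", "quality"], "DQ")],
               sim, threshold=0.5))  # ['DQ']
```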
For imputation, the k neighbors are selected based on a distance measure, and their average is used as the imputation estimate. The method requires choosing the number of nearest neighbors and a distance metric. kNN can predict both discrete attributes (using the most frequent value among the k nearest neighbors) and continuous attributes (using the mean among the k nearest neighbors):

• Choose the K spots most similar to the spot with the missing value: to estimate the missing value xij of the i-th spot in the j-th sample, select the K spots whose expression vectors are most similar to the expression of spot i in the samples other than j.

• Measure the distance between two expression vectors xi and xj using the Euclidean distance over the components observed in the j-th sample.

• Estimate the missing value as the average of the K nearest neighbors.

The authors of [16] used kNN to handle missing data by adapting the distance metric to the type of data [16]:

- If the missing value in the target example is symbolic, the per-attribute distance is set to 0 if xi is equal to yi and to 1 otherwise, and the method uses the mode of the corresponding attribute values in the k examples to replace the missing value.

- If the missing value in the target example is continuous, the method uses the mean of the corresponding attribute values in the k examples to replace the missing value. A sketch of this mixed-type imputation is given below.
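This sketch follows the description in [16] but is our own simplification: a 0/1 distance on symbolic attributes, an absolute difference on continuous ones, then the mode or mean over the k nearest complete examples. It assumes the target row is complete except for the target attribute:

```python
import numpy as np
import pandas as pd

def knn_impute(df: pd.DataFrame, row_idx: int, target: str, k: int = 3):
    """Impute df.loc[row_idx, target] from the k nearest complete rows."""
    others = df.drop(index=row_idx).dropna()
    features = [c for c in df.columns if c != target]

    def dist(row):
        d = 0.0
        for c in features:
            a, b = df.loc[row_idx, c], row[c]
            if pd.api.types.is_numeric_dtype(df[c]):
                d += abs(a - b)               # continuous attribute
            else:
                d += 0.0 if a == b else 1.0   # symbolic attribute: 0/1
        return d

    nearest = others.assign(_d=others.apply(dist, axis=1)).nsmallest(k, "_d")
    values = nearest[target]
    if pd.api.types.is_numeric_dtype(df[target]):
        return values.mean()        # continuous: mean of the k neighbors
    return values.mode().iloc[0]    # symbolic: mode of the k neighbors

df = pd.DataFrame({
    "civility": ["Miss", "Mister", "Madame", "Miss"],
    "age":      [30.0,   40.0,     35.0,     np.nan],
})
print(knn_impute(df, row_idx=3, target="age", k=2))  # mean age of 2 nearest
```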
IV. DISCUSSION

All the methods presented enable handling of missing data. They have advantages and drawbacks, which we discuss in what follows.

In the case of automatic extraction of functional dependencies, the accuracy of the inserted values is not always good, since non-meaningful FDs can be deduced.

The approaches based on a data dictionary assume that all the possible values of some attributes are available in that dictionary. This is impossible for some attributes in the context of big data. Another issue with this solution is that the data dictionary has to be carefully filled out; otherwise we end up with bad results. The enrichment process of the dictionary can itself be tedious.

FD and DD based methods give good results for the missing value problem when applied to relational databases, where data has a structured format. This is not always the case in the big data context, where data are semi-structured or unstructured (e.g., text documents).

In the case of machine learning based methods for missing data, a model is built to predict the missing value. Nevertheless, building a good model depends on selecting the right attributes, to avoid correlated data and hence biased models. Feature selection is difficult in the context of big data when dealing with hundreds of attributes.

On the other hand, the use of data imputation is not always appropriate, for two reasons. First, the imputed values are predictions and only a means of approaching the real values. Second, they introduce uncertainty into the model, which should be taken into account when estimating the variance [4].

In the kNN method, if the missing rate is higher than 70%, tests with different values of k greater than 1 showed little difference between the results, and the results for k = 1 were slightly better than those for other values [16].

V. CONCLUSION

Data quality issues include the presence of noise, outliers, and missing or duplicate data. When the data quality is improved, the quality of the resulting analysis typically improves as well. In this study, we have presented an analysis of three types of approaches for handling missing data. While FD and DD based methods give limited results in the context of big data, machine learning methods are more efficient; but to obtain a good-quality predictive model, they need additional data preprocessing such as feature selection. The need for data quality grows more and more in this new era of big data. We aim at finding new algorithms for improving data quality on big data, and new ways to assess data quality more accurately.

REFERENCES

500
[1] S. Juddoo, "Overview of data quality challenges in the context of Big Data," IEEE, 2015.
[2] A. Ben Salem, "Qualité contextuelle des données : détection et nettoyage guidés par la sémantique des données," Ph.D. thesis, Paris 13 Sorbonne University, 2015.
[3] F. Tang and H. Ishwaran, "Random forest missing data algorithms," Statistical Analysis and Data Mining, University of Miami, June 2017.
[4] N. Mittag, "Imputations: Benefits, Risks and a Method for Missing Data," Harris School of Public Policy, University of Chicago, May 2013.
[5] F. Tang and H. Ishwaran, "Random Forest Missing Data Algorithms," Division of Biostatistics, University of Miami, January 2017.
[6] H. Zaidi, "Amélioration de la qualité des données : correction sémantique des anomalies inter-colonnes," National Conservatory of Arts and Crafts (CNAM), November 2017.
[7] R. Khan, A. Hanbury, and J. Stoettinger, "Skin detection: a random forest approach," Proceedings of the 2010 IEEE 17th International Conference on Image Processing, Hong Kong, pp. 26-29, 2010.
[8] A. Murdopo, "Distributed Decision Tree Learning for Mining Big Data Streams," Master of Science thesis, European Master in Distributed Computing, July 2013.
[9] W. Fan, F. Geerts, L. V. S. Lakshmanan, and M. Xiong, "Discovering Conditional Functional Dependencies," IEEE International Conference on Data Engineering, pp. 1231-1234, 2009.
[10] L. Duan, K. Yue, W. Qian, and W. Liu, "Cleaning Missing Data Based on the Bayesian Network," International Conference on Web-Age Information Management (WAIM), Springer, pp. 348-359, 2013.
[11] Y.-N. Liu, J.-Z. Li, and Z.-N. Zou, "Determining the Real Data Completeness of a Relational Dataset," Springer, July 2016.
[12] C. Ma, H. H. Zhang, and X. Wang, "Machine learning for Big Data analytics in plants," Cell Press, December 2014.
[13] F. Tang, "Random Forest Missing Data Approaches," University of Miami, May 2017.
[14] N. Mathur and R. Purohit, "Issues and Challenges in Convergence of Big Data, Cloud and Data Science," February 2017.
[15] G. De'ath and K. E. Fabricius, "Classification and regression trees: a powerful yet simple technique for ecological data analysis," ESA, November 2012.
[16] Y. Zou, A. An, and X. Huang, "Evaluation and automatic selection of methods for handling missing data," IEEE, pp. 723-733, December 2005.
[17] E. Rahm and H. H. Do, "Data Cleaning: Problems and Current Approaches," University of Leipzig, Germany, IEEE, December 2010.
[18] WikiStat, "Imputation de données manquantes," Toulouse Math University, July 2018.
[19] D. J. Stekhoven and P. Bühlmann, "MissForest - non-parametric missing value imputation for mixed-type data," Bioinformatics, Oxford University Press, vol. 28, no. 1, pp. 112-118, 2012.
[20] A. Verikas, A. Gelzinis, and M. Bacauskiene, "Mining data with random forests: A survey and results of new tests," Elsevier, pp. 330-349, August 2010.
[21] A. Bennane, "Traitement des valeurs manquantes pour l'application de l'analyse logique des données à la maintenance conditionnelle," Master's thesis, Polytechnic School of Montreal, September 2010.
[22] A. M. Smith, "Foundations of Data Quality Management," Morgan & Claypool, August 2012.
[23] P. S. Gromski, Y. Xu, H. L. Kotze, E. Correa, D. I. Ellis, E. G. Armitage, M. L. Turner, and R. Goodacre, "Influence of Missing Values Substitutes on Multivariate Analysis of Metabolomics Data," Metabolites, June 2014.
[24] E. Garnaud, "Dépendances fonctionnelles : extraction et exploitation," University of Science and Technology Bordeaux I, November 2013.
[25] J. M. Brick and G. Kalton, "Handling missing data in survey research," Statistical Methods in Medical Research, vol. 5, no. 3, pp. 215-238, 1996.

