World Academy of Science, Engineering and Technology

International Journal of Computer and Information Engineering


Vol:11, No:1, 2017

Application of Data Mining Techniques for Tourism Knowledge Discovery
Teklu Urgessa, Wookjae Maeng, Joong Seek Lee

Teklu Urgessa is with the Seoul National University, Graduate School of Convergence Science and Technology, Suwon, South Korea (phone: +8226886659; fax: 82-31-888-9148; e-mail: tek2013@snu.ac.kr).
Wookjae Maeng and Joong Seek Lee are with the Seoul National University, Graduate School of Convergence Science and Technology, Suwon, South Korea (e-mail: [email protected], joonlee8@snu. c.kr).

International Science Index, Computer and Information Engineering Vol:11, No:1, 2017 waset.org/Publication/10006636

Abstract—Five implementations of three data mining classification techniques were evaluated for extracting important insights from tourism data. The aim was to find the best performing algorithm among those compared for tourism knowledge discovery. The knowledge discovery from data process was used as the process model, and 10-fold cross validation was used for testing. Various data preprocessing activities were performed to obtain the final dataset for model building. Classification models of the selected algorithms were built under different scenarios on the preprocessed dataset. On the tourism dataset, the best performing algorithm was Random Forest (76%) before information gain based attribute selection and J48 (C4.5) (75%) after selection of the attributes most relevant to the class (target) attribute. In terms of model building time, attribute selection improved the efficiency of all algorithms. The Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented; they show intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis, even though the accuracy of the classification models is moderate.

Keywords—Classification algorithms; data mining; tourism; knowledge discovery.

I. INTRODUCTION

DATA mining can be defined as the process of accessing, selecting, exploring, and modeling large amounts of data to uncover previously unknown patterns that are potentially useful [1]. Classification is a data mining functionality that assigns the instances of a dataset to target categories, classes, or target attribute values [2]. The goal of a classification technique is to accurately predict the target class for each case in the dataset. Different classification algorithms suit particular domains with varying degrees of accuracy [3], and hence, rigorous research is required on their applicability in each domain. At the same time, not all attributes (features) in a dataset are equally relevant for classification. Some attributes merely consume computational resources for some of the algorithms. Such attributes can act as noise or lead to overfitting when the machine learns from the training dataset [4]. Some algorithms tolerate noise, and these are greedier than the others [5]. Thus, it is important to check different scenarios and algorithms for a particular domain. One of the scenarios used in this paper is to apply the information gain based attribute selection method and to build the models for the compared algorithms before and after selection of the attributes. This helps to pick the attributes most relevant to the target attribute for classification. It is also important to find out which algorithm is more noise tolerant than the others and therefore shows better performance on a given noisy dataset. Therefore, this research is framed in such a way that classification models are built using the selected algorithms before and after attribute selection based on information gain, so that the performance in each scenario can be compared. The techniques selected for comparison are Decision Tree, Artificial Neural Network, and Support Vector Machine (SVM). From the decision tree category, C4.5 (J48 in Weka), Random Forest, and PART (Weka's rule learner built on partial C4.5 decision trees) were tested. From Artificial Neural Network, the Multilayer Perceptron (MLP) model was tested. From SVM, the Sequential Minimal Optimization (SMO in Weka) implementation was built. All implementations were tested before and after attribute selection based on information gain. The best performing algorithm is reported before and after attribute selection.
The models were built on the tourism data with the aim of finding the noise tolerant classification algorithm to be applied for knowledge discovery in the domain. Several emerging applications in information-providing services call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase business opportunities [6]. Tourism, which is one of these domains, is becoming an important sector in every nation's Gross Domestic Product (GDP) [7]. Tourist arrivals in South Korea averaged 582848 from 1993 until 2016 [8]. Tourism service is data intensive, with complex relationships among the attributes paralleled by the increasing volume of tourism service [9]. Better decision making and service improvement need to be supported by the insights and knowledge discovered from the service data or obtained from the visitors through survey mechanisms. On the other hand, statistical analysis alone cannot reveal non-trivial insight, because of the intricate relationships among the features/attributes in tourism data. Therefore, the complex relationships and voluminous nature of tourism data necessitate the use of state-of-the-art data analytics methods, tools, and techniques beyond simple statistical analysis [10]. This research aimed at applying data mining classification algorithms to explore the potential benefits of discovering hidden knowledge from tourism data. It used various algorithms and scenarios to come up with findings that serve as insight for decision making in the

International Scholarly and Scientific Research & Innovation 11(1) 2017 119 scholar.waset.org/1307-6892/10006636
domain sector. It is also a typical interdisciplinary research that can be a cornerstone for information technology research in the domain of tourism. Data mining research itself involves machine learning, statistics, computer science, information theory, and domain knowledge.

II. RELATED WORKS

Bach et al. [11] indicated that forecasting, personalization, tourism management, tourism systems (such as recommendation systems), and machine learning techniques such as support vector machines, regression, multi-agent systems, and particle swarm optimization are the common research areas in tourism data mining research. Their finding was based on keyword analysis of existing research. Our research continues from their recommendation in the sense that we focus on an empirical knowledge discovery process from tourism data using classification algorithms. Olmeda and Sheldon [12] published a paper on "Data Mining Techniques and Applications for Tourism Internet Marketing". In their analysis, the potential uses of data mining techniques and technologies in tourism internet marketing and electronic customer relationship management were discussed. According to Olmeda and Sheldon, the tourism industry provides the consumer with experiences, and those experiences increasingly need to be customized. Based on a literature survey of data mining research papers on travel industry data, they concluded that data mining techniques can provide tools to discover insight for customization of user experiences. The paper provided background evidence for business understanding in our work. However, our work is empirical research rather than a literature review alone, and it aims to fill the research gap that they pointed out. Bose [13] published a paper entitled "Data Mining in Tourism" with the objective of researching and discussing the data mining techniques applicable to tourism, using a literature review from the perspective of data mining applications in other domains. The technique discussed in that paper is classification learning, which is the most commonly used machine learning approach in data mining research in other domains. From Bose's paper, we observed supporting evidence that the classification data mining problem should be researched in the tourism domain; this was a recommendation of that research based on the experience of other applications such as health care and banking.
Aghdam et al. [14] conducted research on tourism entitled "Finding Interesting Places at Malaysia". The objective of the study was to apply data mining techniques to tourism data. Data for their research was crawled from tripadvisor.com. They made a quantitative analysis using Weka and a qualitative analysis using Nvivo. They used the Cross Industry Standard Process for Data Mining (CRISP-DM). Their paper is similar to ours in using data mining on actual tourism data and in using Weka as a machine learning tool, but they used only a dataset related to places and association rule mining alone. We believe that including visitor data that covers all tour related issues, such as shopping, travel expenses, service satisfaction, and users' own opinions of the visited places, might reveal important insight; hence we aimed at including the whole tourist experience, which has to do with knowing the tastes and preferences of tourists as a whole. The working dataset in our case is more comprehensive, and the experiments are rigorous.
We tried to fill the research gap identified through the review of the related works, which other researchers acknowledged as a topic to be explored using data mining. We found from the review of the related works that there is a research gap in applying data mining classification to discover important knowledge from tourism data. As discussed in this section, our research is new in at least one aspect compared with the previous works in tourism: the data mining problem selected, the dataset considered, the data source, the algorithms employed, or the tools used. Hence, we hope to contribute by finding the best classification algorithm and process model for knowledge discovery and by shedding light on the potential applicability of data mining classification techniques in the tourism domain.

III. METHODOLOGY

In this section, the data source, data description and selection, the data mining process model used for the research, the classification framework, and the data mining tool used for the implementation of the classification algorithms are briefly explained.

Fig. 1 Sample screenshot of the original coded data in CSV format

A. Data Source, Description, and Preprocessing
Tourism survey data were accessed and downloaded in CSV file format from the Korean Tourism and Culture website [8]. These are open data for research purposes. The dataset contained 12030 instances and 134 attributes before selection and preprocessing. After selection and preprocessing, the attributes were reduced to 56. The data feature categories can be summarized as: type of visiting conditions,
socio-demographic attributes, visiting areas, shopping conditions, expenses related to the visit, satisfaction levels of tourists with different services while visiting, their personal opinion on their likelihood to recommend Korea to others as a tourist destination, their opinion on their likelihood to consider visiting Korea again within three years, and their opinion on the level of change in impression during their current visit.
Different data description, selection, and preprocessing activities, including statistical summary measures, visualization, detecting outliers and fixing missing values, discretization, and conceptual hierarchy feature generation, were performed before building the classification models. Finally, 56 attributes and 12030 records were used for model building. The original CSV data file, as obtained from the source mentioned above, is shown in Fig. 1.
All the original data are coded. Without the code definitions, the data cannot be interpreted. Therefore, getting the definitions for the codes was necessary. The data definition document and the questionnaire with which the tourist survey was conducted were made available by the owner organization (the Korean Tourism and Culture Institute). Based on the description document and the questionnaire, the meaning of each data item is presented as the preliminary business understanding for the research framework. The description of all attributes and their values is annexed at the end.

B. Data Mining Process Model, Tools, and Algorithm Selection
The data mining process model selected for this research is the Knowledge Discovery in Databases (KDD) process model [15]. As indicated in Fig. 2, it has five steps, each with its own deliverable. The knowledge discovery process used in this research has been modified slightly by introducing one new step, access to the data source, at the very beginning, and by merging data preprocessing and data transformation into one step, data processing. The modification was made to fit the practical procedures followed for the tourism data mining application in this research.

TABLE I
COMPARISON BEFORE SELECTION OF ATTRIBUTES
Implementations   Accuracy (%)   Time (in minutes)   Rank by performance
Random Forest     76             7.36                1
SMO               75             105.38              2
J48 (C4.5)        72             2.04                3
MLP               71             181.62              4
PART              69             83.55               5

The data source was determined first; then the means of data acquisition was sought, and we found that the data were made free and open for research by the institute. There was no challenge regarding data access. However, the data as a whole were messy, so careful data selection was needed to arrive at the target dataset. Attributes for which 10% or more of the values were missing were excluded, which left 56 of the 134 attributes. So, the target dataset selected for mining has 56 attributes. There was further preprocessing on the target data too. The preprocessing activities included computing statistical summary measures for each attribute to see the distribution of its values, fixing missing values, and discretizing attributes with many distinct values, e.g., the age attribute. The data were transformed using concept hierarchy generation and discretization through binning for continuous attributes such as the average expenses of visitors and the number of days. Data types were also changed into ones that the mining algorithms and tools can handle; J48, for example, needs nominal class labels. The Likert scale code values 1 to 5 were treated as numeric values by Weka, so their type had to be changed from numeric to nominal on the fly using Weka's NumericToNominal filter.
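Two of the preprocessing steps described above, excluding attributes with 10% or more missing values and recoding 1 to 5 Likert codes as nominal values, can be sketched in a few lines of Python. This is only an illustration and not the authors' procedure, which was carried out in Weka. The function names, the use of `None` to mark a missing value, the attribute `q99`, and the toy records are all assumptions made for the example; the five labels follow the values listed for the target attribute.

```python
from typing import Dict, List, Optional

# One survey record; None marks a missing value (an assumed encoding).
Record = Dict[str, Optional[float]]


def missing_fraction(records: List[Record], attribute: str) -> float:
    """Share of records whose value for `attribute` is missing."""
    missing = sum(1 for r in records if r.get(attribute) is None)
    return missing / len(records)


def drop_sparse_attributes(records: List[Record], threshold: float = 0.10) -> List[str]:
    """Return the attributes kept after excluding those with >= threshold missing."""
    attributes = sorted({name for r in records for name in r})
    return [a for a in attributes if missing_fraction(records, a) < threshold]


def to_nominal(value: Optional[float], labels: Dict[int, str]) -> Optional[str]:
    """Recode a numeric Likert code as a nominal label; missing stays missing."""
    if value is None:
        return None
    return labels[int(value)]


LIKERT_LABELS = {1: "very unlikely", 2: "unlikely", 3: "neutral",
                 4: "likely", 5: "very likely"}

# Toy data: q20 is complete; q99 (hypothetical) is missing in 2 of 10 records.
records = [{"q20": float(1 + i % 5), "q99": None if i < 2 else 1.0}
           for i in range(10)]

kept = drop_sparse_attributes(records)
nominal_q20 = [to_nominal(r["q20"], LIKERT_LABELS) for r in records]
print(kept)
print(nominal_q20[:5])
```

In this toy run `q99` is dropped because 20% of its values are missing, while `q20` is kept and its numeric codes become category labels that a nominal-class learner can use.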

Fig. 2 The process of data mining applications in tourism
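The discretization through binning mentioned in the data processing step, for continuous attributes such as average expenses or number of days, can be illustrated with equal-width binning. The paper does not state which binning method or how many bins were used, so the equal-width scheme, the choice of three bins, the bin names, and the sample values below are assumptions.

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Assign each value to one of n_bins equal-width intervals (0-based index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical lengths of stay in days.
days = [1.0, 2.0, 3.0, 4.0, 7.0, 10.0]
bins = equal_width_bins(days, n_bins=3)
names = ["short", "medium", "long"]  # assumed nominal names for the bins
print([names[b] for b in bins])
```

With these values the range 1 to 10 is cut into three intervals of width 3, so stays of 1 to 3 days fall in the first bin, 4 days in the second, and 7 and 10 days in the third.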

The target attribute for classification is the visitors' opinion of their likelihood to recommend Korea to others as a tourist destination. This target attribute takes five values (1. very unlikely, 2. unlikely, 3. neutral, 4. likely, and 5. very likely). The remaining 55 attributes are independent variables (predictors). The specific attributes, also called features, in the dataset include nationality, sex, age, education, touring condition, number of visits to Korea, number of companions, expenses, places visited, shopping places, shopped items, and so on. The high level categories of independent attributes or predictors are presented in Table I.
The machine learning software selected for this research is Weka. Weka is a well-tested and widely used open source machine learning tool for general purpose data mining research [16]. It is a collection of machine learning algorithms for solving real world data mining problems. It is written in Java and runs on almost any platform [17]. The classification algorithms selected for comparison in this
research are C4.5 (J48 in Weka), Random Forest, SVM (SMO in Weka), PART, and MLP, as mentioned in the introduction. These algorithms are described in section III in detail with examples and visual illustrations.
After the final dataset was ready for model building, models of the five classification algorithms were built both before and after information gain attribute selection, and the performance of each algorithm in each scenario was compared. 10-fold cross validation was used for testing. Accuracy measures based on the confusion matrix were reported in terms of correctly classified instances and other detailed accuracy measures such as Recall, Precision, F-measure, ROC area, and PRC area.
Finally, a sample tree structure is illustrated, and some sample rules were extracted from the decision tree as a showcase of the knowledge discovered, which would not have been possible without data mining.

IV. EXPERIMENTS AND RESULTS

As stated in the introduction and methodology sections, the aim of this research is to compare the performance of the selected classification algorithm implementations in Weka on both the entire dataset and the top 10 relevant attributes selected on the basis of information gain. Therefore, in this section, a summary of the classification models and their analysis results is presented in Tables I to III.

TABLE II
TOP 10 RELEVANT ATTRIBUTES BASED ON INFORMATION GAIN
S. No   Attribute code (original data)   Rank   Meaning of the code
1       q19                              1      the interest of the visitor to visit Korea again within the coming three years
2       q16b                             2      the overall satisfaction level of a visitor
3       q21                              3      change of impression during the current visit
4       q1606                            4      satisfaction level by appeal of tourist spot
5       q1607                            5      satisfaction level of a tourist with tourist information services
6       q1604                            6      satisfaction level of a tourist with food
7       q1610                            7      satisfaction level of a tourist with security services
8       q1605                            8      satisfaction level of a tourist with shopping service
9       q1602                            9      satisfaction level of a tourist with public transportation service
10      q1609                            10     satisfaction level of a tourist with travel expense
11      q20                              -      tendency to recommend Korea to others as a tourist destination

As can be seen from Table I, Random Forest outperforms the rest (76%) on the entire attribute set. SMO is second best in this scenario with 75%. In terms of training time, C4.5 (J48) is the fastest algorithm, and ANN (multilayer perceptron) is the slowest to build the model.
The detailed accuracy measures for these models before attribute selection are presented in Fig. 3. Even though the difference in the other measures is somewhat blurred, the difference in the ROC area and PRC area is more visible: Random Forest is at the top, followed by J48, in terms of ROC and PRC areas. In terms of the weighted average recall, MLP and J48 outperformed the others (75.3% and 75.2%, respectively).

Fig. 3 Comparison of algorithms by detailed accuracy measures before attribute selection

It is visible from Fig. 4 that Random Forest has a higher ROC curve than the others among the models built on the entire final dataset. Indeed, Random Forest outperformed the other models in every aspect.
To see whether selecting the attributes with higher information gain affects the performance of the algorithms, the top ten attributes were selected. The models were rebuilt with the same settings on the selected attributes. The top ten selected attributes are presented in Table II. The top attribute is the visitors' opinion on their likelihood to visit Korea again. The second ranked attribute is the overall satisfaction level of the visitors.

TABLE III
COMPARISON OF ALGORITHMS AFTER ATTRIBUTE SELECTION
Algorithms      Performance (%)   Time (in minutes)   Rank by performance
J48 (C4.5)      75                1                   1
Random Forest   74                5                   2
SMO             74                3.68                2
PART            73                6                   4
MLP             73                17.71               4

The performance of the algorithms after attribute selection is presented in Table III. From the table, it is clear that attribute selection based on information gain improved the performance of C4.5 (J48) from 72% to 75%, PART from 69% to 73%, and MLP from 71% to 73%, but degraded the performance of Random Forest from 76% to 74% and SMO from 75% to 74%. This indicates that entropy based information gain attribute ranking does not help to improve the performance of Random Forest and SVM (SMO). In terms of training time, all of the implementations showed a decrease, and for MLP the decrease is large (90%). This shows that Random Forest and SMO, in that order, perform better on noisy data than the others in the comparison.
The detailed accuracy of the selected implementations after
attribute selection is presented in Fig. 4. In terms of weighted average precision and recall, Random Forest and SMO have the highest scores, in that order. In terms of ROC and PRC area values, Random Forest and MLP outperformed the others, as first and second respectively, with values of 0.83 and 0.82 for ROC area and 0.765 and 0.74 for PRC area. The lowest in these measures is PART. Knowledge/insight hidden within the dataset, which cannot be discovered using simple statistical analysis, was revealed.
The limitation of this study is that the rules were not evaluated by experts in the domain. In practice, rules should be evaluated by domain experts before they are used for decision making.

Fig. 4 Detailed accuracy of the algorithms after attribute selection

Let us look at some of the rules generated from this decision tree, read by the usual principle of decision tree interpretation, IF condition THEN outcome, from the root node to the decision leaf.
• Rule #1: IF interest of a visitor to visit Korea within the coming three years (q19) is "Highly likely" and IF impression of the visitor during current visit (q21) is "feeling much better" THEN it is "highly likely" that a particular visitor recommends Korea as a tourist destination.
• Rule #2: IF interest of a visitor to visit Korea within the coming three years (q19) is "likely" THEN it is "likely" that a particular visitor recommends Korea as a tourist destination.
• Rule #3: IF interest of a visitor to visit Korea within the coming three years (q19) is "Very likely" and IF impression of the visitor during current visit (q21) is "Neutral" and IF the overall satisfaction with services received is "very satisfactory" THEN it is "highly likely" that a particular visitor recommends Korea as a tourist destination.
• Rule #4: IF interest of a visitor to visit Korea within the coming three years (q19) is "Highly likely" and IF impression of the visitor during current visit (q21) is "feeling better" THEN it is "likely" that a particular visitor recommends Korea as a tourist destination.
• Rule #5: IF interest of a visitor to visit Korea within the coming three years (q19) is "Very likely" and IF impression of the visitor during current visit (q21) is "unlikely" THEN it is "unlikely" that a particular visitor would recommend to others.
To obtain more specific rules, only the satisfaction level, visited area, and demographic attributes were used to construct the decision tree with the J48 algorithm. The following rules were extracted as a sample:
• Rule #6: IF a visitor is "very satisfied" with security service and IF he/she is "very satisfied" with travel expenses THEN he/she would be "very likely" to recommend Korea as a tourist destination.
• Rule #7: IF a visitor is "very satisfied" with security service and IF he/she is "satisfied" with travel expenses THEN he/she would be "neutral" to recommend Korea as a tourist destination.
• Rule #8: IF a visitor is "very satisfied" with security service and IF he/she is "neutral" with travel expenses and he/she is "satisfied" with the public transport THEN it is "likely" that he/she recommends Korea as a tourist destination to others.
• Rule #9: IF a visitor is "very satisfied" with security service and IF he/she is "neutral" with travel expenses and IF he/she is "neutral" with the public transport THEN he/she would be "neutral" to recommend Korea as a tourist destination to others.
• Rule #10: IF a visitor is "satisfied" with security service and IF he/she is "Japanese" and IF he/she is "satisfied" with shopping THEN he/she would be "neutral" to recommend Korea as a tourist destination to others.
• Rule #11: IF a visitor is "satisfied" with security and IF he/she is "Japanese" and IF he/she is "unsatisfied" with shopping THEN it is "unlikely" that he/she would recommend Korea as a tourist destination to others.
• Rule #12: IF a visitor is "satisfied" with security and IF he/she is from "Taiwan" and IF he/she is "satisfied" with communication THEN it is "likely" that he/she would recommend Korea as a tourist destination to others.

V. CONCLUSION

This research showed a clear research method for applying classification algorithms for tourism knowledge discovery on comprehensive survey data. Related works were discussed and the research gap was pointed out. A data mining process model for tourism is proposed. The best performing algorithm is identified from the experiment. The top factors that determine visitors' opinion on their likelihood of recommending Korea as a tourist destination were identified, as presented in Table II. Five models were built using the selected classification algorithms, namely J48, Random Forest, PART, SMO, and MLP. The experimental results showed that Random Forest and SVM (SMO), in that order, are more noise tolerant than the other algorithms, as they showed better performance on the entire attribute set. After the
entropy based attribute selection, the performance of J48 improved, while that of Random Forest degraded. The research clearly showed that it is possible to extract useful insights from tourism data with a fair level of performance. The decision tree interpretation is presented as an indicator of the hidden knowledge that can be discovered using data mining techniques. Those rules can be used as a basis for decision making to target the service demands of tourists. The research findings can serve as a basis for further research in the area that applies more techniques, such as association rule mining, to tourism data. In terms of training time, MLP is the slowest both before and after attribute selection, though its training time improved by more than 90% after attribute selection. So, MLP needs careful attribute selection for better time efficiency. More generally, selecting the top relevant attributes reduced the computational time for all algorithms while focusing the models on high level insights. Future work would apply association rule mining to find any linkage that exists within the dataset, which could be used as a basis for building recommendation systems and for improving services to meet visitors' expectations. It is believed that the discovered insights will enable the service provider to give priority to what is most important to the visitors based on their demographics and tourism preferences.

REFERENCES
[1] Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.
[2] Glusman, G., Bahar, A., Sharon, D., Pilpel, Y., White, J., & Lancet, D. (2000). The olfactory receptor gene superfamily: data mining, classification, and nomenclature. Mammalian Genome, 11(11), 1016-1023.
[3] Wu, X., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1-37.
[4] Hong, T. P., Kuo, C. S., & Chi, S. C. (2001). Trade-off between computation time and number of rules for fuzzy mining from quantitative data. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9(05), 587-604.
[5] Caruana, R., & Freitag, D. (1994, July). Greedy attribute selection. In ICML (pp. 28-36).
[6] Gupta, A. K., & Gupta, C. (2012). Analyzing customer behavior using data mining techniques: optimizing relationships with customer. Management Insight, 6(1).
[7] Rodríguez, I., Williams, A. M., & Hall, C. M. (2014). Tourism innovation policy: implementation and outcomes. Annals of Tourism Research, 49, 76-93.
[8] South Korea Tourist Arrivals 1993-2016, available at http://www.tradingeconomics.com/south-korea/tourist-arrivals, accessed on 2016.08.09.
[9] OECD Tourism Trends and Policies 2014, available at http://www.keepeek.com/Digital-Asset-Management/oecd/industry-and-services/oecd-tourism-trends-and-policies-2014_tour-2014-en#page1, accessed on 2016.08.10.
[10] Sabou, M., Onder, I., Brasoveanu, A. M., & Scharl, A. (2016). Towards cross-domain data analytics in tourism: a linked data based approach. Information Technology & Tourism, 16(1), 71-101.
[11] Bach, M. P. (2003, June). Data mining applications in public organizations. In Proceedings of the 25th International Conference on Information Technology Interfaces (pp. 211-216).
[12] Olmeda, I., & Sheldon, P. J. (2002). Data mining techniques and applications for tourism Internet marketing. Journal of Travel & Tourism Marketing, 11(2-3), 1-20.
[13] Bose, I. (2009). Data mining in tourism. Encyclopedia of Information Science and Technology.
[14] Aghdam, A. R., Kamalpour, M., Chen, D., Sim, A. T. H., & Hee, J. M. (2014, August). Identifying places of interest for tourists using knowledge discovery techniques. In Industrial Automation, Information and Communications Technology (IAICT), 2014 International Conference on (pp. 130-134). IEEE.
[15] Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11), 27-34.
[16] Frank, E., Hall, M. A., & Witten, I. H. (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition.
[17] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1), 10-18.