BDA Course Project Report
By
Vipul Bhoir (Roll No. 07)
Mrudul Chaudhari (Roll No. 12)
Abhinav Desai (Roll No. 14)
Supervisor
Mrs. Sneha Mhatre
University of Mumbai
(2024-25)
Vidyavardhini's College of Engineering & Technology
Department of Computer Engineering
CERTIFICATE
This is to certify that the project entitled “Tourist Behavior Analysis” is a bona fide work
of “Vipul Bhoir (Roll No. 07), Mrudul Chaudhari (Roll No. 12), Abhinav Desai (Roll No.
14)” submitted to the University of Mumbai in partial fulfillment of the requirement for
the Course project in semester VII of Fourth Year Computer Engineering.
Supervisor
1.1 Introduction
1.2 Problem Statement
1.3 Scope of Project
Chapter 2: Requirement Analysis
4.1 Methodology
4.2 Sample Module
4.3 Code
Chapter 5: Results
5.1 Results
5.2 Conclusion
References
1. Introduction
1.1 Introduction
Tourist behavior analysis is the study of the motivations, decisions, and actions of
tourists. It is a complex field that encompasses a wide range of factors, including
psychology, sociology, economics, and geography. By understanding tourist
behavior, tourism businesses and policymakers can better develop and promote
products and services that meet the needs of tourists.
2. Requirement Analysis
2.1 Software Requirements:
The system will require the following software:
R
RStudio
Tidyverse libraries
Other relevant libraries (e.g., ggplot2, caret, etc.)
2.2 Hardware Requirements
The system will require the following hardware:
A computer with at least 4GB of RAM and 100GB of free disk space
An internet connection
Recommended:
- 16 GB RAM
Minimum:
- 8 GB RAM
Performance: The system should be able to handle large datasets and complex
queries efficiently. The system should be able to generate results in a reasonable
amount of time. The system should be able to handle concurrent users without
impacting performance.
Security: The system should be secure from unauthorized access, modification, or
destruction of data. The system should protect user privacy. The system should be
compliant with all relevant security regulations.
Reliability: The system should be highly available and reliable. The system should
be able to recover from failures quickly and minimize downtime. The system
should be regularly monitored and backed up.
Usability: The system should be easy to use and navigate. The system should be
well-documented. The system should be accessible to users with disabilities.
Scalability: The system should be scalable to handle increasing data volumes and
user loads. The system should be modular and designed to support future growth.
The system should be deployed in a cloud environment to facilitate scalability.
Maintainability: The system should be well-designed and organized, making it
easy to maintain and update. The system should be documented with code
comments and documentation. The system should be tested regularly to ensure that
it is working properly.
Portability: The system should be portable and able to run on a variety of
platforms.
Extensibility: The system should be extensible, allowing for new features and
functionality to be added easily.
Interoperability: The system should be interoperable with other systems, such as
CRM and ERP systems.
3. System Design
3.1 System Design:
The system will be designed as a modular system, with each module responsible
for a specific task.
The following are the main modules of the system:
Data collection and cleaning module: This module will collect data from a variety
of sources and clean it to a consistent format.
Data analysis module: This module will identify patterns and trends in the data,
and predict tourist behavior based on past trends and current events.
Visualization module: This module will visualize the results of the analysis in a
clear and concise way.
Prediction module: This module will analyze the patterns in the data and forecast
the growth in tourist numbers at the monuments over the coming years.
3.2 Diagram
3.3 Module Description:
Data collection and cleaning module: This module will collect data from a
variety of sources, including social media data, travel surveys, and booking
records. The data will then be cleaned to a consistent format so that it can be
easily analyzed.
Data analysis module: This module will use a variety of statistical and machine
learning techniques to identify patterns and trends in the data. The module will
also be able to predict tourist behavior based on past trends and current events.
Visualization module: This module will visualize the results of the analysis in a
clear and concise way. The module will generate a variety of charts and graphs
that can be used to understand the findings of the analysis.
Prediction module: This module will analyze the patterns in the data and forecast
the growth in tourist numbers at the monuments over the coming years.
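The four modules described above can be sketched as a chain of small R functions. This is a minimal illustration under stated assumptions, not the project's implementation: the function names (`clean_data`, `analyse_growth`, `summarise_for_plot`, `project_next_year`), the column names, and the toy values are all hypothetical.

```r
# Hypothetical module functions; names and toy data are illustrative only.

# Data collection and cleaning: normalise column names, drop incomplete rows.
clean_data <- function(raw) {
  names(raw) <- tolower(names(raw))
  raw[complete.cases(raw), , drop = FALSE]
}

# Data analysis: year-over-year percentage change per monument.
analyse_growth <- function(df) {
  df$growth_pct <- (df$visitors_2021 - df$visitors_2020) / df$visitors_2020 * 100
  df
}

# Visualization: return the named values that a bar chart would display.
summarise_for_plot <- function(df) {
  setNames(df$growth_pct, df$monument)
}

# Prediction: naive projection that applies the last observed growth rate once more.
project_next_year <- function(df) {
  df$visitors_2021 * (1 + df$growth_pct / 100)
}

raw <- data.frame(
  Monument      = c("A", "B", "C"),
  Visitors_2020 = c(100, 200, NA),
  Visitors_2021 = c(150, 100, 50)
)

cleaned   <- clean_data(raw)
analysed  <- analyse_growth(cleaned)
plot_vals <- summarise_for_plot(analysed)
forecast  <- project_next_year(analysed)
```

Keeping each module as a function that takes and returns a data frame means any one stage can be replaced, for example with a different forecasting method, without touching the others.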
4. Implementation
4.1 Methodology
1. Data collection
The first step is to collect data from a variety of sources. This may include social
media data, travel surveys, booking records, and other relevant data sources. The
data should be cleaned and preprocessed to ensure that it is in a consistent format
and that any errors or missing values are addressed.
2. Data analysis
Once the data is prepared, you can begin to analyze it using a variety of statistical
and machine learning techniques. This may involve identifying patterns and trends
in the data, such as the most popular tourist destinations, the most popular activities,
and the spending habits of tourists. You can also use data analysis to predict tourist
behavior, such as the likelihood of a tourist visiting a particular destination or
participating in a particular activity.
3. Visualization
Once you have analyzed the data, you can use visualization tools to present the
results in a clear and concise way. This may involve creating charts, graphs, and
maps that illustrate the patterns and trends that you have identified. Visualization
can also be used to communicate the findings of your analysis to a variety of
stakeholders, such as tourism businesses, policymakers, and researchers.
4. Deployment
Once you have developed and tested your system, you can deploy it to production.
This may involve making the system available to users over the web or through a
mobile app. You may also need to develop and implement maintenance and support
procedures for the system.
Here are some additional details about each step:
Data collection:
When collecting data, it is important to consider the following:
Data sources: There are a variety of data sources that can be used for tourist
behavior analysis.
Some common data sources include:
Social media data: Social media data can be used to track tourist movements,
identify popular tourist destinations, and understand tourist sentiment.
Travel surveys: Travel surveys can be used to collect data on tourist demographics,
travel motivations, and spending habits.
Booking records: Booking records can be used to track tourist itineraries, identify
popular activities, and understand tourist spending.
Data sampling: Data sampling is the process of selecting a subset of data from a
larger population. This can be useful for reducing the cost and complexity of data
collection.
Data cleaning: Data cleaning is the process of identifying and correcting errors
and inconsistencies in data. This is an important step in preparing data for analysis.
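Sampling and cleaning as described above might look like the following in base R. This is a sketch on invented data: the `survey` data frame, its columns, and the choice of a simple random sample of five rows are assumptions made for illustration.

```r
set.seed(42)  # fixed seed so the random sample is reproducible

# Toy survey data; all column names and values are invented for illustration.
survey <- data.frame(
  tourist_id = 1:10,
  country    = c("IN", "in", "US", "UK", NA, "US", "IN", "uk", "IN", "US"),
  spend_usd  = c(120, 80, NA, 300, 150, 300, 95, 210, 95, 60)
)

# Data sampling: simple random sample of 5 rows, without replacement.
sample_rows <- survey[sample(nrow(survey), size = 5), ]

# Data cleaning: drop rows with missing values, then harmonise
# inconsistent country codes to upper case.
cleaned <- survey[complete.cases(survey), ]
cleaned$country <- toupper(cleaned$country)
```

Dropping incomplete rows is the simplest policy. Where missing values are common, imputing them may preserve more of the data.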
Data analysis:
There are a variety of statistical and machine learning techniques that can be
used for tourist behavior analysis. Some common techniques include:
Descriptive statistics: Descriptive statistics can be used to summarize the data and
identify patterns and trends.
Multivariate analysis: Multivariate analysis can be used to identify relationships
between multiple variables.
Machine learning: Machine learning can be used to develop models that can
predict tourist behavior.
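The three families of techniques above can be sketched in base R as follows. This is a minimal illustration on invented data, not the project's analysis: the `visits` data frame and its columns (`nights`, `group`, `spend_usd`) are hypothetical, and linear regression stands in as the simplest predictive model.

```r
# Toy data; the columns and numbers are invented for illustration.
visits <- data.frame(
  nights    = c(1, 2, 3, 4, 5, 6),
  group     = c(1, 2, 2, 4, 3, 5),
  spend_usd = c(110, 190, 320, 400, 510, 590)
)

# Descriptive statistics: summarise each variable.
desc       <- summary(visits)
mean_spend <- mean(visits$spend_usd)

# Multivariate analysis: pairwise correlations between all variables.
cor_matrix <- cor(visits)

# Predictive model (simplest case): linear regression of spend on nights,
# then a prediction for a hypothetical 7-night stay.
model <- lm(spend_usd ~ nights, data = visits)
pred  <- predict(model, newdata = data.frame(nights = 7))
```

`cor()` reports only linear association. A real analysis would also inspect the data visually before fitting a model.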
Visualization:
There are a variety of visualization tools that can be used to present the results
of your analysis. Some common visualization tools include:
Charting tools: Charting tools can be used to create a variety of charts, such as bar
charts, line charts, and pie charts.
Graphing tools: Graphing tools can be used to create a variety of graphs, such as
scatter plots and histograms.
Mapping tools: Mapping tools can be used to create maps that show the
distribution of tourists or other relevant data.
Deployment:
When deploying your system, you need to consider the following:
System architecture: The system architecture should be designed to support the
performance, security, and reliability requirements of the system.
User interface: The user interface should be designed to be easy to use and
navigate.
Security: The system should be deployed in a secure environment and should be
protected from unauthorized access.
Maintenance and support: You should develop and implement maintenance and
support procedures for the system.
4.3 Code
getwd()
#get data
data2 <- read.csv("india tour growth dataset.csv")
#data analysis
head(data2)
summary(data2)
str(data2)
#Data Cleaning
any(is.na(data2))
data2 <- na.omit(data2)
#Calculating Growth
# Percentage change from 2019-20 to 2020-21 for each monument
data2$GrowthDomestic <- ((data2$Domestic.2020.21 - data2$Domestic.2019.20) /
  data2$Domestic.2019.20) * 100
data2$GrowthForeign <- ((data2$Foreign.2020.21 - data2$Foreign.2019.20) /
  data2$Foreign.2019.20) * 100
#Calculating Averages
# A zero 2019-20 count makes the ratio Inf or NaN, and na.rm = TRUE does not
# drop Inf, so average over the finite values only
average_growth_domestic <- mean(data2$GrowthDomestic[is.finite(data2$GrowthDomestic)])
average_growth_foreign <- mean(data2$GrowthForeign[is.finite(data2$GrowthForeign)])
cat("Average Domestic Growth:", average_growth_domestic, "\n")
cat("Average Foreign Growth:", average_growth_foreign, "\n")
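#Display of Bar-Plot
library(ggplot2)
# Create a data frame for plotting
avg_growth_data <- data.frame(
  Category = c("Domestic", "Foreign"),
  AverageGrowth = c(average_growth_domestic, average_growth_foreign)
)
# The ggplot() call is cut off in this listing; the geometry and the x label
# below are assumed, and only the y label is original
bar_plot <- ggplot(avg_growth_data, aes(x = Category, y = AverageGrowth)) +
  geom_col() +
  labs(x = "Category",
       y = "Average Growth (%)")
print(bar_plot)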
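#Heat-map
library(reshape2)
data_20 <- data2[1:20, ]
# The reshaping step is cut off in this listing. Assumed here: melt the four
# visitor-count columns to long format, keyed by a monument-name column that
# read.csv() is assumed to have named Name.of.the.Monument
melted_data <- melt(
  data_20,
  id.vars = "Name.of.the.Monument",
  measure.vars = c("Domestic.2019.20", "Foreign.2019.20",
                   "Domestic.2020.21", "Foreign.2020.21")
)
names(melted_data)[1] <- "Monument"
# Create a heatmap: x is the visitor type and year, y is the monument
heatmap_plot <- ggplot(melted_data, aes(x = variable, y = Monument, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "blue") +
  labs(x = "Visitor Type and Year", y = "Monument", fill = "Visitors") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(heatmap_plot)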
# Line Graph
# group = 1 joins the points of each series into a single line, because Circle
# is a categorical axis; linewidth needs ggplot2 >= 3.4.0 (use size on older versions)
ggplot(data = data_20, aes(x = Circle, group = 1)) +
  geom_line(aes(y = Domestic.2019.20, color = "Domestic 2019-20"), linewidth = 1) +
  geom_line(aes(y = Foreign.2019.20, color = "Foreign 2019-20"), linewidth = 1) +
  geom_line(aes(y = Domestic.2020.21, color = "Domestic 2020-21"), linewidth = 1) +
  geom_line(aes(y = Foreign.2020.21, color = "Foreign 2020-21"), linewidth = 1) +
  xlab("City") +
  ylab("Number of Visitors") +
  labs(color = "Visitor Type") +
  theme_minimal() +
  theme(legend.position = "top") +
  scale_color_manual(values = c("Domestic 2019-20" = "blue",
                                "Foreign 2019-20" = "red",
                                "Domestic 2020-21" = "green",
                                "Foreign 2020-21" = "purple")) +
  ggtitle("Monument Visitors Growth Over the Years")
# Map
library(leaflet)
latlong <- read.csv("lonandlat2.csv")
# Base map with the default OpenStreetMap tiles
mymap <- leaflet(data = latlong) %>%
  addTiles()
# One marker per monument, with the city and monument name in the popup
mymap <- mymap %>%
  addMarkers(
    lng = ~Longitude,
    lat = ~Latitude,
    popup = ~paste("City:", City, "<br>Monument:", Name.of.the.Monument)
  )
mymap
# Prediction
# data_for_pred and foreign_model are used below but are defined in a part of
# the script that is not reproduced in this listing
future_years <- ts(2021:2022, frequency = 1)
# Predict foreign tourists for the next 2 years based on the model
predicted_growth <- predict(foreign_model,
  newdata = data.frame(time = time(future_years)))
# Plot the historical and predicted values
plot(data_for_pred$Foreign_2019_20, type = "o", xlab = "Row no",
  ylab = "Foreign Tourist Arrivals", col = "blue",
  main = "Foreign Tourist Growth Prediction")
lines(data_for_pred$Foreign_2020_21, type = "o", col = "red")
lines(predicted_growth, type = "o", col = "green")
# Add a legend with one entry per plotted series
legend("topright", legend = c("2019-20", "2020-21", "Predicted"),
  col = c("blue", "red", "green"), lty = 1, cex = 0.8)
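As a self-contained illustration of this kind of forecast, the sketch below fits a linear trend with `lm()` and extrapolates two years ahead with `predict()`. The `history` data frame, its values, and the name `trend_model` are invented for illustration and are not the project's data or model.

```r
# Toy yearly totals; the numbers are invented for illustration.
history <- data.frame(
  year    = 2016:2020,
  foreign = c(1000, 1100, 1250, 1300, 1450)
)

# Fit a linear trend of foreign arrivals on the year.
trend_model <- lm(foreign ~ year, data = history)

# Forecast the next two years from the fitted trend.
future   <- data.frame(year = 2021:2022)
forecast <- predict(trend_model, newdata = future)
```

With only two observed years per monument, as in the dataset used here, a straight line fits both points exactly, so any forecast from it is a rough extrapolation rather than a validated prediction.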
4.4 Output:
5. Results:
5.1 Results:
The conclusions of the project should be based on the results of the data analysis.
The conclusions should be clear, concise, and actionable. They should also be relevant to
the needs of the stakeholders for whom the project is being conducted.
For example, if the project is being conducted for a tourism business, the conclusions may
focus on identifying ways to attract more tourists or to increase the spending of tourists. If
the project is being conducted for a policymaker, the conclusions may focus on ways to
promote sustainable tourism or to mitigate the negative impacts of tourism.
The conclusions of the project should also be based on the limitations of the data and the
analysis. For example, if the data is not representative of all tourists, then the conclusions
should be limited accordingly.
Overall, the tourist behavior analysis project can provide valuable insights into the needs
and wants of tourists. This information can be used to develop and promote products and
services that meet those needs, and to develop policies that promote sustainable tourism
and tourism development.
5.2 Conclusion:
The results of the tourist behavior analysis project can be used to better understand
the needs and wants of tourists, to develop and promote products and services that meet
those needs, and to develop policies that promote sustainable tourism and tourism
development.
Some specific results that may be obtained from the project include:
Identification of the most popular tourist destinations and activities
Understanding of tourist demographics, travel motivations, and spending habits
Prediction of tourist behavior, such as the likelihood of a tourist visiting a particular
destination or participating in a particular activity
Identification of trends in tourist behavior over time
References:
[1] D.-D. Lu and Y.-D. Zhong, "A tourist flows analysis system based on phone big data,"
2016 IEEE International Conference on Big Data Analysis (ICBDA), Hangzhou, China,
2016, pp. 1-5, doi: 10.1109/ICBDA.2016.7509822.
[2] S. Arthan, K. Jandum and K. Tamee, "Exploring Tourist Behavior from Social Media
Using Geotagged Photographs," 2021 Joint International Conference on Digital Arts,
Media and Technology with ECTI Northern Section Conference on Electrical, Electronics,
Computer and Telecommunication Engineering, Cha-am, Thailand, 2021, pp. 285-288,
doi: 10.1109/ECTIDAMTNCON51128.2021.9425761.