HMPE 201 Final Module 2 1

This document provides an overview of a course titled "Data Analytics in the Hospitality Industry". The course aims to enable students to extract meaningful information from hospitality data to help hospitality enterprises succeed. Topics covered include extracting, managing, analyzing, visualizing, and forecasting customer and hospitality data. The course also discusses applying data analytics knowledge to decision making. It is a 4-credit, semester-long course taught in a modular format to College of Hospitality and Tourism Management students.

Uploaded by

Venice Espinoza


College of Hospitality and Tourism Management

Course Overview

Course No.: HMPE 201
Course Code: HMPE 201
Descriptive Title: DATA ANALYTICS IN THE HOSPITALITY INDUSTRY
Credit Units: 4
School Year / Term: 2nd semester, SY 2020-2021
Mode of Delivery: Modular
Name of Instructor/Professor: DR. DINAH F. CATAMCO, JOMARIE C. SALAR
Course Description: This course enables a student to extract meaningful information from
hospitality data, to better position the hospitality enterprise for success in the marketplace.
Course Outcomes:
• Extract meaningful information from hospitality data
• Develop knowledge in managing useful data
• Develop an appreciation for turning data into useful information that would help the
hospitality enterprise succeed in the marketplace
• Analyse and visualize customer and hospitality data
• Forecast demand in the hospitality industry
• Discuss the importance of data analytics in the hospitality industry
• Apply useful knowledge from hospitality data in decision making
SLSU Vision: A high quality corporate science and technology University
SLSU Mission: SLSU will produce science and technology leaders and competitive
professionals, generate breakthrough research in science and technology based disciplines,
transform and improve the quality of life in the communities in the service area, and be
self-sufficient and financially viable.

HMPE 201- Data Analytics in the Hospitality Industry


Module Guide
How to navigate this module

Hi, welcome to this module, “The Basics of Data Analytics”. This module covers the following
topics:

1. The Data
2. Visualization of data
3. Data pre-processing

Upon reading this module and answering the assessments provided, you will be able to:

1. Determine the various types of data, their characteristics, components, attributes, and
relationships
2. Define data visualization
3. Elucidate how data visualization generates useful information through various techniques
4. Explain the different steps of data preprocessing.

Everything you learn in this module is important for completing the laboratory activities
in the laboratory guide attached to this module.
The module uses illustrative examples and graphics to help you understand the topics easily.
The references used are research outputs published on reputable research sites, published
books and e-books, and learning materials related to data analytics.


LESSON 1

THE DATA

Intended Learning Outcome

At the end of this lesson, you will be able to:

1. Describe various types of data.

2. Explain the different data attributes.
3. Identify data characteristics and components.

Now get started

Introduction

Data analytics is the science of analysing raw datasets in order to draw conclusions
about the information they hold. It enables us to discover patterns in the raw data and draw
valuable information from them. Data analytics processes and techniques may use applications
that incorporate machine learning algorithms, simulation, and automated systems. These systems
and algorithms process unstructured data into a form that people can use. The findings are
interpreted and used to help organizations understand their clients better, analyse their
promotional campaigns, customize content, create content strategies, and develop products.
Data analytics helps organizations maximize market efficiency and improve their earnings.
_______________________________________________________
Keywords
Database system, data warehouse, Data objects
Data attributes, patterns, association, correlation
_______________________________________________________

Let’s Learn

Because data is so important in our lives, it must be stored and processed properly and
without error. When dealing with datasets, the category of data plays an important role in
determining which preprocessing strategy will give the right results for a particular set, or
which type of statistical analysis should be applied for the best results.
Let’s dive into some of the commonly used categories of data.

Database System

A database system, also called a database management system (DBMS), consists of a collection
of interrelated data, known as a database, and a set of software programs to manage and access
the data. The software programs provide mechanisms for defining database structures and data
storage; for specifying and managing concurrent, shared, or distributed data access; and for
ensuring the consistency and security of the stored information despite system crashes or
attempts at unauthorized access. A relational database is a collection of tables, each of which
is assigned a unique name. Each table consists of a set of attributes (columns or fields) and
usually stores a large set of tuples (records or rows). Each tuple in a relational table
represents an object identified by a unique key and described by a set of attribute values
(Han, Kamber & Pei, 2012).
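The idea of a table with named columns, rows, and a unique key can be tried out with Python's built-in `sqlite3` module. The sketch below is only an illustration: the `guest` table, its columns, and the three rows are hypothetical and do not come from the module.

```python
import sqlite3

# Hypothetical guest table: the table name, columns, and rows are
# illustrative assumptions, not data from the module.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE guest (
        guest_id INTEGER PRIMARY KEY,  -- unique key identifying each record
        name     TEXT NOT NULL,        -- attribute (column)
        country  TEXT                  -- attribute (column)
    )
    """
)
conn.executemany(
    "INSERT INTO guest (guest_id, name, country) VALUES (?, ?, ?)",
    [(1, "Ana", "Philippines"), (2, "Ben", "Japan"), (3, "Cara", "Philippines")],
)

# Each row returned is one record described by its attribute values.
rows = conn.execute(
    "SELECT guest_id, name FROM guest WHERE country = ? ORDER BY guest_id",
    ("Philippines",),
).fetchall()
print(rows)
conn.close()
```

Here `guest_id` plays the role of the unique key, and each inserted row is one record in the table.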

Data Warehouse

A data warehouse is a large collection of business data used to help an organization make
decisions. The concept of the data warehouse has existed since the 1980s, when it was developed
to help transition data from merely powering operations to fuelling decision support systems that
reveal business intelligence. The large amount of data in data warehouses comes from different
places: internal applications for marketing, sales, and finance; customer-facing apps;
and external partner systems, among others.

On a technical level, a data warehouse periodically pulls data from those apps and
systems; then, the data goes through formatting and import processes to match the data already
in the warehouse. The data warehouse stores this processed data so it’s ready for decision makers
to access. How frequently data pulls occur, or how data is formatted, etc., will vary depending on
the needs of the organization.

Framework of Data Warehouse

Source: https://copycoding.com/datawarehouse/architecture.html

Data Objects and Attribute Types

Data sets are made up of data objects. A data object represents an entity—in a sales
database, the objects may be customers, store items, and sales; in a medical database, the objects
may be patients; in a university database, the objects may be students, professors, and courses.
Data objects are typically described by attributes. Data objects can also be referred to as samples,
examples, instances, data points, or objects. If the data objects are stored in a database, they are
data tuples. That is, the rows of a database correspond to the data objects, and the columns
correspond to the attributes. In this section, we define attributes and look at the various attribute
types.


Data Attributes

An attribute is a data field, representing a characteristic or feature of a data object. The
nouns attribute, dimension, feature, and variable are often used interchangeably in the literature.
The term dimension is commonly used in data warehousing. Machine learning literature tends to
use the term feature, while statisticians prefer the term variable. Data mining and database
professionals commonly use the term attribute, and we do here as well. Attributes describing a
customer object can include, for example, customer ID, name, and address. Observed values for a
given attribute are known as observations. A set of attributes used to describe a given object is
called an attribute vector (or feature vector). The distribution of data involving one attribute (or
variable) is called univariate. A bivariate distribution involves two attributes, and so on.

The type of an attribute is determined by the set of possible values (nominal, binary,
ordinal, or numeric) the attribute can have. In the following subsections, we introduce each type.

• Nominal Attribute
Nominal means “relating to names.” The values of a nominal attribute are
symbols or names of things. Each value represents some kind of category,
code, or state, and so nominal attributes are also referred to as categorical.
The values do not have any meaningful order.

Example: Marital Status, Country, Gender, Race, Hair Colour

• Ordinal Attribute
An ordinal attribute is an attribute with possible values that have a
meaningful order or ranking among them, but the magnitude between
successive values is not known.

Example: Grade, Educational Level, Satisfaction Level, Socio-economic Status, Income Bracket

• Numeric Attribute
A numeric attribute is quantitative; that is, it is a measurable quantity,
represented in integer or real values. Numeric attributes can be
interval-scaled or ratio-scaled.

   • Interval-Scaled Attributes
   Interval-scaled attributes are measured on a scale of equal-size units. The
   values of interval-scaled attributes have order and can be positive, 0, or
   negative. Thus, in addition to providing a ranking of values, such attributes
   allow us to compare and quantify the difference between values.

   • Ratio-Scaled Attributes
   A ratio-scaled attribute is a numeric attribute with an inherent zero-point.
   That is, if a measurement is ratio-scaled, we can speak of a value as being a
   multiple (or ratio) of another value. In addition, the values are ordered, and
   we can also compute the difference between values, as well as the mean,
   median, and mode.
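As a rough illustration of why the attribute type matters, the sketch below applies to each type only the kind of summary that is meaningful for it. All values are made up, and the variable names are hypothetical.

```python
from statistics import mean, mode

# Hypothetical survey of five hotel guests (all values are made up).
country = ["PH", "JP", "PH", "US", "PH"]                  # nominal
satisfaction = ["low", "high", "medium", "high", "high"]  # ordinal
checkin_temp_c = [28, 31, 30, 27, 29]                     # interval-scaled
nights_stayed = [1, 3, 2, 2, 4]                           # ratio-scaled

# Nominal: only counting and the mode are meaningful.
most_common_country = mode(country)

# Ordinal: values can be ranked, so the median is meaningful.
rank = {"low": 0, "medium": 1, "high": 2}
ordered = sorted(satisfaction, key=rank.get)
median_satisfaction = ordered[len(ordered) // 2]

# Interval-scaled: differences are meaningful, ratios are not.
temp_range = max(checkin_temp_c) - min(checkin_temp_c)

# Ratio-scaled: a true zero exists, so "four times as long" makes sense.
ratio_longest_to_shortest = max(nights_stayed) / min(nights_stayed)
mean_nights = mean(nights_stayed)
```

Temperature in degrees Celsius is treated as interval-scaled because 0 °C is not a true zero, whereas a stay of zero nights really means no stay.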


Patterns, Association and Correlation

When data is organized, patterns or trends can form and be drawn out from the
organized data.

Frequent patterns, as the name suggests, are patterns that occur frequently in data. There
are many kinds of frequent patterns, including frequent item sets, frequent subsequences (also
known as sequential patterns), and frequent substructures. A frequent item set typically refers to
a set of items that often appear together in a transactional data set, for example, milk and
bread, which are frequently bought together in grocery stores by many customers. A frequently
occurring subsequence, such as the pattern that customers tend to purchase first a laptop,
followed by a digital camera, and then a memory card, is a (frequent) sequential pattern. A
substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be
combined with item sets or subsequences. If a substructure occurs frequently, it is called a
(frequent) structured pattern. Mining frequent patterns leads to the discovery of interesting
associations and correlations within data.
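A frequent item set can be found, in the simplest case, by counting how often each pair of items appears in the same transaction. The sketch below is a minimal brute-force illustration; the baskets and the `min_support` threshold are assumptions chosen for the example, and it is not an efficient mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs", "coffee"},
]

min_support = 3  # assumed threshold: a pair must occur in at least 3 baskets

pair_counts = Counter()
for basket in transactions:
    # sorted() gives each pair a canonical order so ("bread", "milk")
    # and ("milk", "bread") are counted as the same item set.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent_pairs)
```

With these baskets, only the milk and bread pair reaches the threshold, which mirrors the grocery example in the text.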

Think: what does the picture say?

Association of Data

According to IBM, a data association is a user-defined grouping of related groups and
elements. It can consist of one or more groups along with some or all of the elements within
those groups.

Correlation of data
Correlation means that the values of one attribute move in coordination with the values of
another.


Let’s sum up

A data warehouse is a large collection of business data used to help an
organization make decisions.
Nominal means “relating to names.”
Data sets are made up of data objects.
A database system is also called a database management system (DBMS).

Let’s assess what you have learned in this lesson

Learning Assessment Task 2.1.1

Identification. You will be given a 10-item identification test covering all the topics in
this lesson. You will be rated based on your correct answers.
Instruction: Identify the answer from the scrambled words in the box. Write your answer in
the space provided before each number.
Scrambled Words

_________________________ 1. The values of this attribute are symbols or names.
_________________________ 2. An attribute with an inherent zero-point.
_________________________ 3. A large collection of data.
_________________________ 4. Also known as a data trend.
_________________________ 5. Data that moves in coordination with other data.
_________________________ 6. An attribute whose values are measured on a scale of equal-size units.
_________________________ 7. An attribute with possible values that have a meaningful order.
_________________________ 8. It represents an entity in a database.
_________________________ 9. A set of data held in a computer.
_________________________ 10. A grouping of related data.


Learning Assessment Task 2.1.2

Describe and Explain. Describe the four data attributes and give at least five
examples for each attribute. Then explain why each example belongs to that attribute.

Your description will be rated using the writing rubric, from 1 (poor) to 5 (excellent).
Your explanation will be rated using the writing rubric, from 1 (poor) to 5 (excellent).

Write your answer here.

HM PE 201 – DATA ANALYTICS IN HOSPITALITY INDUSTRY

_________________________________________

Good (4) - The answer shows strong composition skills, including clear and thought-provoking
ideas, although development, diction, and sentence style may suffer minor flaws.
Shows careful and acceptable use of mechanics. The writing effectively accomplishes the goals
of the task.
Average (3) - The answer demonstrates competent composition skills including adequate
development and organization, although the development of ideas may be trite, assumptions
may be unsupported in more than one area and the diction and syntax may not be clear and
effective. Minimally accomplishes the goals of the task.

Fair (2) - The answer demonstrates composition skills that may be flawed in the clarity of
the ideas, the development, or the organization. Diction, syntax, and mechanics may seriously
affect clarity. Minimally accomplishes the majority of the goals of the task.
Poor (1) - Composition skills may be flawed in two or more areas. Diction, syntax, and
mechanics are excessively flawed. Fails to accomplish the goals of the task.


LESSON 2

DATA VISUALIZATION

Intended Learning Outcome

At the end of this lesson, you will be able to:

1. Visualize data in a 2D scatterplot.

2. Elucidate how data visualization generates useful information.

Now get started

Introduction

With so much information being collected through data analysis in the business
world today, each business must have a way to paint a picture of that data so it can be
interpreted. Data visualization gives a clear idea of what the information means by giving it
visual context through maps or graphs. It helps by delivering data in the most efficient way
possible.
As one of the essential steps in the business intelligence process, data visualization takes the raw
data, models it, and delivers the data so that conclusions can be reached. In advanced analytics,
data scientists are creating machine learning algorithms to better compile essential data into
visualizations that are easier to understand and interpret.

_______________________________________________________
Keywords
Pixel-oriented visualization, geometric projection visualization
Icon-based visualization, hierarchical visualization
_______________________________________________________

Let’s Learn

Data visualization uses visual data to communicate information in a manner that is
universal, fast, and effective. This practice can help companies identify which areas need to be
improved, which factors affect customer satisfaction and dissatisfaction, and what to do with
specific products (where they should go and to whom they should be sold). Visualized data gives
stakeholders, business owners, and decision-makers a better prediction of sales volumes and
future growth.

Pixel-oriented visualization techniques. In pixel-oriented techniques, each data value is
shown as a pixel whose colour reflects that value. The task of the knowledge discovery and data
mining process is to extract knowledge from data such that the resulting knowledge is useful in a
given application. Only the user can determine whether the resulting knowledge satisfies this
requirement. Moreover, what one user finds useful is not necessarily useful to another user.

Figure 1. Pixel-oriented visualization attributes
Source: https://www.slideshare.net/phakhwan22/02-data-41812563


Geometric Projection Visualization Techniques. A drawback of pixel-oriented visualization
techniques is that they cannot help us much in understanding the distribution of data in a
multidimensional space. For example, they do not show whether there is a dense area in a
multidimensional subspace. Geometric projection techniques help users find interesting
projections of multidimensional data sets. The central challenge the geometric projection
techniques try to address is how to visualize a high-dimensional space on a 2-D display.
A scatter plot displays 2-D data points using Cartesian coordinates. A third dimension can
be added using different colours or shapes to represent different data points.

Figure 2. Visualization of 2D data using scatterplot

Source: http://www.industrial-electronics.com/data-mining_2b.html

A 3-D scatter plot uses three axes in a Cartesian coordinate system. If it also uses colour,
it can display up to 4-D data points.

Figure 3. Visualization of a 3D scatterplot
Source: http://www.industrial-electronics.com/data-mining_2b.html


The scatter-plot matrix technique is a useful extension to the scatter plot. For an
n-dimensional data set, a scatter-plot matrix is an n × n grid of 2-D scatter plots that
provides a visualization of each dimension with every other dimension.

Figure 4. Visualization of the Iris data set using a scatter-plot matrix.
Source: http://support.sas.com/documentation/cdl/en/grstatproc/61948/HTML/default/images/gsgscmat.gif

To visualize n-dimensional data points, the parallel coordinates technique draws n equally
spaced axes, one for each dimension, parallel to one of the display axes.

Figure 5. Visualization that uses parallel coordinates.
Source: www.stat.columbia.edu/∼cook/movabletype/archives/2007/10/parallel coordi.thml.

Hierarchical visualization techniques. Hierarchical data is data that can be arranged in the
form of a tree. Each item of data defines a node in the tree, and each node may have a
collection of other nodes as child nodes. The relationship between the parent nodes and the
child nodes forms a tree network. The formal definition of a tree is that the graph formed by
the nodes and edges (defined between parent and child nodes) is connected and contains no cycles.
The following properties of a tree are of more practical use from the point of view of displaying
visualizations:
• One node, called the root node, has no parent.
• All other nodes have exactly one parent.
• Nodes with no children are termed leaf nodes. Nodes with children are
termed interior nodes.
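These tree properties can be checked on a small example. The sketch below stores a hypothetical hotel hierarchy as a child-to-parent mapping (the node names are assumptions made for illustration) and derives the root, the leaf nodes, and the path from a node up to the root.

```python
# Hypothetical hierarchy: each child maps to its single parent.
# The root ("Hotel") is the only node that has no parent entry.
parent = {
    "Rooms": "Hotel",
    "Food and Beverage": "Hotel",
    "Standard": "Rooms",
    "Suite": "Rooms",
    "Restaurant": "Food and Beverage",
}

nodes = set(parent) | set(parent.values())
roots = [n for n in nodes if n not in parent]                    # no parent
leaves = sorted(n for n in nodes if n not in parent.values())   # no children


def path_to_root(node):
    """Follow parent links upward until the root is reached."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path
```

For instance, `path_to_root("Suite")` walks from `"Suite"` to `"Rooms"` to `"Hotel"`. Because every non-root node has exactly one parent, that upward path is unique.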

For all nodes in a tree, there is a single unique path up the tree going from parent to parent.
“Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical visualization
method.

Figure 6. “Worlds-within-Worlds” (also known as n-Vision).
Source: http://graphics.cs.columbia.edu/projects/AutoVisual/images/1.dipstick.5.gif

Tree-maps are another example of hierarchical visualization methods. They display
hierarchical data as a set of nested rectangles. In the Newsmap example, all news stories are
organized into seven categories, each shown in a large rectangle of a unique colour. Within each
category (i.e., each rectangle at the top level), the news stories are further partitioned into
smaller subcategories.

Figure 7. Newsmap: Use of tree-maps to visualize Google news headline stories.
Source: www.cs.umd.edu/class/spring2005/cmsc838s/viz4all/ss/newsmap.png.


Visualizing complex data and relations. There are many new visualization techniques
dedicated to these kinds of data. For example, many people on the Web tag various objects such
as pictures, blog entries and product reviews. A tag cloud is a visualization of statistics of user-
generated tags. Often, in a tag cloud, tags are listed alphabetically or in a user-preferred order.

Figure 8. Using a tag cloud to visualize popular Web site tags. Source: A snapshot of
www.flickr.com/photos/tags/, January 23, 2010
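
The statistics behind a tag cloud are simply tag frequencies. As a minimal sketch, assuming a made-up list of tags and an arbitrary font-size rule (10 points plus 4 per occurrence), the counts can be computed like this:

```python
from collections import Counter

# Hypothetical tags attached by users to hotel photos.
tags = ["beach", "pool", "beach", "sunset", "pool", "beach", "food"]

counts = Counter(tags)
# List tags alphabetically and scale each by its frequency; the font-size
# rule (10 + 4 per occurrence) is an arbitrary assumption.
cloud = [(tag, 10 + 4 * counts[tag]) for tag in sorted(counts)]
```

A drawing tool would then render each tag at its computed size, so the most frequently used tags stand out.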

Icon-based Visualization Techniques. These techniques use small icons to represent
multidimensional data values. We look at two popular icon-based techniques: Chernoff faces and
stick figures.

• Chernoff faces make use of the ability of the human mind to recognize small
differences in facial characteristics and to assimilate many facial
characteristics at once.
• The stick figure visualization technique maps multidimensional data to
five-piece stick figures, where each figure has four limbs and a body.

Figure 9. Chernoff faces. Each face represents an n-dimensional data point (n ≤ 18).


To really understand how we get information through visualization, let us answer the
“Think in a minute!” activity.

Think in a minute!
Look at the data in the scatterplot. Tell me what you can see.

[Figure: 2D scatterplot of international tourist arrivals (Y axis, -0.50 to 0.75) against
carbon dioxide emission (X axis, 0.0 to 2.5)]

This is data on the international tourist arrivals and carbon dioxide emission of a group
of countries.
Write answer here.

“One piece of information we can get from the 2D scatterplot is that, as carbon dioxide
emission increases, international tourist arrivals are sporadic and later drop to nearly 0% as
the carbon dioxide emission of a given country increases to 100%.”

See, from the pattern or trend of the points in a scatterplot, we can generate useful
information.

Interpreting data in a 2D scatterplot

Please remember that a scatterplot has two axes: the X axis and the Y axis. The X axis shows
the independent variable and the Y axis shows the dependent variable. To interpret the data
trend in a scatterplot, read the data from left to right. In the case above, we interpret the
data by reading from a carbon dioxide emission of 0.00 rightward to 1.00 and so forth.


Take another example: Weather temperature vs. Cup of coffee

Weather temperature (X)    Cup of coffee (Y)
5                          3
10                         7
15                         10
20                         15
25                         18

Here is the scatterplot for the weather temperature vs. cup of coffee

[Figure: scatterplot of Cup of coffee (Y axis, 0 to 20) against Weather temperature
(X axis, 0 to 30)]

As we can see from the trend in the scatterplot, the number of cups of coffee increases as the
weather temperature increases.

Remember: You interpret a scatterplot by looking for trends in the data as you go from
left to right:

• If the data show an uphill pattern as you move from left to right, this indicates a positive
relationship between X and Y. As the X-values increase (move right), the Y-values tend to
increase (move up).

• If the data show a downhill pattern as you move from left to right, this indicates a negative
relationship between X and Y. As the X-values increase (move right), the Y-values tend to
decrease (move down).

• If the data don’t seem to resemble any kind of pattern (even a vague one), then no clear
relationship exists between X and Y.
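
The uphill and downhill rule can also be expressed as a number. One common choice, used here as an illustration rather than as the module's own method, is the Pearson correlation coefficient, which is close to +1 for a clear uphill pattern and close to -1 for a clear downhill pattern. The sketch below applies it to the weather temperature vs. cup of coffee table; the helper names and the 0.3 cut-off in `describe_trend` are assumptions.

```python
from math import sqrt

# Data from the weather temperature vs. cup of coffee table.
temperature = [5, 10, 15, 20, 25]    # X
cups_of_coffee = [3, 7, 10, 15, 18]  # Y


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def describe_trend(r, threshold=0.3):
    """Label the trend; the 0.3 cut-off is an arbitrary assumption."""
    if r > threshold:
        return "uphill (positive relationship)"
    if r < -threshold:
        return "downhill (negative relationship)"
    return "no clear pattern"


r = pearson_r(temperature, cups_of_coffee)
print(round(r, 3), describe_trend(r))
```

For this table the coefficient is very close to +1, which agrees with the uphill trend read from the scatterplot.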


Let’s sum up

Icon-based visualization techniques use small icons to represent
multidimensional data values.
A tag cloud is a visualization of statistics of user-generated tags. Often, in a tag
cloud, tags are listed alphabetically or in a user-preferred order.
Hierarchical data is data that can be arranged in the form of a tree.
Data visualization uses visual data to communicate information in a manner
that is universal, fast, and effective.

Let’s assess what you have learned in this lesson

Learning Assessment Task 2.2.1

Illustration. You will be given a set of data to illustrate on a 2D scatterplot. You
may draw the scatterplot in the box below. You will be rated based on the correct data points.
Please follow the steps below when placing the data in the scatterplot.

Follow these simple steps:

1. First, find the value for x on the x-axis.
2. Next, find the y-value
3. Your point should be plotted at the intersection of x and y.
4. Finally, plot the point on your graph at the appropriate spot.

Table 1. Data sets of Hotel Room supply Vs. Hotel Room demand

Name of Hotel    Hotel room demand (Y), in %    Hotel room supply (X), in %
A                10                             6
B                9                              9
C                5                              10
D                9                              13
E                7                              5
F                12                             10
G                6                              15
H                4                              10
I                2                              9
J                15                             10


Draw the scatterplot here.

Learning Assessment Task 2.2.2

Interpretation. Based on your answer in Learning Assessment Task 2.2.1, write your
observation and interpretation of the data plotted in the scatterplot. You will be rated
based on the writing rubric with a rating scale of 1-5 (poor-excellent). Write your answer here.


HM PE 201 – DATA ANALYTICS IN HOSPITALITY INDUSTRY

_________________________________________

Average (3) - The answer demonstrates competent composition skills including adequate
development and organization, although the development of ideas may be trite, assumptions
may be unsupported in more than one area and the diction and syntax may not be clear and
effective. Minimally accomplishes the goals of the task.
Fair (2) - The answer demonstrates composition skills that may be flawed in the clarity of
the ideas, the development, or the organization. Diction, syntax, and mechanics may seriously
affect clarity. Minimally accomplishes the majority of the goals of the task.
Poor (1) - Composition skills may be flawed in two or more areas. Diction, syntax, and
mechanics are excessively flawed. Fails to accomplish the goals of the task.

Learning Assessment Task 2.2.3

Explanation. Based on your experience in plotting the data sets in a 2D scatterplot, explain
how useful information can be generated from the data sets. You will be rated based on the
writing rubric with a rating scale of 1-5 (poor-excellent).

Write your answer here.


HM PE 201 – DATA ANALYTICS IN HOSPITALITY INDUSTRY

_________________________________________


LESSON 3

DATA PREPROCESSING

Intended Learning Outcome

At the end of this lesson, you will be able to:

1. Explain the different steps of data preprocessing.

Now get started

Introduction

Data preprocessing is a data mining technique that involves transforming
raw data into an understandable format. Real-world data is often incomplete, inconsistent,
and/or lacking in certain behaviours or trends, and is likely to contain many errors. Data
preprocessing is a proven method of resolving such issues. Most people who work with data agree
that insights and analysis are only as good as the data being used. Essentially, garbage
data in is garbage analysis out. Data cleaning, also referred to as data cleansing or data
scrubbing, is one of the most important steps for your organization if you want to create a
culture of decision-making based on quality data.

_______________________________________________________
Keywords
Data cleaning, Data preprocessing
Data reduction, Data transformation
_______________________________________________________

Let’s Learn

To make the process easier, data preprocessing is divided
into four stages: data cleaning, data integration, data reduction,
and data transformation.

Data cleaning. It is the process of fixing or removing incorrect,
corrupted, incorrectly formatted, duplicate, or incomplete data
within a dataset. When combining multiple data sources, there are
many opportunities for data to be duplicated or mislabeled. If data
is incorrect, outcomes and algorithms are unreliable, even though
they may look correct.
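
As a minimal sketch of data cleaning, assuming a small made-up list of booking records, the code below fixes inconsistent formatting, drops incomplete or invalid rows, and removes duplicates. The field names `guest` and `nights` are hypothetical.

```python
# Hypothetical raw booking records with typical quality problems.
raw = [
    {"guest": " Ana ", "nights": "2"},
    {"guest": "ana", "nights": "2"},       # duplicate with different formatting
    {"guest": "Ben", "nights": ""},        # incomplete: nights missing
    {"guest": "Cara", "nights": "three"},  # incorrect: not a number
    {"guest": "Dan", "nights": "4"},
]

cleaned = []
seen = set()
for record in raw:
    guest = record["guest"].strip().title()  # fix inconsistent formatting
    nights_text = record["nights"].strip()
    if not nights_text.isdigit():            # drop incomplete or invalid rows
        continue
    key = (guest, int(nights_text))
    if key in seen:                          # drop duplicates
        continue
    seen.add(key)
    cleaned.append({"guest": guest, "nights": int(nights_text)})
```

Whether a bad row should be dropped or repaired depends on the analysis; dropping is used here only to keep the example short.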

Data Integration. It is a data preprocessing technique that
involves combining data from multiple heterogeneous data
sources into a coherent data store and providing a unified view of
the data. These sources may include multiple data cubes,
databases, or flat files.
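
To illustrate data integration, the sketch below combines two hypothetical sources that identify guests with differently named and typed keys (`guest_id` as an integer, `GuestID` as text) into one unified view per guest. Both sources and all values are assumptions.

```python
# Two hypothetical sources that describe the same guests differently.
reservations = [  # e.g. exported from a booking database
    {"guest_id": 1, "nights": 2},
    {"guest_id": 2, "nights": 5},
]
restaurant_bills = [  # e.g. read from a flat file
    {"GuestID": "1", "amount": 850.0},
    {"GuestID": "2", "amount": 1200.0},
    {"GuestID": "1", "amount": 400.0},
]

# Unify the key name and type, then combine into one view per guest.
unified = {r["guest_id"]: {"nights": r["nights"], "dining_total": 0.0}
           for r in reservations}
for bill in restaurant_bills:
    guest_id = int(bill["GuestID"])
    if guest_id in unified:
        unified[guest_id]["dining_total"] += bill["amount"]
```

Matching keys that differ in name or type across sources is typically the first obstacle in integration, which is why the conversion step comes before the merge.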

Data reduction. Data reduction produces a condensed description of the original data that is
much smaller in quantity but keeps the quality of the original data.
Methods of data reduction are explained below.

1. Data Cube Aggregation. This technique aggregates data into a simpler form. For example,
imagine that the information you gathered for your analysis covers the years 2012 to 2014 and
includes the revenue of your company every three months. If you are interested in the annual
sales rather than the quarterly figures, the data can be aggregated so that the result gives
the total sales per year instead of per quarter.
2. Dimension reduction. When we come across attributes that are only weakly relevant, we keep
only the attributes required for our analysis. This reduces the data size by eliminating
outdated or redundant features.
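
The data cube aggregation in item 1 can be sketched as a roll-up from quarters to years. The revenue figures below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical quarterly revenue for 2012 to 2014 (figures are made up).
quarterly_revenue = {
    (2012, "Q1"): 224, (2012, "Q2"): 408, (2012, "Q3"): 350, (2012, "Q4"): 586,
    (2013, "Q1"): 300, (2013, "Q2"): 420, (2013, "Q3"): 380, (2013, "Q4"): 600,
    (2014, "Q1"): 310, (2014, "Q2"): 450, (2014, "Q3"): 400, (2014, "Q4"): 640,
}

annual_revenue = defaultdict(int)
for (year, _quarter), amount in quarterly_revenue.items():
    annual_revenue[year] += amount  # roll quarters up to the year level

print(dict(annual_revenue))
```

Twelve quarterly values are reduced to three annual totals, which is enough when the analysis only needs sales per year.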

Step-wise Forward Selection. The selection begins with an empty set of attributes later on
we decide best of the original attributes on the set based on their relevance to other
attributes. We know it as a p-value in statistics.

Step-wise Backward Selection. This selection starts with the complete set of attributes in the
original data and, at each step, eliminates the worst attribute remaining in the set.
For example, suppose a data set contains several attributes, a few of which are redundant;
backward selection removes those redundant attributes one at a time.

Combination of Forward and Backward Selection. At each step, this method selects the best
attribute and removes the worst from among the remaining attributes, saving time and making
the process faster.
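
The two step-wise procedures above can be sketched in Python. This is only an illustration: the attribute names, the point values in USEFULNESS, and the function toy_score are invented stand-ins for a real relevance measure such as a p-value or model accuracy.

```python
# Hypothetical usefulness points per attribute (higher is better).
USEFULNESS = {"room_rate": 50, "season": 30, "booking_channel": 2, "room_rate_usd": 45}


def toy_score(subset):
    """Made-up score: usefulness, minus redundancy, minus a cost per attribute."""
    chosen = set(subset)
    total = sum(USEFULNESS[a] for a in chosen)
    if {"room_rate", "room_rate_usd"} <= chosen:
        total -= 45  # the second price column repeats the first
    return total - 5 * len(chosen)


def forward_select(attributes, score):
    """Start empty; keep adding the attribute that raises the score the most."""
    selected, remaining = [], list(attributes)
    best = score(selected)
    while remaining:
        top_score, top_attr = max((score(selected + [a]), a) for a in remaining)
        if top_score <= best:
            break  # no remaining attribute improves the score
        selected.append(top_attr)
        remaining.remove(top_attr)
        best = top_score
    return selected


def backward_eliminate(attributes, score):
    """Start with everything; keep dropping the attribute whose removal helps most."""
    selected = list(attributes)
    best = score(selected)
    while selected:
        top_score, worst = max(
            (score([a for a in selected if a != drop]), drop) for drop in selected
        )
        if top_score <= best:
            break  # every removal would make the score worse or leave it unchanged
        selected.remove(worst)
        best = top_score
    return selected


print(forward_select(USEFULNESS, toy_score))      # ['room_rate', 'season']
print(backward_eliminate(USEFULNESS, toy_score))  # ['room_rate', 'season']
```

With this toy score both procedures arrive at the same two attributes, but on real data they can disagree, which is one reason the combined approach is sometimes preferred.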

Data Compression. The data compression technique reduces the size of files using different
encoding mechanisms (such as Huffman encoding and run-length encoding).

There are two types, based on the compression technique used.

1. Lossless Compression. Encoding techniques such as run-length encoding allow a simple
and modest reduction in data size. Lossless data compression uses algorithms to restore
the precise original data from the compressed data.
2. Lossy Compression. Methods such as the discrete wavelet transform and PCA
(principal component analysis) are examples of this type of compression. For example, the
JPEG image format uses lossy compression, but the decompressed image still conveys the
same meaning as the original image. In lossy data compression, the decompressed data may
differ from the original data but are still useful enough to retrieve information from.
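
Run-length encoding, the lossless technique named above, can be sketched in a few lines of Python. The room-status string ("V" for vacant, "O" for occupied) is a made-up example, and the function names are chosen for illustration.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of repeated characters with a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs):
    """Rebuild the exact original text from the (character, count) pairs."""
    return "".join(char * count for char, count in pairs)


status = "VVVVOOVVV"  # hypothetical status of nine rooms on one floor
encoded = rle_encode(status)
print(encoded)                        # [('V', 4), ('O', 2), ('V', 3)]
print(rle_decode(encoded) == status)  # True
```

Decoding returns exactly the original string, which is what makes the method lossless. The saving is only worthwhile when the data contain long runs of repeated values.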

Numerosity Reduction. In this reduction technique, the actual data are replaced with a
mathematical model or a smaller representation of the data. With parametric methods, only
the model parameters need to be stored instead of the actual data. Non-parametric methods
include clustering, histograms, and sampling.
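
A parametric version of this idea can be sketched in Python by fitting a straight line and storing only its two parameters. The stay lengths and bills below are invented, and a straight-line model is an assumption that will not suit every data set.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: nights stayed and total bill (in thousands of pesos).
nights = [1, 2, 3, 4, 5]
bills = [2.5, 4.5, 6.5, 8.5, 10.5]

slope, intercept = fit_line(nights, bills)
print(slope, intercept)       # the two numbers stored in place of the raw data
print(slope * 7 + intercept)  # estimated bill for a 7-night stay
```

Here ten stored values are replaced by two parameters. Real data rarely fall exactly on a line, so the model gives an approximation rather than the original values.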

Data Transformation. The data are transformed or consolidated so that the resulting mining
process may be more efficient, and the patterns found may be easier to understand. Data
discretization is a form of data transformation.

In data transformation, the data are transformed or consolidated into forms appropriate for
mining. Strategies for data transformation include the following:

1. Smoothing, which works to remove noise from the data. Techniques include binning,
regression, and clustering.

2. Attribute construction (or feature construction), where new attributes are constructed
from the existing ones.
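
Smoothing by binning, one of the techniques listed under strategy 1 above, can be sketched in Python as smoothing by bin means. The price values and the bin size of three are chosen only for illustration.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-size bins, and
    replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # illustrative prices
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Small fluctuations inside each bin disappear, which is how binning removes noise. A larger bin size smooths more strongly but also hides more detail.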

Discretization & Concept Hierarchy Operation. Techniques of data discretization are used
to divide attributes of a continuous nature into intervals. We replace the many
continuous values of an attribute with a small number of interval labels.

This means that mining results are shown in a concise and easily understandable way.

1. Top-down discretization. If the process first considers one or a few points (so-called
breakpoints or split points) to divide the whole range of attribute values and then repeats
this on the resulting intervals until the end, it is known as top-down discretization, also
called splitting.
2. Bottom-up discretization. If the process first considers all the continuous values as
potential split points and then discards some of them by merging neighboring values into
intervals, it is called bottom-up discretization, also known as merging.
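
Replacing continuous values with interval labels can be sketched in Python as follows. The split points (100 and 200), the labels, and the room rates are assumptions chosen by hand for illustration; a top-down or bottom-up method would determine the split points from the data instead.

```python
from bisect import bisect_right

SPLIT_POINTS = [100, 200]         # hypothetical breakpoints for nightly room rates
LABELS = ["low", "mid", "high"]   # one label per interval


def discretize(values, split_points=SPLIT_POINTS, labels=LABELS):
    """Map each continuous value to the label of the interval it falls in.
    Intervals: below 100 is low, 100 up to just under 200 is mid, 200 and above is high."""
    return [labels[bisect_right(split_points, v)] for v in values]


rates = [50, 100, 150, 200, 350]
print(discretize(rates))  # ['low', 'mid', 'mid', 'high', 'high']
```

Five distinct numbers become three labels, which makes later summaries and mining results easier to read.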

Let’s sum up

Techniques of data discretization are used to divide attributes of a
continuous nature into intervals.
The data compression technique reduces the size of files using different
encoding mechanisms.
Data integration is a technique that combines data from multiple
heterogeneous data sources into a coherent data store and provides a unified
view of the data.
Data cleaning is the process that removes data that does not belong in your
dataset.
Data transformation is the process of converting data from one format or structure
into another.

Let’s assess what you have learned in this lesson

Learning Assessment Task 2.3.1

Search and Discuss. You are tasked to search the internet for one hotel establishment in Pasay City
and the number of rooms it has available for guests. You also need to find the 2020
foreign tourist arrivals in the Philippines, particularly in Pasay City. Then apply the data
preprocessing steps that you learned in this lesson to the data sets that you have, and explain
the following:
 What did you encounter while addressing the requirements?
 How did you apply the preprocessing steps to the data sets that you searched for?
 Why is it important to preprocess the data?
 What have you learned in this lesson?
For each question, you will be rated using the writing rubric on a scale of 1-5 (poor to excellent).

Write your answer here.


Excellent (5) - The answer demonstrates excellent composition skills, including clear and
thought-provoking ideas, appropriate and effective organization, lively and convincing
supporting materials, effective diction and sentence skills, and perfect or near-perfect
mechanics, including spelling and punctuation. The writing perfectly accomplishes the
objectives of the task.
Good (4) - The answer shows strong composition skills, including clear and thought-provoking
ideas, although development, diction, and sentence style may suffer minor flaws.
It shows careful and acceptable use of mechanics. The writing effectively accomplishes the goals
of the task.
