HMPE 201 Final Module 2 1
Course Overview
Module Guide
How to navigate this module
Hi, welcome to this module, “The Basics of Data Analytics”. This module discusses the
basics of data analytics and comprises the following topics:
1. The Data
2. Visualization of data
3. Data pre-processing
Upon reading this module and answering the assessments provided, you will be able to:
1. Identify the various types of data and their characteristics, components, attributes, and
relationships
2. Define what data visualization is
3. Explain how data visualization generates useful information through various
techniques
4. Explain the different steps of data preprocessing.
Everything you learn in this module is essential for completing the
laboratory activities in the laboratory guide attached to this module.
The module uses illustrative examples and visual graphics to help you
understand the topics. The references used are research outputs published on
reputable research sites, published books and e-books, and learning materials related to food
and beverage service operations.
LESSON 1
THE DATA
Introduction
Data analytics is the science of analysing raw datasets in order to derive a conclusion
regarding the information they hold. It enables us to discover patterns in the raw data and draw
valuable information from them. Data analytics processes and techniques may use applications
incorporating machine learning algorithms, simulation, and automated systems. The systems and
algorithms work on the unstructured data for human use. These findings are interpreted and used
to help organizations understand their clients better, analyse their promotional campaigns,
customize content, create content strategies, and develop products. Data analytics helps
organizations to maximize market efficiency and improve their earnings.
_______________________________________________________
Keywords
Database system, data warehouse, Data objects
Data attributes, patterns, association, correlation
_______________________________________________________
Let’s Learn
Because data is so important in our lives, it must be stored and processed properly and
without error. When dealing with datasets, the category of data plays
an important role in determining which preprocessing strategy suits a particular data set and
which type of statistical analysis should be applied to get the best results.
Let’s dive into some of the commonly used categories of data.
Database System
A relational database is a collection of tables, each of which is assigned a unique name. Each table consists
of a set of attributes (columns or fields) and usually stores a large set of
tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key
and described by a set of attribute values (Han, Kamber & Pei, 2012).
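The idea of a named table with attributes, tuples, and a unique key can be sketched with Python's built-in sqlite3 module. The table name, column names, and rows below are hypothetical and serve only as an illustration.

```python
import sqlite3

# Hypothetical "customer" table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"  # unique key that identifies each tuple
    " name TEXT,"                    # attribute (column)
    " city TEXT)"                    # attribute (column)
)
rows = [(1, "Ana", "Manila"), (2, "Ben", "Cebu"), (3, "Cora", "Davao")]
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

# Each returned row is one tuple (record); each position holds one attribute value.
cur = conn.execute("SELECT cust_id, name, city FROM customer ORDER BY cust_id")
attribute_names = [col[0] for col in cur.description]
tuples = cur.fetchall()
print(attribute_names)
print(tuples)
```

Here each row of the result corresponds to one data object, and `cust_id` plays the role of the unique key.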
Data Warehouse
A data warehouse is a large collection of business data used to help an organization make
decisions. The concept of the data warehouse has existed since the 1980s, when it was developed
to help transition data from merely powering operations to fuelling decision support systems that
reveal business intelligence. The large amount of data in data warehouses comes from different
places, such as internal applications (marketing, sales, and finance), customer-facing apps,
and external partner systems, among others.
On a technical level, a data warehouse periodically pulls data from those apps and
systems; then, the data goes through formatting and import processes to match the data already
in the warehouse. The data warehouse stores this processed data so it’s ready for decision makers
to access. How frequently data pulls occur, or how data is formatted, etc., will vary depending on
the needs of the organization.
Data sets are made up of data objects. A data object represents an entity—in a sales
database, the objects may be customers, store items, and sales; in a medical database, the objects
may be patients; in a university database, the objects may be students, professors, and courses.
Data objects are typically described by attributes. Data objects can also be referred to as samples,
examples, instances, data points, or objects. If the data objects are stored in a database, they are
data tuples. That is, the rows of a database correspond to the data objects, and the columns
correspond to the attributes. In this section, we define attributes and look at the various attribute
types.
Data Attributes
Nominal Attribute
Nominal means “relating to names.” The values of a nominal attribute are
symbols or names of things. Each value represents some kind of category,
code, or state, and so nominal attributes are also referred to as categorical.
The values do not have any meaningful order.
Ordinal Attribute
An ordinal attribute is an attribute with possible values that have a
meaningful order or ranking among them, but the magnitude between
successive values is not known.
Numeric attribute
A numeric attribute is quantitative; that is, it is a measurable quantity,
represented in integer or real values. Numeric attributes can be interval-
scaled or ratio-scaled.
Interval-Scaled Attributes
Interval-scaled attributes are measured on a scale of
equal-size units. The values of interval-scaled attributes
have order and can be positive, 0, or negative. Thus, in
addition to providing a ranking of values, such attributes
allow us to compare and quantify the difference between
values.
Ratio-Scaled Attributes
A ratio-scaled attribute is a numeric attribute with an
inherent zero-point. That is, if a measurement is ratio-scaled,
we can speak of a value as being a multiple (or ratio) of
another value. In addition, the values are ordered, and we can
also compute the difference between values, as well as the
mean, median, and mode.
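The four attribute types differ in which operations are meaningful on them. The short sketch below illustrates this with hypothetical guest records; every variable name and value is made up for the example.

```python
from statistics import mode

# Hypothetical guest records; all names and values are illustrative.
room_type = ["single", "double", "double", "suite"]           # nominal
satisfaction = ["poor", "good", "excellent", "good", "fair"]  # ordinal
temperature_c = [20.0, 25.0, 30.0]                            # interval-scaled
length_of_stay = [1, 2, 4]                                    # ratio-scaled (nights)

# Nominal: no order, so only counting (for example, the mode) is meaningful.
most_common_room = mode(room_type)

# Ordinal: the order is known, so a median rank is meaningful.
ranking = ["poor", "fair", "good", "excellent"]
ranks = sorted(ranking.index(s) for s in satisfaction)
median_satisfaction = ranking[ranks[len(ranks) // 2]]

# Interval-scaled: differences are meaningful, but 0 degrees C is not "no temperature".
temperature_gap = temperature_c[2] - temperature_c[0]

# Ratio-scaled: a true zero exists, so "twice as long" is meaningful.
stay_ratio = length_of_stay[2] / length_of_stay[1]

print(most_common_room, median_satisfaction, temperature_gap, stay_ratio)
```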
When data are organized, patterns or trends can form and be drawn out from the
organized data.
Frequent patterns, as the name suggests, are patterns that occur frequently in data. There
are many kinds of frequent patterns, including frequent item sets, frequent subsequences (also
known as sequential patterns), and frequent substructures. A frequent item set typically refers to
a set of items that often appear together in a transactional data set; for example, milk and bread
are frequently bought together in grocery stores by many customers. A frequently
occurring subsequence, such as the pattern that customers tend to purchase first a laptop,
followed by a digital camera, and then a memory card, is a (frequent) sequential pattern. A
substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be
combined with item sets or subsequences. If a substructure occurs frequently, it is called a
(frequent) structured pattern. Mining frequent patterns leads to the discovery of interesting
associations and correlations within data.
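As a minimal sketch of frequent item set mining, the code below counts how often pairs of items appear together in a few hypothetical grocery transactions. The function name `frequent_pairs` and the `min_support` threshold are assumptions for this illustration, not part of the module.

```python
from collections import Counter
from itertools import combinations

# Hypothetical grocery transactions (illustrative data only).
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs", "butter"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

result = frequent_pairs(transactions, min_support=3)
print(result)
```

With this data, only milk and bread are bought together often enough to count as a frequent pair.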
Association of Data
Correlation of data
This means that one data attribute moves in coordination with another.
Let’s sum up
Identification. You will be given a 10-item identification test covering all the topics in
this lesson. You will be rated based on your correct answers.
Instruction: Identify each answer from the scrambled words in the box. Write your
answer in the space provided before the number.
Scrambled Words
Describe and Explain. Describe the four data attributes and give at least five
examples for each attribute. Then explain why each example belongs to that attribute.
Your description will be rated using the writing rubric on a scale of 1-5 (poor to excellent).
Your explanation will be rated using the writing rubric on a scale of 1-5 (poor to excellent).
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________.
Excellent (5) - The answer demonstrates excellent composition skills including clear and
thought-provoking ideas, appropriate and effective organization, lively and convincing
supporting materials, effective diction, and sentence skills, and perfect or near-perfect
mechanics including spelling and punctuation. The writing perfectly accomplishes the
objectives of the task.
Good (4) - The answer contains strong composition skills including clear and thought-provoking
ideas, although development, diction, and sentence style may suffer minor flaws.
Shows careful and acceptable use of mechanics. The writing effectively accomplishes the goals
of the task.
Average (3) - The answer demonstrates competent composition skills including adequate
development and organization, although the development of ideas may be trite, assumptions
may be unsupported in more than one area and the diction and syntax may not be clear and
effective. Minimally accomplishes the goals of the task.
Fair (2) - The answer demonstrates composition skills that may be flawed in either the clarity of
the ideas, the development, or the organization. Diction, syntax, and mechanics may seriously
affect clarity. Minimally accomplishes the majority of the goals of the task.
Poor (1) - Composition skills may be flawed in two or more areas. Diction, syntax, and
mechanics are excessively flawed. Fails to accomplish the goals of the task.
LESSON 2
DATA VISUALIZATION
With so much information being collected through data analysis in the business
world today, we must have a way to paint a picture of that data so we can interpret it. Data
visualization gives a clear idea of what the information means by giving it visual context through
maps or graphs. Data visualization can help by delivering data in the most efficient way possible.
As one of the essential steps in the business intelligence process, data visualization takes the raw
data, models it, and delivers the data so that conclusions can be reached. In advanced analytics,
data scientists are creating machine learning algorithms to better compile essential data into
visualizations that are easier to understand and interpret.
_______________________________________________________
Keywords
Pixel-oriented visualization, geometric projection visualization
Icon-based visualization, hierarchical visualization
_______________________________________________________
Let’s Learn
Pixel oriented visualization techniques. The task of the knowledge discovery and data
mining process is to extract knowledge from data such that the resulting knowledge is useful in a
given application. Obviously, only the user can determine whether the resulting knowledge
satisfies this requirement. Moreover, what one user may find useful is not necessarily useful to
another user.
A 3-D scatter plot uses three axes in a Cartesian coordinate system. If it also uses colour,
it can display up to 4-D data points.
Figure 3. Visualization of a 3D scatterplot. Source:
http://www.industrial-electronics.com/data-mining_2b.html
The scatter-plot matrix technique is a useful extension to the scatter plot. For an
n-dimensional data set, a scatter-plot matrix is an n × n grid of 2-D scatter plots that provides a
visualization of each dimension with every other dimension.
Figure 4. Visualization of the Iris data set using a scatter-plot matrix. Source:
http://support.sas.com/documentation/cdl/en/grstatproc/61948/HTML/default/images/gsgscmat.gif
To visualize n-dimensional data points, the parallel coordinates technique draws n equally
spaced axes, one for each dimension, parallel to one of the display axes.
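A minimal sketch of the parallel coordinates idea follows: each n-dimensional point becomes a polyline that crosses the n equally spaced axes at its own values. Scaling every dimension to the range 0 to 1 is an assumption of this sketch, and the function name and sample data are hypothetical.

```python
def parallel_coordinates(points):
    """Map each n-dimensional point to a polyline of (axis_position, scaled_value)."""
    n = len(points[0])
    lows = [min(p[d] for p in points) for d in range(n)]
    highs = [max(p[d] for p in points) for d in range(n)]
    polylines = []
    for p in points:
        line = []
        for d in range(n):
            span = highs[d] - lows[d]
            # Scale each dimension to [0, 1]; a constant dimension is drawn at 0.5.
            y = (p[d] - lows[d]) / span if span else 0.5
            line.append((d, y))  # axes sit at x = 0, 1, ..., n-1 (equally spaced)
        polylines.append(line)
    return polylines

# Three hypothetical 4-dimensional data points.
data = [(5.0, 3.0, 1.0, 0.2), (7.0, 3.0, 5.0, 1.0), (6.0, 3.0, 3.0, 0.6)]
lines = parallel_coordinates(data)
for line in lines:
    print(line)
```

The resulting vertex lists could be passed to any plotting tool that draws line segments.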
network. The formal definition of a tree is that the graph formed by the nodes and edges (defined
between parent and child node) is both connected and contains no cycles.
The following properties of a tree are of more practical use from the point of view of displaying
visualizations:
One node, called the root node, has no parent.
All other nodes have exactly one parent.
Nodes with no children are termed leaf nodes. Nodes with children are
termed interior nodes.
For all nodes in a tree, there is a single unique path up the tree going from parent to parent.
“Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical visualization
method.
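The tree properties listed above (one root, exactly one parent for every other node, no cycles, leaf and interior nodes) can be checked with a short sketch. Storing the tree as a child-to-parent mapping is an assumption of this example, and the node names are hypothetical.

```python
def describe_tree(parent_of):
    """Check tree properties on a child -> parent mapping (the root maps to None)."""
    roots = [n for n, p in parent_of.items() if p is None]
    if len(roots) != 1:
        raise ValueError("a tree has exactly one root node")
    # Walk up from every node; reaching the root without repeats means no cycles.
    for node in parent_of:
        seen = set()
        while node is not None:
            if node in seen:
                raise ValueError("cycle detected")
            seen.add(node)
            node = parent_of[node]
    parents = {p for p in parent_of.values() if p is not None}
    leaves = sorted(n for n in parent_of if n not in parents)  # nodes with no children
    interior = sorted(parents)                                 # nodes with children
    return roots[0], leaves, interior

# Hypothetical hierarchy used only for illustration.
hierarchy = {
    "hotel": None,
    "rooms": "hotel",
    "dining": "hotel",
    "suite": "rooms",
    "single": "rooms",
}
root, leaves, interior = describe_tree(hierarchy)
print(root, leaves, interior)
```

Because a dictionary gives each node a single entry, the rule that every non-root node has exactly one parent holds by construction.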
Visualizing complex data and relations. There are many new visualization techniques
dedicated to these kinds of data. For example, many people on the Web tag various objects such
as pictures, blog entries and product reviews. A tag cloud is a visualization of statistics of user-
generated tags. Often, in a tag cloud, tags are listed alphabetically or in a user-preferred order.
Figure 8. Using a tag cloud to visualize popular Web site tags. Source: A snapshot of
www.flickr.com/photos/tags/, January 23, 2010
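A tag cloud starts from simple tag statistics. The sketch below counts user-generated tags and lists them alphabetically. Mapping each count to a font size by linear scaling is an assumption of this sketch, and the function name, size limits, and tags are hypothetical.

```python
from collections import Counter

def tag_cloud(tags, min_size=10, max_size=30):
    """Return (tag, font_size) pairs listed alphabetically."""
    counts = Counter(tags)
    low, high = min(counts.values()), max(counts.values())
    cloud = []
    for tag in sorted(counts):
        # Linear scaling of frequency to font size is an assumption of this sketch.
        scale = (counts[tag] - low) / (high - low) if high > low else 1.0
        cloud.append((tag, round(min_size + scale * (max_size - min_size))))
    return cloud

# Hypothetical tags attached to photos by users.
tags = ["beach", "food", "beach", "hotel", "beach", "food"]
cloud = tag_cloud(tags)
print(cloud)
```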
Icon-based Visualization Techniques. These techniques use small icons to represent multidimensional data
values. We look at two popular icon-based techniques: Chernoff faces and stick figures.
Figure 9. Chernoff faces. Each face represents an n-dimensional data point (n ≤ 18).
To really understand how we get information through visualization, let us try the “Think in a
minute!” activity.
Think in a minute!
Look at the data in the scatterplot. Tell me what you can see.
[Scatterplot: carbon dioxide emission (x-axis, 0.0 to 2.5) vs. international tourist arrivals (y-axis, -0.50 to 0.75)]
These are data on the international tourist arrivals and carbon dioxide emissions of a group
of countries.
Write answer here.
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
____________________________________________________________________________________________________________.
Let’s take a look at your answer.
“One piece of information we can get from the 2D scatterplot is that as carbon dioxide emission increases,
international tourist arrivals are sporadic and later drop to nearly 0% as the carbon
dioxide emission of a given country increases to 100%.”
See, from the pattern or trend of the points in a scatterplot we can generate useful
information.
Here is the scatterplot of weather temperature vs. cups of coffee.
[Scatterplot: weather temperature (x-axis, 0 to 30) vs. cups of coffee (y-axis, 0 to 20)]
As we can see from the trend in the scatterplot, the number of cups of coffee increases as the weather
temperature increases.
Remember: You interpret a scatterplot by looking for trends in the data as you go from
left to right:
If the data show an uphill pattern as you move from left to right, this indicates a positive
relationship between X and Y. As the X-values increase (move right), the Y-values tend to
increase (move up).
If the data show a downhill pattern as you move from left to right, this indicates a negative
relationship between X and Y. As the X-values increase (move right) the Y-values tend to
decrease (move down).
If the data don’t seem to resemble any kind of pattern (even a vague one), then no relationship
exists between X and Y.
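The uphill and downhill reading of a scatterplot can be backed by a number. One common choice, used here as an illustration rather than as the module's prescribed method, is the Pearson correlation coefficient r: values near +1 suggest an uphill pattern, values near -1 a downhill pattern, and values near 0 no clear linear pattern. The data, the helper names, and the 0.3 cut-off below are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def describe_trend(r, threshold=0.3):
    # The 0.3 cut-off is an arbitrary illustrative choice, not a standard.
    if r >= threshold:
        return "uphill (positive relationship)"
    if r <= -threshold:
        return "downhill (negative relationship)"
    return "no clear linear pattern"

# Hypothetical readings, not taken from the figures in this module.
temperature = [5, 10, 15, 20, 25]
cups = [4, 7, 9, 13, 16]
r = pearson_r(temperature, cups)
print(round(r, 3), describe_trend(r))
```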
Let’s sum up
Illustration. You will be given a set of data to illustrate on a 2D scatterplot. You
may draw the scatterplot in the box below. You will be rated based on the correctly plotted data points.
Please follow the steps below in placing the data sets in the scatter plot.
Table 1. Data sets of hotel room supply vs. hotel room demand
Name of Hotel    Hotel room demand (Y), in %    Hotel room supply (X), in %
A                10                             6
B                9                              9
C                5                              10
D                9                              13
E                7                              5
F                12                             10
G                6                              15
H                4                              10
I                2                              9
J                15                             10
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
Explanation. Based on your experience in plotting the data sets in a 2D scatterplot, explain how
useful information can be generated from the data sets. You will be rated based on the writing
rubric with a rating scale of 1-5 (poor to excellent).
Excellent (5) - The answer demonstrates excellent composition skills including clear and
thought-provoking ideas, appropriate and effective organization, lively and convincing
supporting materials, effective diction, and sentence skills, and perfect or near-perfect
mechanics including spelling and punctuation. The writing perfectly accomplishes the
objectives of the task.
Good (4) - The answer contains strong composition skills including clear and thought-provoking
ideas, although development, diction, and sentence style may suffer minor flaws.
Shows careful and acceptable use of mechanics. The writing effectively accomplishes the goals
of the task.
Average (3) - The answer demonstrates competent composition skills including adequate
development and organization, although the development of ideas may be trite, assumptions
may be unsupported in more than one area and the diction and syntax may not be clear and
effective. Minimally accomplishes the goals of the task.
Fair (2) - The answer demonstrates composition skills that may be flawed in either the clarity of
the ideas, the development, or the organization. Diction, syntax, and mechanics may seriously
affect clarity. Minimally accomplishes the majority of the goals of the task.
Poor (1) - Composition skills may be flawed in two or more areas. Diction, syntax, and
mechanics are excessively flawed. Fails to accomplish the goals of the task.
LESSON 3
DATA PREPROCESSING
Introduction
_______________________________________________________
Keywords
Data cleaning, Data preprocessing
Data reduction, Data transformation
_______________________________________________________
Let’s Learn
Data reduction. Data reduction produces a condensed representation of the
original data that is much smaller in volume but keeps the quality of the original data.
The methods of data reduction are explained below.
1. Data Cube Aggregation. This technique aggregates data into a simpler form. For
example, imagine that the information you gathered for your analysis covers the years 2012 to 2014
and includes the revenue of your company every three months. If you are interested in the annual
sales rather than the quarterly figures, the data can be aggregated so that the
resulting data shows the total sales per year instead of per quarter.
2. Dimension reduction. Whenever we come across attributes that are only weakly relevant,
we keep only the attributes required for our analysis. This reduces data size by eliminating outdated or
redundant features.
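Returning to data cube aggregation (method 1 above), the roll-up from quarterly to annual sales can be sketched as follows. The revenue figures are made up for illustration.

```python
from collections import defaultdict

# Hypothetical quarterly revenue for 2012 to 2014 (figures are made up).
quarterly_sales = [
    (2012, "Q1", 224), (2012, "Q2", 408), (2012, "Q3", 350), (2012, "Q4", 586),
    (2013, "Q1", 300), (2013, "Q2", 420), (2013, "Q3", 380), (2013, "Q4", 600),
    (2014, "Q1", 310), (2014, "Q2", 450), (2014, "Q3", 400), (2014, "Q4", 640),
]

totals = defaultdict(int)
for year, _quarter, amount in quarterly_sales:
    totals[year] += amount  # roll the quarter level up to the year level

annual_sales = dict(totals)
print(annual_sales)
```

Twelve quarterly records are reduced to three annual totals, which is all the analysis needs when only yearly sales matter.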
Step-wise Forward Selection. The selection begins with an empty set of attributes. At each step,
the best of the remaining original attributes is added to the set based on its relevance.
In statistics, this relevance is commonly judged using a p-value.
Step-wise Backward Selection. This selection starts with the complete set of attributes in the
original data and, at each step, eliminates the worst remaining attribute in the set.
Suppose there are the following attributes in the data set, of which a few are
redundant.
Combination of Forward and Backward Selection.
It allows us to remove the worst attributes and select the best ones at each step, saving time and making the
process faster.
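The selection procedures above can be sketched as a greedy loop. The example below shows forward selection only and leaves the choice of scoring rule to the caller. The `score` function, the usefulness values, and the per-attribute penalty are hypothetical stand-ins for a real relevance test such as a p-value.

```python
def forward_selection(attributes, score):
    """Greedy step-wise forward selection.

    `score(subset)` is a caller-supplied function (hypothetical here) that
    returns a number; higher means the subset is more useful.
    """
    selected = []
    remaining = list(attributes)
    best_score = score(selected)
    while remaining:
        # Try adding each remaining attribute and keep the best candidate.
        candidate = max(remaining, key=lambda a: score(selected + [a]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best_score:
            break  # no remaining attribute improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected

# Toy score: made-up usefulness values with a small penalty per attribute.
usefulness = {"price": 5.0, "season": 3.0, "room_id": 0.2, "colour": 0.1}

def toy_score(subset):
    return sum(usefulness[a] for a in subset) - 0.5 * len(subset)

chosen = forward_selection(usefulness, toy_score)
print(chosen)
```

Backward selection would follow the same pattern in reverse, starting from all attributes and removing the one whose removal improves the score most.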
Data Compression. Data compression reduces the size of files using different
encoding mechanisms (e.g., Huffman encoding and run-length encoding).
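Run-length encoding is the simpler of the two mechanisms named above: each run of repeated symbols is stored as the symbol and its count. A minimal sketch, with hypothetical function names, is shown below.

```python
from itertools import groupby

def rle_encode(text):
    """Replace each run of identical characters with a (character, count) pair."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original string from (character, count) pairs."""
    return "".join(ch * count for ch, count in pairs)

encoded = rle_encode("AAAABBBCCD")
print(encoded)
print(rle_decode(encoded))
```

The encoding is lossless, since decoding returns the original string, and it pays off only when the data contain long runs.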
Numerosity Reduction. In this technique, the actual data are replaced with
a mathematical model or a smaller representation of the data. With a parametric model, it is
only necessary to store the model parameters instead of the actual data. Non-parametric methods include clustering,
histograms, and sampling.
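As a sketch of the parametric case, the code below fits a straight line by least squares and keeps only its two parameters in place of the data points. Choosing a linear model is an assumption made for this illustration, and the data are made up.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept for equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that happen to lie exactly on a line.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)

# Only the two parameters are stored; values are regenerated when needed.
approx = [slope * x + intercept for x in xs]
print(slope, intercept, approx)
```

Real data rarely fit a model exactly, so the stored parameters usually give an approximation of the original values.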
Data Transformation. The data are transformed or consolidated so that the resulting mining
process may be more efficient, and the patterns found may be easier to understand. Data
discretization is a form of data transformation.
In data transformation, the data are transformed or consolidated into forms appropriate for
mining. Strategies for data transformation include the following:
1. Smoothing, which works to remove noise from the data. Techniques include binning,
regression, and clustering.
2. Attribute construction (or feature construction), where new attributes are constructed from the given set of attributes.
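Of the smoothing techniques named in item 1, binning is the easiest to show. The sketch below sorts the values, splits them into bins of equal size, and replaces every value with the mean of its bin. The function name and the sample prices are illustrative.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

# Illustrative price values.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)
print(smoothed)
```

Small fluctuations inside each bin disappear, which is how binning removes noise.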
Discretization & Concept Hierarchy Operation. Data discretization techniques
divide continuous attributes into intervals. We replace the many
continuous values of an attribute with labels of small intervals.
This means that mining results are shown in a concise and easily understandable way.
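A minimal sketch of discretization follows. It uses equal-width intervals, which is only one possible way to choose the intervals, and the ages and labels are hypothetical.

```python
def discretize(values, labels):
    """Equal-width discretization: map each value to one of the interval labels."""
    low, high = min(values), max(values)
    width = (high - low) / len(labels)
    result = []
    for v in values:
        # The maximum value is placed in the last interval.
        index = min(int((v - low) / width), len(labels) - 1) if width else 0
        result.append(labels[index])
    return result

# Hypothetical guest ages replaced by three interval labels.
ages = [18, 22, 35, 41, 58, 63, 78]
groups = discretize(ages, ["young", "middle-aged", "senior"])
print(groups)
```

The seven distinct ages are reduced to three labels, which are easier to read in a report or a chart.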
Let’s sum up
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________.
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________.
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________
______________________________________________________________________________________________________________.
Excellent (5) - The answer demonstrates excellent composition skills including clear and
thought-provoking ideas, appropriate and effective organization, lively and convincing
supporting materials, effective diction, and sentence skills, and perfect or near-perfect
mechanics including spelling and punctuation. The writing perfectly accomplishes the
objectives of the task.
Good (4) - The answer contains strong composition skills including clear and thought-provoking
ideas, although development, diction, and sentence style may suffer minor flaws.
Shows careful and acceptable use of mechanics. The writing effectively accomplishes the goals
of the task.
Average (3) - The answer demonstrates competent composition skills including adequate
development and organization, although the development of ideas may be trite, assumptions
may be unsupported in more than one area and the diction and syntax may not be clear and
effective. Minimally accomplishes the goals of the task.
Fair (2) - The answer demonstrates composition skills that may be flawed in either the clarity of
the ideas, the development, or the organization. Diction, syntax, and mechanics may seriously
affect clarity. Minimally accomplishes the majority of the goals of the task.
Poor (1) - Composition skills may be flawed in two or more areas. Diction, syntax, and
mechanics are excessively flawed. Fails to accomplish the goals of the task.
References:
1. Rumsey, D. Statistics Workbook For Dummies, Statistics II For Dummies, and Probability For
Dummies.
2. Han, J., Kamber, M., & Pei, J. (2012). Data Mining: Concepts and Techniques. Morgan Kaufmann
Publishers.
3. McGuire, K. A. (2016). The Analytic Hospitality Executive: Implementing Data
Analytics in Hotels and Casinos.
4. Shereni, N. C., & Chambwe, M. (2019). Hospitality Big Data Analytics in Developing
Countries. Journal of Quality Assurance in Hospitality & Tourism, 21(3), 361–369.
https://doi.org/10.1080/1528008x.2019.1672233
5. Rodrigues, J. P., Sousa, M. J., & Brochado, A. (2020). A Systematic Literature
Review on Hospitality Analytics. International Journal of Business Intelligence
Research, 11(2), 47–55. https://doi.org/10.4018/ijbir.20200701.oa2
6. Gupta, K., Gauba, T., & Jain, S. (2020). Big Data in Hospitality Industry: A Survey.
International Research Journal of Engineering and Technology, 11(4). e-ISSN: 2395-0056