
Data Analytics

Beginners Guide for Business and Science

By Charles Jensen

Copyright © 2017
All rights reserved. No part of this book
may be reproduced in any form or by any
means without permission in writing
from the publisher, Charles Jensen.

If you like my book, please leave a
positive review on Amazon. I would
appreciate it a lot. Thanks! The link:
Leave your review here. Thank you!

Look at these other books too:


Ebook Marketing
Make Money Online Fast
Self-Publishing for Beginners
Contents:
Introduction
Chapter 1: Fundamentals of Data
Analytics
Chapter 2: Benefits and Risks of Big
Data
Chapter 3: Data Analytics Algorithms
and Techniques
Chapter 4: Regression Analysis
Chapter 5: Sentiment Analysis
Chapter 6: Decision Tree Analysis
Chapter 7: Social Network Analysis
Chapter 8: Business Intelligence
Chapter 9: Best Practices
Chapter 10: Software
Recommendations
Conclusion
Introduction

In this book, I am going to tell you all
about the basics of data analytics.
Data analytics is an evolving branch of
science and mathematics. The amount of
sheer data available out there is steadily
increasing. What can you do with it?
Many people want to learn how to
leverage data to their advantage. Sadly,
there is a scarcity of simplified books
about data science.
The author intends to contribute a
practical understanding of data analytics
to its readers. This book offers the right
content for beginners and advanced
readers alike.
This book summarizes the author's
experience and research on data
analytics.
In this age of information overload, you
can use data to your advantage. By
reading this book, you will find the
essential information on the application
of data analytics in life. Its possibilities
are limitless.
You will never see data the same way
again. The results you can achieve can
be extraordinary.
Do not wait for others to discover the
benefits of data analytics first when you
can buy and read this book now!
What are you waiting for? Your success
lies in the following pages.
Chapter 1: Fundamentals of
Data Analytics

At the end of this section, you are
expected to:
Know why you should learn data
analytics;
Point out the extended definition of
data analytics;
Comprehend the history of data
analytics;
Differentiate the different jobs in the
field of data analytics;
Understand the applications of data
analytics in real life.
1.1 Why Do You Need to Learn
Data Analytics?
Data is everywhere; from the corporate
world down to the academic world, data
can be seen.
In fact, accumulated data reveals
observable trends. Analyzing these
patterns can help everyone from large
corporations down to individual
businesses.
Job opportunities in the field of data
science have been growing over the
years. Organizations now consider data
analytics to be a vital part of their
businesses. Data analytics drives the
decision-making process of the
companies.
Analysts have also noted the rising
salaries related to data science. The
author shall discuss the available
job positions later.
1.2 What Is Data Analytics?
A branch of data science, data analytics
concerns the processes used to boost
productivity and business profit. It taps
various qualitative and quantitative
techniques to analyze data trends.
Data analysts draw data patterns from
raw information. Then, they use the
insights to influence the decision-making
and to recommend further action. They
apply different algorithms to generate
more ideas.
Descriptive analytics investigates
historical data to clarify past successes
and failures, while predictive analytics
forecasts using given historical data.
Moreover, diagnostic analytics points
out the reason behind an outcome, while
prescriptive analytics suggests further
action.
1.3 History of Data Analytics
Statistics is the foundation of data
analytics. It originated way back in
ancient Egypt when they utilized census
in the construction of the pyramids. Old
governments also used census in
taxation.
1.3.1 Progress in Census
In 1890, Herman Hollerith made the
census efficient with his invention of the
Tabulating Machine. Prior to the
development of the device, the
government of the United States took
seven years to poll the whole country.
With the invention, they were able to
assess the population in a span of 18
months. If you compute the difference,
the new process was roughly 4.7 times
as fast.

1.3.2 Integration of Analytics With
Computer Architecture
In the 1940s, government agencies used
linear programming to conduct computer
simulations to anticipate the behavior of
nuclear chain reactions. The process
used primordial computers. In the same
period, Enrico Fermi and Stan Ulam
used Monte Carlo simulations. John von
Neumann designed the von Neumann
architecture in 1945, a computer system
design.
Ulam also collaborated with von
Neumann to create algorithms such as
importance and rejection sampling.
Tom Davenport and his team later
described this stage in the evolution of
analytics as "Analytics 1.0". Structured
internal data analysis was born. An
example of this is the use of
spreadsheets for analyzing data.

1.3.3 Industrialization of Analytics
John W. Tukey published "The Future of
Data Analysis" in 1962. In the said
work, he argued that data analysis must
take on the character of science
rather than mathematics.
Similarly, corporations and research
institutions used computers in decision
systems and heuristic problem-solving.
Through analytics, the “shortest path
problem” can be easily resolved,
improving air travel. Now, modern-day
Global Positioning System (GPS)
applications use the said breakthrough.
Also, the use of predictive modeling by
Fair Isaac Corporation (FICO) in 1958
led to credit risk decisions.
1.3.4 Widespread Adoption of Analytics
Mid-sized businesses began adopting
data analytics in the 1970s. Relational
databases (RDB) and structured query
language (SQL) also grew out of the
von Neumann architecture.
The International Association for
Statistical Computing (IASC), along with
the SAS Institute, was established to
advance the integration of data analytics
with computer science. Its purpose is to
link the traditional methodology of
statistics with modern computing
innovations.
In 1989, Howard Dresner devised the
term Business Intelligence (BI). It
concerns improving business decisions
by evaluating the data stored in
companies. The term data mining first
appeared the year afterward. It refers to
the discovery of patterns in a given
large dataset.
FICO also deployed real-time analytics
to prevent credit card fraud in 1992. The
following year, Ross Ihaka and Robert
Gentleman developed R, an open source
programming language designed for
statistical purposes.
In 1998, Google harnessed algorithms
in web searches to boost search
relevance.
As time passed, the raw data analyzed
by data analytics kept getting bigger,
eventually becoming big data. The world
now creates 2.5 quintillion bytes of data
every day.
Now, digital assistants, social media
sites, and web pages use analytics to
process natural language.

1.4 Job Opportunities
There are numerous professions in data
science. Three positions are commonly
available: data scientists, engineers, and
analysts.
Harvard Business Review termed Data
Scientists as the “Sexiest Job of the 21st
Century.” Data scientists are also known
as statisticians and data managers. In the
United States, the average salary is
$115,000.
They are pretty versatile, and they have
a role to play in various aspects of the
organization. They assist in predictive
modeling processes and interpret the
findings. They need to have skills in
mathematics, programming, and
communication.
Also known as data architects and
database administrators, data engineers
use software engineering and statistical
algorithms to process massive datasets.
They code and clean up data sets and
implement the orders of the data
scientists. They need to be
knowledgeable in programming and big
data. As of 2017, their average salary is
$100,000.
Business analysts, also known as data
analysts, assist the company in
understanding the charts and the data
presented. They are expected to be
experts in statistics and communication.
Their average salary is $65,000.

1.5 Data Analytics in Real Life
Below are some of the practical
applications of data analytics.
1.5.1 Information Technology
Search engines utilize data analytics to
predict search results and recommend
relevant information from which the
user can benefit.
Artificial intelligence takes advantage of
machine learning to recognize speech
patterns. Weather apps also
crowdsource their information from big
data.
1.5.2 Businesses and Industries
These sectors use predictive analytics
and big data to automate the pricing and
the delivery of their products. They also
use the technologies to advertise their
services, target their consumers, and
suggest recommendations. Banks apply
big data for the security of their
clients.
Through Big data, businesses can
customize their ads to reach targeted
consumers. Furthermore, the Securities
Exchange Commission (SEC) uses big
data to monitor illegal trading activity in
the market.
Data analytics also empowers the
scientific community. Scientists check
environmental conditions using real-time
sensors and process the data.
Instantaneous updates to weather
forecasts would not be possible without
data.
1.5.3 Education and Healthcare
Automation of grading systems allows
school administrators to track the
learning progress of their students.
Teachers can quickly identify the lessons
in which the learners are having
difficulty.
Wearables such as fitness trackers tap
into big data to improve the sleep
patterns of their users. Hospitals, as
well, utilize big data to serve their
patients efficiently. Moreover, analytics
allows doctors to predict outbreaks of
epidemics across the world.
1.5.4 Entertainment and
Communications
Social media apps track the usage
patterns of their users. Consequently,
the tracking allows the apps to relay
proper information. Face recognition
algorithms allow the apps to create the
tag feature.
Similarly, video streaming sites monitor
the browsing history of their consumers
to convey video suggestions. On-demand
music services leverage big data to give
music recommendations to their users.
Developers equip electronic games with
machine learning algorithms to improve
the gaming experience.
Cities using analytics can control the
flow of traffic, and they can respond to
emergencies faster.
Chapter 2: Benefits and Risks
of Big Data

At the end of this chapter, you should be
able to:
Comprehend the definition of big
data and its related terms;
Discover the distinct characteristics
of big data;
State the benefits of big data;
Understand the risks involved in
big data;
Identify the future possibilities of
big data.
2.1 What Is Big Data?
John Mashey, a computer scientist,
coined the term in 1990. In 2000,
Francis Diebold defined big data as “the
explosion in the quantity and quality of
available and potentially relevant data.”
Big data is a result of an increase in
processing power, storage, and
networking capacities.
Doug Laney said in 2001 that there are
three key features of big data:
Variety
It refers to the diverse types of
generated data. It has two kinds:
structured and unstructured data.

Volume
It refers to the size of data. The range
lies from terabytes up to yottabytes.

Velocity
It refers to the speed of data. The
scope ranges from yearly up to real
time.

Data scientists consider veracity as a
new key feature. Veracity refers to the
accuracy of data.
This tremendous amount of data needs a
new set of applications to manage the
data efficiently. If not, data fragmentation
occurs.
Big Data Architecture
The author shall explain the underlying
framework behind big data for further
understanding. There are five layers for
setting up functional big data analytics.
Layer 0
Redundant physical infrastructure refers
to the hardware used in distributed
computing. Distributed computing means
the sharing of resources across
computers in a given network. You must
consider the following characteristics:
performance, availability, scalability,
flexibility, and cost of your
infrastructure.
Layer 1
Security infrastructure sets the data’s
safety. You must use high-grade
encryption techniques to protect the data
integrity. You may also use other security
measures to detect possible threats that
may result in data loss. You should also
limit the access of data and applications
in a data infrastructure.
Layer 2
Operational databases employ
structured, unstructured, and semi-
structured data. The acronym ACID
describes the database behavior.
A stands for atomicity. It means that if
one part of a transaction fails, the
whole transaction fails.
C stands for consistency. The
transactions must follow the standard.
I stands for isolation. The operations
limit their interaction with one another.
Finally, D stands for durability. Data
stored shall remain permanently.
Layer 3
Organizing data services and tools
compile big data into a set. One must
apply the technologies of a distributed
file system, serialization and
coordination services, ETL tools, and
workflow services into the
infrastructure.
Distributed file system refers to the need
for data accommodation while
serialization services refer to the need
for a reliable data storage and
multilanguage remote procedure calls.
Layer 4
Analytical data warehouses consolidate
the data gathered from relational and
other databases. The source can be any
storage medium. Analytics starts from
here.
2.1.1 What Is Structured Data?
Structured data is any data that has a
predetermined format. Structured data
has the following characteristics:
Defined Length
Defined Format

An individual can group and sort out the
pieces of information. One can also
organize the data quickly. Examples
include relational databases managed
with SQL or Access.
The database is an organized collection
of data. There are two kinds of
databases. If a table contains all
information, it is a flat-file database. If
one or more tables link with another
table in the database, it is called a
relational database.
A schema describes the tables, fields,
and the relationships between them.
Database managers use SQL in creating,
retrieving, updating, and deleting data in
a relational database.
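As an illustration, the sketch below runs those four operations from Python's built-in sqlite3 module against a hypothetical employees table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# Create, retrieve, update, and delete rows in the relational database.
conn.execute("INSERT INTO employees (name, dept) VALUES (?, ?)", ("Dana", "Sales"))
rows = conn.execute("SELECT name, dept FROM employees WHERE dept = ?",
                    ("Sales",)).fetchall()
conn.execute("UPDATE employees SET dept = ? WHERE name = ?", ("Marketing", "Dana"))
conn.execute("DELETE FROM employees WHERE name = ?", ("Dana",))
conn.commit()
```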
Either a machine or a human generates
structured data. Examples of machine-
generated data include the following:
point-of-sale data, financial information,
and sensor data. On the other hand,
examples of human-generated data
include click-stream data and input
data.
2.1.2 What Is Unstructured Data?
Unstructured data is any data that has no
predetermined format. In fact, 80% of
big data is unstructured.
An individual cannot organize
unstructured data efficiently. Thus, data
analysis could be complicated.
Here are some examples: photos,
videos, social media posts, emails,
Word, and PowerPoint documents, and
more. Like structured data, unstructured
data can be categorized as machine-
generated or human-generated.

2.1.3 What Is Semi-structured Data?
Semi-structured data does not fit into
data tables and relational databases.
However, the data under this type
contains tags and attributes. This type is
also known as self-describing data.
Some of its examples are Extensible
Markup Language (XML) and JavaScript
Object Notation (JSON).
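For illustration, the short Python sketch below parses a hypothetical JSON record: there is no fixed table schema, but the tags and attributes make the data self-describing.

```python
import json

record = json.loads("""
{
  "product": "Widget X",
  "rating": 4,
  "tags": ["durable", "affordable"],
  "reviewer": {"name": "pat", "verified": true}
}
""")

# Fields are addressed by their tags rather than by table columns.
print(record["product"], record["reviewer"]["verified"])  # Widget X True
```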

2.2 What Are the Benefits of Big
Data?
As you have read in the preceding
chapter, big data has different
applications in real life. Similarly, big
data offers businesses several distinct
benefits.
Big data acts as a catalyst for better
decision-making across the organization.
Through big data, a company can
perform risk analysis. For example,
supply managers may implement demand
forecasting and supply planning to
maximize resources.
Big data leads businesses to
operational efficiency. Production
managers may perform virtual model
production to enable process
transparency, and they may implement
operations analytics using sensors at the
assembly line to improve overall
throughput. Consequently, quality testers
quickly detect defective products.
Big data allows businesses to venture
into new revenue opportunities. Companies
leverage large datasets to achieve higher
levels of productivity. For instance, in
research and development, data
engineers build product design
databases to enable simulation and
experimentation of new products.
Big data enables businesses to improve
their customer services. The collection
of consumers’ feedback allows the
company to change their products
accordingly.
2.3 What Are the Risks of Big Data?
Organizations struggle to find
individuals with the skills to handle
enormous amounts of data efficiently.
Big data could mislead the company if
the data analyst did not conduct a proper
interpretation and analysis of the trend.
Big data could damage the company’s
security. The larger the data, the greater
the potential for data loss. The
connection of significant data to the
internet poses a threat for it to be
exposed.
Big data could compromise the
confidentiality of the consumers. Certain
laws apply severe consequences to
breaches of privacy. The company takes
full responsibility for keeping the
customer’s information safe.
Big data could cost the business money.
The processes involved in big data
could be expensive.
2.4 What are the future possibilities of
Big Data?
Everyone could seize the tremendous
opportunities big data holds for the
future. The author enumerated below
some of the scenarios that might take
place soon.
Big data could evolve into fast data.
Human intervention would cease in
interpreting data and arriving at
meaningful conclusions. Big data will
still grow exponentially.
Big data could utilize better algorithms
to manage data efficiently. This trend
results in the rise of the usage of
prescriptive analytics. Not only
businesses but also science-related
institutions can take advantage of the
trend.
Big data could extend its scope across
entire industries. New information could
flow at real-time speeds. This trend
could open the way for collaboration in
the sectors.
Big data could empower the Internet of
Things (IoT). IoT refers to the
integration and communication of
business and consumer technologies. IoT
enables people to change the way they
manage things.
Chapter 3: Data Analytics
Algorithms and Techniques
At the end of this section, you are
expected to:
Know the different data processes;
Describe the terms related to data
algorithms;
Determine the various methods
applied in data analytics;
Comprehend the multiple techniques
used in data analytics.

Big data undergoes a variety of
processes. You can classify them into
two main processes, namely data
management and data analytics. The
author lists these processes below along
with their related topics.
Data management concerns the
following processes:
• Data Storage (information
theory)
• Data Cleansing and
Formatting (signal processing)
• Data Understanding
(speech recognition, natural
language processing)
• Data Integration (data
warehousing)
• Data Access (databases,
information retrieval)
Data analytics concerns the following
processes:
• Data Analysis (machine
learning and data mining)
• Data Visualization and
Interpretation (human-computer
interaction)
This book shall focus its scope on
understanding data analysis and
visualization.
To take advantage of big data, one must
know how to engage in data mining.

3.1 Data Processes
3.1.1 What Is Data Storage?
The process stores data in
electromagnetic media such as hard
drives and random-access memory.
Data storage has been continuously
evolving over time.
3.1.2 What Is Data Cleansing and
Formatting?
The process removes unnecessary data
in a database to enhance the data quality.
The following are the common data
quality problems: redundancy,
inconsistent data, naming conflicts and
data integrity.
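As a small illustration, the sketch below uses the pandas library (one possible tool choice) on a hypothetical customer table to address the problems named above.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["Acme", "acme ", "Bolt", None],
    "sales":    [100,    100,     250,   80],
})

df["customer"] = df["customer"].str.strip().str.title()  # fix naming conflicts
df = df.drop_duplicates()                                # remove redundancy
df = df.dropna(subset=["customer"])                      # enforce data integrity
print(df)
```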

3.1.3 What Is Data Understanding?
Speech recognition utilizes the signal
part of the speech, extracts a portion of it
and combines words together into
meaningful patterns. Natural language
processing understands the meaning of
the speech recognized.
3.1.4 What Is Data Warehousing and
Integration?
Data warehouses use the Extract,
Transform, Load (ETL) processes.
It refers to the extraction of information
from various sources and its integration
in an organized manner. Data engineers
load extracted data to an ETL server
and send it to a data storage server.
Data engineers integrate data to create a
repository. If a company uses data
warehouses, data scientists can mine
data efficiently.
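The following is a minimal sketch of the ETL idea in Python; the CSV source, column names, and target table are hypothetical stand-ins for a real source system and warehouse.

```python
import csv
import sqlite3

def etl(source_csv: str, conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    with open(source_csv, newline="") as f:
        for row in csv.DictReader(f):                  # Extract
            region = row["region"].strip().upper()     # Transform
            amount = float(row["amount"])
            conn.execute("INSERT INTO sales VALUES (?, ?)",
                         (region, amount))             # Load
    conn.commit()
```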

3.1.5 What Is Data Mining?
Data mining uses statistical procedures
to uncover unique patterns in large
datasets. For instance, a diverse
collection of data can result in new
pieces of information. One of its
purposes is to explain the past and
predict the future.
Data Mining combines statistics,
machine learning, and databases to
gather valuable knowledge.
Statistics refers to the values obtained
from sample data. Data mining has a
direct correlation with statistics.
Additionally, data analysts use
hypothesis testing to check the validity
of the data mining methods.
Some people consider the term
knowledge discovery from data (KDD)
synonymous with data mining. The
process of data mining is similar to the
method of gold mining in real life. In
data mining, the product is new
knowledge.

3.1.6 Concepts in Data Mining
Data mining uses a variety of terms
throughout its process. Bagging refers to
combining multiple models by averaging
(for continuous variables) or voting
(for classification). Boosting refers to
generating multiple classifiers to
determine the best prediction or
classification.
Predictive data mining requires strict
data preparation to avoid misleading
forecasts. Big data needs data reduction
to aggregate the information available.
Text mining refers to the detection of
patterns in textual form. Unlike numeric
mining, data scientists find it difficult to
decipher the trends among text
documents.
The industry utilizes different models for
data mining. The CRISP (Cross-Industry
Standard Process for data mining), Six
Sigma (DMAIC steps), and SEMMA
models focus on the technical aspects
relevant to data mining.

3.2 Different Methods Used in Data
Analytics
Data visualization comes after data
analysis. It offers another perspective on
the database. You can visualize small or
big data in a variety of ways.
3.2.1 Basic Visualizations
As a start, you can use tables to organize
your data. Through the use of a graph,
one can see the data in a symbolic way.
It represents the relationships between
the corresponding variable values.
Figures present numerical data only.
There are a variety of graphs available.
Bar graphs use horizontal or vertical
bars of different lengths to compare two
or more quantities.
Additionally, line graphs compare the
two variables and may show the data
trend. It signifies the data increase or
decrease and allows someone to
forecast the future results.
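As a quick illustration, the sketch below draws both graph types with the matplotlib library; the monthly sales figures are made up for the example.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(months, sales)               # bar graph: compare quantities
ax1.set_title("Monthly sales (bar)")
ax2.plot(months, sales, marker="o")  # line graph: show the trend over time
ax2.set_title("Monthly sales (trend)")
plt.tight_layout()
plt.show()
```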
3.2.2 Data Transformation
Say one would like to extract more
insights out of a visualization aid. You
can use at least three data
transformations.
Data Zooming is looking at a particular
detail in the visualization.
Data Filtering means temporarily
removing from sight the information
not needed for analysis.
Data Outlier Removal means getting rid
of the elements that do not represent the
dataset.

3.2.3 More Visualizations
This book touches only on the essential
concepts used in data analytics.
In the next chapters, the author shall
discuss in detail the following:
regression analysis, sentiment analysis,
decision tree analysis, social network
analysis, and business intelligence.
Algorithms refer to the procedures for
solving a dilemma. The author
summarized the different methods for
visualization below.

3.2.4 Regression Analysis
It is used for data prediction. It has
two types: linear regression and
polynomial regression.
In Chapter 10, the author shall discuss
the software packages designed to solve
regression problems and other
challenges.

3.2.5 Sentiment Analysis
User feedback can either be positive or
negative. Businesses use sentiment
analysis to improve their products and
services. Moreover, politicians and
law-making bodies find sentiment
analysis to be beneficial.

3.2.6 Decision Tree Analysis
Decision Tree Analysis uses logical
algorithms to help its users classify data
and make informed decisions afterward.
It can also fill in missing information.
One can use either the ID3 or C4.5
algorithm.
3.2.7 Social Network Analysis
Sociologists use social network analysis
to gain helpful insights from
interpersonal relationships. It has two
types: content-based and structure-
based. Content-based deals with the
user-posted information while structure-
based deals with the relationships
between participants.

3.2.8 Business Intelligence
Executive heads use business
intelligence to report, analyze, monitor,
and predict the data input and output that
circulates the company. Data mining is
the foundation of business intelligence.
Business intelligence is a combination of
the different analyses mentioned above.
Chapter 4: Regression
Analysis
At the end of this chapter, you should be
able to:
Discover the definition of regression
analysis;
Know the purpose of regression
analysis;
Understand the underlying process of
regression analysis;
Comprehend the terms related to
regression analysis;
Point out the relationship between
correlation and regression.
It is important to select a method and
determine the association between
variables. Your choice can affect the
analysis significantly.

4.1 What is Regression Analysis?
If two variables are correlated, one can
predict the value of the dependent
variable given that the independent
variable is known. The procedure
mentioned is called regression analysis.
The dependent variable is the factor that
changes. It is the factor that one is trying
to forecast. On the other hand, the
independent variable is the factor that
affects the dependent variable.
Statisticians link regression analysis
with correlation analysis. Regression
analysis determines the regression line.
They use the line for data prediction.
The said line is also known as the
following: line predictor, trend line, and
best-fit line.
The equation of the regression line
resembles algebra's "slope-intercept
form." However, it can only predict the
probable outcomes of the dependent
variable.

4.1.1 Correlation Analysis
Bivariate data indicates the association
of two variables from a sample or
population. Correlation analysis is the
procedure used to check whether a
relationship exists between two
variables.
Analysts use scatterplots to visualize the
relationship between two variables
quickly.
In a scatterplot, you can only draw a
trend line if there is a correlation
between the bivariate data. Take note
that the best-fit line is the line nearest to
the points plotted in the scatterplot.

TYPES OF REGRESSION
Regressions span from simple equations
to complex ones.

4.1.2 Multiple Regression Analysis
You can use this type to predict the
unknown value of a dependent variable
from two or more independent
variables, also known as predictors.
Statisticians use the F-test to determine
the appropriateness of a multiple
regression model. As with linear
analysis, it uses the correlation
coefficient to determine the accuracy of
the regression.

4.1.3 Logistic Regression Analysis
You can use this to determine the
probability of success and failure.
Classification problems utilize this type.
Beyond the binary case, there are two
kinds: ordinal and multinomial.
4.1.4 Polynomial Regression Analysis
You can use this one if the IV's degree is
greater than one. Its graph is similar to
that of linear regression. IV refers to
the independent variable.
4.1.5 Stepwise Regression Analysis
You can use this one if you work with
several IVs.
4.1.6 Linear Regression Analysis
You can use this one if the dependent
variable is continuous.
In this book, the author focuses on
discussing linear regression.

Pearson's correlation coefficient
The Pearson Product-Moment
correlation coefficient (r) ranges
between -1 and +1. It is a form of
statistical measure that applies to
interval-level variables.
The qualitative descriptions are the
following: perfect (1), very high (0.75 to
1), moderately high (0.50 to 0.74),
moderately low (0.25 to 0.49), very low
(0.01 to 0.24), and no correlation (0).
Correlation has different directions. It
may be positive, negative, or zero.
Perfect correlation is impossible to
achieve in real life. However, this does
not apply if you can control the
variables.

Spearman's rho
It is a form of statistical measure that
correlates the rankings of the variables.
For example, beauty pageants use this
standard to verify the correlation of the
judges' rankings of the contestants.
The formula below is used to compute
the rank correlation. The coefficient,
$\rho$, is the Greek counterpart of r:

$\rho = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$

where d = difference between ranks and
n = number of paired ranks.
Statisticians interpret $\rho$ using the
same qualitative descriptions as those
used for r.

4.2 Regression Analysis Process
4.2.1 Manual Process
The method uses the equation
$Y' = a + bX$ to find the value of Y
when the variable X is known. We
cannot solve for the value of X in terms
of Y unless the correlation is perfect (1).
1. Determine the value of the
correlation coefficient r, using the
formula below:

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$

2. Conduct significance testing. If r is
significant, then proceed to step 3;
otherwise, stop.
3. Solve for the values of a and b using
the following formulas:

$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \quad a = \bar{y} - b\bar{x}$

4. Key in the values of a and b in the
regression equation $Y' = a + bX$. A
short sketch of these steps in code
follows.
4.2.2 Automated Process
The author shall discuss the automation
of this process using software tools in
Chapter 10.
4.3 What Type of Regression Should
You Use?
Both linear and logistic regression are
the common types used in the industry.
Simply put, if the dependent variable is
continuous, then use the first one. If it is
binary, then use logistic.
First, you must have background
knowledge of the data exploration part
of data analytics. Second, you may
utilize statistical tools such as R-squared
and related ones to check for biases
in the model. You should use the model
with the lowest error. Lastly, it all
depends on the objective of the analysis.
4.4 Application of Regression Analysis
Most organizations use regression
analysis for prediction and verification.
For instance, businesses can forecast
their net profit for the coming years if
they know the past monthly net profit of
their company.
In the census, the government can predict
the future population of the country
based on the existing population for the
past years.
Economists utilize regression analysis in
drawing the correlation between the
price of goods and the demand.
Chapter 5: Sentiment
Analysis
At the end of this section, you are
expected to:
Unlock the definition of sentiment
analysis;
Describe the terms related to
sentiment analysis;
Know the process of sentiment
analysis;
Understand the need to use sentiment
analysis;
Comprehend the importance of
sentiment analysis.
According to a 2013 survey, 53% of
people recommend companies and
products on social media sites,
such as Twitter. For businesses to
leverage that data, they need to conduct
sentiment analysis. As a result, they
understand their consumers better.

5.1 What is Sentiment Analysis?
This is the collection and study of
subjective data, which is why it is also
known as opinion mining.
Most of the time, sentiment analysis
gathers its data through social media. It
also uses natural language processing
(NLP) and contextual understanding to
sort out raw data.
This method became popular with the
rise of Web 2.0. Since most users can
access the internet, mass participation
allows sentiment analysis to achieve
information sharing. The user
engagement features of Web 2.0 enabled
sentiment analysis to gather subjective
data in the form of users' comments,
reviews, and evaluations.
Unlike other methodologies in data
analytics, sentiment analysis only
focuses on a few categories, namely
polarity identification and the
consumer's rating. Polarity identification
means the process of classifying a
statement's positivity or negativity.
5.2 Challenges in Sentiment
Analysis
The software solutions designed for
sentiment analysis must improve on the
following areas:
- Software must recognize
rhetorical devices such as irony and
sarcasm;
- Applications should distinguish
similar names;
- Programs must detect spam
ratings;
- Apps must comprehend acronyms
and their related terms.

5.3 The Sentiment Analysis Process
Researchers developed sentiment
analysis from various methods. Some
examples are word lists from
dictionaries and word networking. A
few grammarians have expressed their
difficulty with sentiment analysis.
5.3.1 Word Lists
You categorize the words into their
respective parts of speech; then you
place the definition along with its
synonyms. If the term has multiple
meanings, then the definitions are ranked
according to their popularity.

5.3.2 Word Networking
You emphasize the relationships
between concepts. Then, you group
synonyms together. The algorithm works
like this: first, the list of adjectives
related to the term is linked. Then, one
evaluates the attributes for its synonymy
and antonymy.
ALGORITHMS
Listed below are the techniques used for
automated sentiment analysis.

5.3.3 Unsupervised Learning Using the
PMI-IR Method
PMI-IR stands for Pointwise Mutual
Information (PMI) on data gathered
using Information Retrieval (IR)
techniques. This algorithm needs to
analyze the textual information based on
their part-of-speech (POS) tags. It
consists of three steps.
First, it uses the Penn Treebank POS
tags and classifies the text based on its
tag. Second, it computes the PMI. The
formula:

$PMI(w_1, w_2) = \log_2 \dfrac{p(w_1, w_2)}{p(w_1)\,p(w_2)}$

There are some guidelines one needs to
follow for extracting the two-word
phrases in the computation. The
sentiment orientation is calculated by
contrasting a positive and a negative
anchor word:

$SO(phrase) = PMI(phrase, \text{"excellent"}) - PMI(phrase, \text{"poor"})$
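To make this concrete, the toy Python sketch below computes PMI from hypothetical co-occurrence counts (standing in for the hit counts an information retrieval system would return) and derives a sentiment orientation for one phrase.

```python
import math

def pmi(co_occurrences: int, count_a: int, count_b: int, total: int) -> float:
    p_ab = co_occurrences / total
    p_a, p_b = count_a / total, count_b / total
    return math.log2(p_ab / (p_a * p_b))

total = 10_000  # hypothetical corpus size
# Hypothetical hit counts for the phrase "very reliable":
so = (pmi(co_occurrences=60, count_a=100, count_b=300, total=total)   # vs "excellent"
      - pmi(co_occurrences=5, count_a=100, count_b=250, total=total)) # vs "poor"
print("sentiment orientation:", round(so, 2))  # positive => positive phrase
```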
5.3.4 Use of Lexicons and Natural Language
Processing
The collective work of different
researchers led to the creation of many
sentiment lexicons. Some are available
publicly such as the General Inquirer
lexicon, Sentiment Lexicon, MPQA
subjectivity lexicon, SentiWordNet, and
Emotion Lexicon.
The inclusion of a word in a lexicon
does not mean that it expresses an
opinion right away. The basis is still the
context of the word in a sentence.
Natural Language Processing processes
texts through the four analyses described
below.
Lexicons can lead to lexical
(morphological) analysis. It explores the
characteristics of an individual word.
The data gathered shall be utilized in the
syntactic analysis.
The syntactic analysis identifies the
grammatical structure and creates
meaning out of it. It is followed by
semantic analysis, which reveals the
meaning of the whole sentence. Finally,
discourse-level analysis determines the
entire idea of the text.
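As a small illustration of the lexicon approach, here is a toy polarity scorer in Python. The miniature lexicon and one-word negation handling are deliberate simplifications; real work would use a published lexicon such as those listed above.

```python
# Hypothetical miniature lexicon; real work would use a published one.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
NEGATIONS = {"not", "no", "never"}

def polarity(text: str) -> int:
    score, negate = 0, False
    for word in text.lower().split():
        if word in NEGATIONS:
            negate = True            # context flips the next opinion word
            continue
        if word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
        negate = False               # negation only reaches the next word
    return score

print(polarity("I love this product"))           # 2
print(polarity("not good and quite terrible"))   # -3
```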

5.3.5 Aspect-based Opinion Summary
Search engines utilize this method to
rank products in their search results. It
has two parts: forming an overview of
the meaning of the views and tallying the
number of similar ideas.
For example:
PRODUCT X:
Aspect: Product Quality
Positive: 155
Negative: 6

5.3.6 Comparative Opinions Analysis
Analyzing these types of sentences is
different from analyzing traditional
ones. Software must be designed to at
least recognize the various sentence
constructions.

There are two types of comparisons:
gradable and non-gradable. Under
gradable are three sub-types:

Non-equal gradable comparison: It
uses the words greater or less than
or other synonyms.
Equative comparison: It uses the
words equal to or other related
words.
Superlative comparison: It uses the
words all along with greater or less
than or similar phrases.

Under non-gradable are three sub-types.
The first type compares the similarity or
difference between two entities on their
common aspects. The second category
compares two objects on a single
characteristic. The third group compares
two objects, but only one entity has the
desired attribute.
5.4 Applications of Sentiment Analysis
Opinions affect human behavior.
Companies could use opinion mining to
influence their business decisions.
Online retail stores use sentiment
analysis to gain insights from the
customer ratings and reviews.
Government intelligence tracks various
internet sources to prevent terror threats.
Political parties could be aware of the
public opinion through sentiment
analysis as well.
Businesses could also enhance their
customer experience through sentiment
analysis. The consumers' suggestions
and complaints could be heard and
sorted out better. Instead of hiring a
clerk to track posts on social media,
companies can automate the entire
process.
Chapter 6: Decision Tree
Analysis
At the end of this chapter, you should be
able to:
Grasp the meaning of decision tree
analysis;
Point out the different functions of
decision tree analysis;
Understand the process of decision
tree analysis;
Know the purpose of decision tree
analysis.
6.1 Why Use Decision Tree Analysis?
Decision Tree Analysis offers businesses
an efficient method of decision making,
especially in finances. It allows firms to
conduct multiple variable analyses.
Thus, one can reveal the consequences
of a choice.
The diagrams offer a clear, visual
representation of the feasible options.
Additionally, they show the relationships
among the input values.
Decision trees can handle datasets that
may have errors and missing values.
They can also complement other data
analysis tools.

6.2 Basic Parts and Terminologies
Three nodes compose decision trees.
Decision nodes depict the options
available for the organization to choose.
Squares stand for the decision nodes. A
line connects the decision node to the
other nodes.
Circles represent the chance (also
known as event) nodes. These refer to
the possible outcomes. The decision
maker has no ability to control the
outcomes.
Triangles represent the terminal nodes.
These represent the final results of the
decision-making process.

VARIABLES
In choosing an appropriate tree, you
base your decision upon the target
variable. There are two types.
1. Categorical Variable – also known
as nominal variable and has the
following characteristics: a finite
domain set, and you can group this
into categories.

You use this variable if you want to


classify things.
2. Continuous Variable – also known
as numerical variable and has the
following characteristics: ordered
domain set, and you can graph this as
a line.

You use this variable if you want to


predict.
6.3 Types of Decision Trees
There are two types of trees, and these
are the following: classification and
regression trees (CART).
6.3.1 Classification Trees
Classification trees are used to group
data into classes. If there are more than
two categories, you use the C4.5
algorithm.
Classification can work in two ways:
descriptive and predictive modeling.
The first one utilizes a training set that is
composed of records. It is inductive in
form. Prediction uses a test set and is
deductive in form. Descriptive records
must be present to predict the missing
data in a database. Both records must
have similar classifiers.
6.3.2 Regression Trees
Regression trees are used to forecast
linear numerical data. You use this one
if the target variable is continuous. If
the data is non-linear, you use the C4.5
algorithm.
6.4 Decision Tree Methods
It is important to know first the core
competencies to be able to create a
decision tree manually. The main point
of creating decision trees is to divide the
data based on its similarity
(homogeneity).

ALGORITHM
J. Ross Quinlan developed the Iterative
Dichotomiser (ID3) algorithm in 1980.
This one became the foundation for
constructing decision trees.
Note: decision trees utilize a top-down,
greedy search because they start with a
standard category (root), then divide
into classes that contain similar values.
ID3 uses entropy and information gain.
Entropy refers to the maximum number
of yes or no questions the user can ask to
achieve a probability.
Complete similarity indicates an entropy
of zero. If the root is divided equally,
then the entropy is one. Information gain
signifies the decrease in the entropy of
the target variable.

PROCESS
There are six basic steps for you to
construct a classification tree (a short
sketch in code follows the list).
1. Separate the data into homogenous
and non-homogenous variables.
Homogenous variables have low
entropy:

$Entropy = -\sum_i P_i \log_2 P_i$

where $P_i$ is the probability of an
event occurring.
2. Use joint entropy to compare the
influence of the independent
variables on the target variable.
3. Solve for the information gain. You
can do it by subtracting the joint
entropy from the target entropy.
4. Look for the independent variable
with the highest information gain. It
shall become the root of the tree.
5. Repeat the process for variables
that have an entropy not equal to zero.
6. Stop the process for the variables
that have an entropy of zero. They
shall become the leaves of the decision
tree.
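Here is a minimal Python sketch of steps 1 to 3 on a toy dataset; the "play tennis" records and the outlook attribute are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Toy records: (outlook, play tennis?)
data = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
        ("rain", "yes"), ("rain", "yes"), ("rain", "no")]

target = [play for _, play in data]
base = entropy(target)                      # step 1: target entropy

split = {}                                  # step 2: joint entropy after
for outlook, play in data:                  # splitting on the outlook attribute
    split.setdefault(outlook, []).append(play)
joint = sum(len(v) / len(data) * entropy(v) for v in split.values())

gain = base - joint                         # step 3: information gain
print(f"information gain of outlook = {gain:.3f}")
```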

TREE PRUNING
Overfitting happens when a model is too
complex, with numerous variables
involved. As a result, it lowers the
effectiveness of the decision tree. On the
other hand, underfitting materializes
when the software cannot depict a trend
from the given data.
The following approaches can be
implemented to prevent data overfitting:
Pre-pruning
It uses the error estimate to end the
tree’s construction.

Post-pruning
It uses the chi-squared test to eliminate
a sub-tree.
Pruned trees are smaller and easy to
interpret.
Chapter 7: Social Network
Analysis
At the end of this section, you are
expected to:
Understand the definition of social
network analysis;
Differentiate social network analysis
from sentiment and decision-tree
analyses;
Discover the purpose of social
network analysis.
7.1 What is Social Network Analysis
(SNA)?
Unlike sentiment analysis, which
concerns individual attributes, social
network analysis studies the
relationships between people and
groups. It answers the questions, "How
do the relationships form?" and, more
importantly, "What are the
consequences of these relationships?"
This type of analysis originated with the
sociologists Georg Simmel and Emile
Durkheim.
Social patterns define the lives of
individuals and the people surrounding
them. SNA is built on this idea. Others
claim that these patterns determine the
success or failure of institutions. For
them, the internal structure of a
company affects the inclusive growth of
the business.
7.1.1 Types of Social Network Analysis
There are two types of SNA. The
egocentric analysis examines an
individual’s personal network and its
effects while sociocentric analysis
assesses large groups of people.
The egocentric analysis administers
surveys to respondents. The interviewer
asks the respondents about their
interactions with other people. This
type of SNA is convenient to implement.
Analysts consider sociocentric analysis
harder than the previous one. As a
researcher, you get all the possible
relationship sets of sample respondents.
7.1.2 Related Terms
Propinquity refers to the tendency of
people who are physically closer to
form ties more readily than others.
Homophily refers to the possibility of
individuals to associate with people
with the same trait as them. It is also
known as assortativity. Shared
characteristics, such as beliefs, value
systems, personality, make relationship
formation easier.
The Social Comparison and Social
Identity theories support the term
homophily. They state that one chooses
similar others for comparison and that
some individuals define the identity of
the group.
Moreover, nodes (points) refer to the
different persons or groups within the
network. Lines, also called links or
ties, refer to the interactions connecting
them. Sociogram is the term for the
diagram used to represent the analysis.
Dyad refers to the pair of actors and
their interaction. Triad refers to a group
of three actors and their relationships.
Subgroup relates to the subset of the
actors and their interaction.
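As an illustration of these terms, the short sketch below builds a small network with the networkx library (assuming it is installed); the names are hypothetical.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Ana", "Ben"), ("Ana", "Cruz"),
                  ("Ben", "Cruz"), ("Cruz", "Dee")])

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "ties")
print(sorted(G.edges()))          # each edge is a dyad
print(nx.degree_centrality(G))    # who is most connected in the network
# nx.draw(G) with matplotlib would render the sociogram itself.
```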

7.2 Applications of Social Network
Analysis
Social Network Analysis can aid the
company to solve its organizational
problems. Businesses can also utilize
SNA to accelerate the information
dissemination within their institution.
The government can arrest terrorists
through SNA. Health organizations can
track the disease epidemic as well.
Organizations use SNA to visualize the
relationships within or outside the
company. Additionally, they utilize SNA
further to identify key persons that
contribute to the firm.
Anthropologists use SNA to trace the
origins of certain species. Biologists use
SNA to understand the ecosystem
interaction. Historians employ SNA to
write accurate information about the
past.
Politicians apply SNA to target the
desired people to support their campaign.
Additionally, they determine the
greatest influencers on their campaign.
Now, you can use SNA as a tool suited
to your purposes.
7.3 Methodologies
Social Network Analysis aims to
analyze relational data. First, SNA needs
background data from its whole network.
One must select the people to assess.
The researcher then sets the objectives
of the analysis through the formulation of
research hypotheses. The researcher
must also validate the research
instrument before administering it to the
respondents.
Next, you conduct the interviews and
surveys. Then, you record all the social
interactions in a matrix. Finally, you can
use software to map the network
visually.
SNA depends on questionnaires and
interviews. It leads the company to an
understanding of the interpersonal
connections within the scope of the
analysis. You can use a combination of
structured and unstructured questions.
Structured questions offer fixed response
choices, while unstructured questions
are open-ended ones that will lead the
company to understand the interpersonal
connections.
To verify the answers on the
questionnaires, the researcher may opt
for direct observation of the
respondents and keep a detailed record
in a diary.
A social network analysis tool shall be
used to visualize the network. You can
find the software recommendations in
Chapter 10.
Chapter 8: Business
Intelligence
At the end of this chapter, you should be
able to:
Uncover the meaning of business
intelligence;
Know the purpose of business
intelligence;
Differentiate business intelligence
from the rest of the techniques.

8.1 What Is Business Intelligence?
It refers to the set of technological tools
that retrieve valuable insights from large
datasets. Business Intelligence (BI)
includes but is not limited to queries,
reports, data mining, and online
analytical processing (OLAP).
8.1.1 Characteristics of Business
Intelligence
Business intelligence systems support
decision making, adapt to the business'
needs, anticipate events, and optimize
business queries.
BI analyzes historical data, captures
periodic snapshots, and gives out
detailed summaries for its users. The
following are the actual functions of BI:
reporting, analysis, and prediction.
Reports answer the question, “What
Happened?”.
Analysis answers the question, "Why
did it happen?”.
Prediction answers the question, “What
will happen?”.
Operational analysis is the essential
characteristic of real-time BI. It answers
the question, “What Is Happening?”.

8.1.2 Categories of Business Intelligence
Business Intelligence has four categories
based on two guide questions:
Is the business process operational
(functional, practical)?
Is the focus on root-cause analysis?
If it is both operational and root-cause,
then it is real-time BI. If it is operational
but not root-cause, then it is tactical BI.
If it is root-cause but not operational,
then it is investigative BI. Finally, if it
is neither operational nor root-cause,
then it is traditional BI.
Real-Time Business Intelligence
This one uses dashboards as its main
way to display aggregated information.
The author has discussed earlier the
different processes related to data
management. Business intelligence must
employ a fast response time.
Organizations must apply data
compression to their databases.
This method reduces the time for
information retrieval. Moreover, it
lessens the size of data stored. It also
decreases the bandwidth transferred
from the servers. As a whole, data
compression will trim down the
operational cost of the company.
However, some smaller businesses may
opt to outsource their data servers to
cut down maintenance costs. Others have
subscribed to software as a service
(SaaS), Infrastructure as a Service
(IaaS) and Platform as a Service (PaaS)
solutions.
Business Intelligence and Growth
BI can propel the business to their
desired growth. The following are the
primary purposes of BI:
Creation of a centralized view
through dashboards that enables
company executives to understand
the information easily;
Invention of consumer-oriented
analytics applications that allows
institutions to comprehend their
customer’s needs better;
Utilization of the data that paves the
way for companies to release
differentiated and breakthrough
products to the market;
Optimization of business processes
that leads firms to business
efficiency.

8.2 Business Intelligence Methodology
A typical enterprise BI environment
combines data warehousing and
analytical environment to benchmark
their business performance and to
perform knowledge discovery.
The data storage environment does the
data management. On the other hand, the
analytical environment does the data
analysis and visualization.
As discussed in the earlier chapters, BI
also requires both structured and
unstructured data. This time, BI stores it
in a large binary object (BLOB).
BI requires significant data for it to
function. It needs the customers,
employee, supplier chain, financial, and
business performance information.
BI combines the different advanced
analytics tools mentioned in the earlier
chapters.

8.2.1 What Is an In-memory DBMS?
An In-memory database management
system (DBMS) eliminates the need for
hard disk storage and stores its chief
information on random access memory
(RAM).
Real-time business intelligence may use
in-memory DBMS since the process
requires processing of real-time data.
There are some advantages to choosing
this for your business. The main
advantage is the reduction of latency in
storing and processing data. Queries on
big data would be faster. However,
RAM is a volatile type of storage, which
means in-memory DBMSs are
vulnerable to data loss.
Non-volatile storage must be used to
complement in-memory DBMS. The
contents of the RAM must be replicated
to prevent data loss. There are a variety
of methods available for this.
One of the options is periodic snapshots:
capturing the whole content of the RAM
to non-volatile storage regularly. In
case of system failure, the back-up can
restore the last successful data
transaction.
Another option is to utilize non-volatile
RAM. This one eliminates the risk of
losing the data.
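The toy Python sketch below illustrates the periodic-snapshot option; the in-memory store, file name, and interval are hypothetical stand-ins for a real DBMS's recovery mechanism.

```python
import json
import time

store = {"orders_processed": 0}   # stands in for the in-memory database

def snapshot(path: str = "snapshot.json") -> None:
    with open(path, "w") as f:
        json.dump(store, f)       # the persisted copy survives a power loss

def recover(path: str = "snapshot.json") -> dict:
    with open(path) as f:
        return json.load(f)       # reload the last snapshot after a restart

for _ in range(3):                # simulate a running workload
    store["orders_processed"] += 1
    snapshot()                    # in practice this runs on a fixed interval
    time.sleep(0.1)

print(recover())                  # {'orders_processed': 3}
```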
8.2.2 What Is a Hadoop Distribution?
The Apache Software Foundation
developed Hadoop. It is an open source,
Java-powered framework that is
designed to process and store big data
in distributed computing.
It has the Hadoop Distributed File
System (HDFS), which allows for high-
bandwidth transfer among its nodes.
This design allows the system to keep
functioning despite node maintenance
or failure.
One of its benefits is it improves the
data management abilities of the
company.

8.2.3 What Is NoSQL?
It refers to the set of nonrelational,
distributed, and open source databases.
It addresses the limitations of SQL, such
as its sluggish performance in serving
huge amounts of data.
An example of a NoSQL technology is
Cassandra.

8.2.4 What Is MapReduce?
It is a software framework that collates
big data from various sources. It can
also process petabytes of data across
machines.
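The classic illustration of the idea is word counting. The single-process Python sketch below mimics the map, shuffle, and reduce phases that a real framework would distribute across machines.

```python
from collections import defaultdict

def map_phase(document: str):
    for word in document.lower().split():
        yield (word, 1)                      # emit (key, value) pairs

def reduce_phase(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:                 # shuffle: group values by key
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

docs = ["big data is big", "data about data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(pairs))  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```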
Chapter 9: Best Practices
At the end of this section, you are
expected to:
Apply the methods mentioned in
real-life;
Understand the advice and warnings
given;
Mitigate the risks of data analytics.

You may be reading this book because
you are interested in applying the said
analytics to your business. This chapter
will help you fulfill that goal today.
9.1 What Are the Best Practices for
Data Analytics?
Statistics have shown that data scientists
spend up to 80 percent of their time on
data preparation. The author will
discuss data management first because
of the trend mentioned.

9.1.1 Tips on Data Management
The author presents the following
suggestions in bullet form.
Remove unnecessary barriers to
accessing data

One of the hindrances to effective
analytics is ineffective data
cleansing. Studies have pointed out
that up to 40 percent of analytics fail
because of subpar data. The solution
is to create an efficient method that
the whole company will use. Another
option is outsourcing the data
management process. A variety of
solutions are readily available on the
internet, which the author shall
explain in the next chapter.

Distribute metadata across the data
management team

Once the company has decided on an
efficient method, it is a wise idea to
share the core process with the whole
company. In this way, the centrality of
data could lead to better data
analytics and more precise data. It
also improves the total productivity
of the enterprise.

9.1.2 Tips on Data Analytics
Observe teamwork across the data
analytics and management teams.

The utilization of advanced
techniques in data analytics requires
different people with sufficient
knowledge and experience in each
particular field. Data scientists,
analysts, and engineers assigned to a
particular analytics team must
collaborate with others to achieve
unity in pursuing the results.

Experiment with the different
analytic tools

The number of data analytics tools
available for use has been steadily
increasing throughout the years. The
different tools have different strengths
and weaknesses in interpreting data.
They also give out different insights.
If the company utilizes the right
combination of instruments, it shall
discover more efficient methods of
processing and interpreting data.

9.1.3 Other Pieces of Advice
Apply security measures to the entire
data infrastructure

Recovering from significant data loss
is difficult. Data leakage can be
prevented by setting up security
policies. Nodes of the network must
be monitored strictly. Suspicious
activity can be caught right away if
the IT personnel keep a watchful eye
on fake nodes.

The company could also invest in
advanced encryption technologies
such as the Advanced Encryption
Standard (AES), RSA, and the Secure
Hash Algorithm (SHA). Transport
Layer Security (TLS) and Secure
Sockets Layer (SSL) may also be
utilized.

Digital rights management could also
be applied to data storage.

Take an inventory of the status of the
data systems regularly

Verification is essential in keeping
the confidentiality of the data. If left
unguarded, all security measures
could go to waste. Checksums may
be used as well to ensure data
integrity (see the sketch below).

The company may also invest in
white hat testing of the company's
entire infrastructure.
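As an illustration of the checksum idea mentioned above, here is a small Python sketch using the built-in hashlib module; the file path is hypothetical.

```python
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)                  # hash the file in chunks
    return h.hexdigest()

expected = sha256_of("customer_data.db")     # record at backup time
# ... later, during the regular inventory ...
assert sha256_of("customer_data.db") == expected, "data integrity check failed"
```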
Create rules on data management

Rules can protect organizational
integrity. Policies on data handling
must be enforced. Some examples
are:

- Policies on Data Handling
This refers to the length of time an
employee could use sensitive
information.

- Policies on Data Access
This refers to the people who have
the right to access and modify
confidential data.
9.2 Advice on Big Data Migration
Transitioning to Big Data can be
difficult. Here are some tips to help your
business get started.
You need to know the purpose of
adaptation

Assess the need for the enterprise to utilize big data analytics. Several organizations have the habit of adopting new technologies without calculating the risk. Is the utilization of big data really needed in your company? If yes, then everyone must be involved in the entire process. Businesses must also plan on creating a cost-effective big data infrastructure. They must analyze every option available and calculate the return on investment (ROI). The ideal choice is the option that will lead the company to the highest ROI.
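As a back-of-the-envelope illustration, the comparison can be as simple as this Python sketch; all figures are hypothetical.

# Comparing options by return on investment: ROI = (gain - cost) / cost.
options = {
    "on-premise cluster": {"gain": 180_000, "cost": 150_000},
    "cloud service":      {"gain": 160_000, "cost": 100_000},
}

for name, o in options.items():
    roi = (o["gain"] - o["cost"]) / o["cost"]
    print(f"{name}: ROI = {roi:.0%}")

# Here the cloud service wins: 60% versus 20%.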

Additionally, companies should also decide whether they will build their own data center infrastructure or invest in cloud data services. One challenge of building your own is the need for the company to scale up with the demand for big data; new data systems must eventually be purchased.
Moreover, the company must also identify its data sources and combine them with other resources. A mixture of various cloud solutions can save the company money. The auto-scaling options these providers offer will increase the company’s cloud capacity when needed.

As a whole, do not migrate to big data right away if your business is not equipped.

You need to prepare for the migration

One of the tips for a successful data migration is casting a vision for the entire organization. You could create a road map that embodies the company’s goals for big data. Employees can quickly identify the timeframe just by looking at the road map.

All the stakeholders of the institution must know the entire migration process. They should have attended seminars or briefings on big data.

If the company does not have experience at all, it may hire consultants to train people in the company.
The employees must also know where the data will go during the migration process. IT professionals must prepare a backup solution for the data.

There are migration programs offered by service providers. It is up to your company whether to participate.

You need to assess the migration

After the migration process, the data executives must ask their constituents for feedback about the process. Is there anything that needs to be improved?
Take note that traditional databases need not be discarded. They can be used side by side with the newer solutions that big data brings.

The company must now create a maintenance team to assess the performance of the data on a set schedule. The enterprise could also evaluate data that the firm does not have yet; it may be able to integrate that data with the current database soon.
Most of the preparation tips could span from 2 weeks up to 15 weeks. Some preparations can be done simultaneously.
Chapter 10: Software
Recommendations
At the end of this chapter, you should be
able to:
Know the different software suited
for each technique;
Understand the basic use of each
software mentioned;
Put into practice the different
technologies.

A variety of software solutions is available on the market. Some software is open source, which means it can be freely modified and deployed in your organization. Other software is proprietary, which means a vendor has developed a polished solution suited to a specific company need.

10.1 Big Data Software Recommendations
1010data has parallel processing capabilities. It also offers data integration and visualization along with machine learning, and it provides an analytical DBMS. The company is based in New York and launched in 2000.
Amazon Web Services also offers an analytical DBMS, MapReduce, and its proprietary Amazon Kinesis technology. Their services focus on empowering mission-critical workloads. Their hybrid strategies result in excellent reliability.
Cloudera leads the data-processing industry with its distribution of Hadoop software. It offers Apache Spark, which supports in-memory analysis.
HP’s HAVEn (Hadoop, Autonomy, Vertica, Enterprise Security, and “n”) provides an analytical DBMS. It lets users explore extensive data before defining a schema. Their latest release is also integrated with Hadoop.
IBM offers the widest choice of database solutions in the industry. It has solutions for analytical and in-memory DBMSs. Moreover, it also provides a Hadoop distribution along with its stream processing technologies. Business intelligence prospects could look at its full range of services.
MapR has analytical, in-memory, and Hadoop technologies suited for your business. Its architecture assures users of high availability and system recovery.
Microsoft’s cloud-first vision comes to
fruition as they also provide complete
data solutions for its users. From its SQL
servers and OLTP to its Azure HDInsight
services, the company has come
prepared for the rise of big data.
Oracle also comes into the market with another comprehensive solution for business enterprises. It has fully embraced the big data vision through its offering of various real-time business analytics solutions.
SAP takes a share in the big data vision
with its products and services. Like
others, it has solutions for the diverse
needs of its clients. Additionally, it has a
built-in predictive analytics library.
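To give a feel for the Hadoop and Spark technologies these vendors distribute, here is a minimal PySpark sketch of the classic MapReduce-style word count. It assumes a local Spark installation and a hypothetical input file.

# A minimal sketch, assuming pyspark is installed; "server_logs.txt" is
# a hypothetical input file.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

counts = (spark.sparkContext.textFile("server_logs.txt")
          .flatMap(lambda line: line.split())   # map: emit each word
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b))     # reduce: sum the counts

print(counts.take(10))  # peek at the first ten (word, count) pairs
spark.stop()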

10.2 Regression Analysis Software Recommendations
You can learn R and start with the basic
do-it-yourself programming. The R
programming language is designed for
statistical applications.
If you do not want a complicated programming language, then you can use Python to visualize your data; Python’s syntax reads almost like plain English. Microsoft Excel performs the basic statistical functions if you are comfortable using it.
You can also use NCSS, SPSS, NLREG,
XLStat, and JMP. They are helpful in
everyday analysis and research
problems. SAS offers general linear
models. However, you may need to study
the method.
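As a taste of the do-it-yourself route, here is a minimal Python sketch of a simple linear regression; the data points are hypothetical.

# Fitting a straight line with numpy; the sample data is made up.
import numpy as np

ad_spend = np.array([10, 20, 30, 40, 50])   # e.g., monthly ad spend (thousands)
sales = np.array([25, 41, 62, 79, 101])     # e.g., monthly sales (thousands)

slope, intercept = np.polyfit(ad_spend, sales, deg=1)  # sales = slope*spend + intercept
print(f"sales = {slope:.2f} * spend + {intercept:.2f}")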

10.3 Sentiment Analysis Software Recommendations
A beginner can use solutions such as the
following:
Trackur offers straightforward and fast social media monitoring at an affordable price. SAS Sentiment Analysis tracks emails, forms, surveys, and experiences, and it highlights positive feedback from users.
OpenText scans for trending, valuable content and checks content and suggestions for consumers. Dell’s StatSoft offers predictive modeling along with its sentiment analysis offering; it tracks sales conversations in its software.
Clarabridge scans written or spoken language in detail and sorts it into positive, negative, and neutral.
Opinion Crawl offers an online
sentiment analysis tool.
Other similar options are Dirt Digital,
The Research Bazaar, SSIX, Mark
Cieliebak, Cloud Cherry, Daniel Soper,
NICTA, Meltwater, NetOwl, and
TheySay.
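Under the hood, the simplest lexicon-based tools work roughly like this toy Python sketch; the word lists are illustrative and not taken from any product above.

# A toy lexicon-based sentiment scorer, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "angry"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great product"))   # positive
print(sentiment("terrible support very angry")) # negative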

10.4 Decision Tree Analysis Software Recommendations
Orange is an open-source data visualization tool for beginners and experts. It offers an extensive data-mapping solution.
Data Applied supports tree maps,
decision trees, and other analytical
tasks. It is an online data solution. Next,
SAP Lumira’s personal edition is free of
charge for visualizations.
MicrOsiris offers detailed statistical analysis for free. It aids in decision tree analysis as well.
The University of Wisconsin-Madison’s
Department of Statistics offers three free
algorithms: Quick, Unbiased and
Efficient Statistical Tree (QUEST),
CRUISE and GUIDE.
The National Aeronautics and Space
Administration (NASA) provides an
open-source software, IND. It is
available for download on their website.
Minitab, Salford Systems, DTREG, and SYSTAT also offer decision tree analysis for a price.
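If you prefer to experiment in code first, here is a minimal Python sketch of a decision tree using scikit-learn; the tiny dataset is hypothetical.

# A minimal sketch, assuming scikit-learn is installed.
# Hypothetical data: [age, income] -> bought the product (0/1).
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25, 30_000], [40, 80_000], [35, 60_000], [22, 20_000], [50, 90_000]]
y = [0, 1, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))  # the learned rules
print(tree.predict([[30, 70_000]]))                        # classify a new case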
10.5 Social Network Analysis
Software Recommendations
NetMiner and RapidMiner can be
helpful in plotting out the relationships
between nodes. You can also evaluate
WordStat and QDA Miner if it fits your
needs.
Gephi is an open-source software for
Windows, Linux, and Mac OS X. It is
free as well. You can customize the
color of the nodes according to your
preference. The vivid presentation will
allow viewers to understand the data
efficiently.
Graphviz is also an open-source visualization software. One of its unique features is interpreting the user’s textual descriptions of graphs and producing diagrams in a simple manner.
Netlytic automatically synthesizes large volumes of text from various social media sites such as Twitter, Facebook, YouTube, and others. A free version is available when you sign up for an account.
NodeXL is a free tool for Microsoft
Excel to create meaningful network
graphs. It can also perform task
automation.
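For a programmatic taste of the same idea, here is a minimal Python sketch using the networkx library; the people and ties are hypothetical.

# A minimal sketch, assuming networkx is installed.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Ana", "Ben"), ("Ana", "Cara"), ("Ben", "Cara"),
                  ("Cara", "Dan"), ("Dan", "Eve")])

centrality = nx.degree_centrality(G)  # who is the most connected node?
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
# Cara comes out on top: she bridges the two clusters.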

10.6 Using Geographic Information Systems (GIS) Software
Geographic information systems can be
used to reveal location-based trends.
GIS uses maps to visualize the local
trends.
QGIS is an open-source program that
does the graphing tasks you need. You
can integrate it with a database, as well.
If you want to pay, ArcGIS is the
software solution for your needs.
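If you work in Python, a minimal sketch with the geopandas library looks like this; the file and column names are hypothetical.

# A minimal sketch, assuming geopandas (and matplotlib) are installed and
# a hypothetical "stores.csv" has longitude and latitude columns.
import pandas as pd
import geopandas as gpd

df = pd.read_csv("stores.csv")
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",  # standard latitude/longitude coordinates
)
gdf.plot(markersize=10)  # each store becomes a point on the map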
Conclusion
After reading this book, you should be
able to start applying the different
strategies in data processing. Data
analytics can help propel your
businesses upward.
Bear in mind the benefits and the risks of data analytics.
In a world of rapid technology, it is indeed difficult to process big data. If you have the desire to adopt the processes mentioned above, then you need to plan carefully and choose meticulously the methods that your company shall apply.
From training to deployment, every
preparation must be done in your
organization to ease the transition to a
data-driven world.
Indeed, the guiding principles and methodologies can transform your enterprise’s productivity. You shall gain insights that will lead your institution to unparalleled success.
As the author, I wish you victory in your business endeavors.
Thanks again for buying my book. If you
have a minute, please leave a positive
review. You can give your review by
clicking on this link:
Leave your review here. Thank you!

I take reviews seriously and always look at them. This way, you are helping me provide better content that you will LOVE in the future. A review does not have to be long; just one or two sentences and the stars you find appropriate (hopefully 5, of course). Also, if I think your review is useful, I will mark it as “helpful.” It will help you become more known on Amazon as a decent reviewer and will ensure that more authors contact you with free e-books in the future. This is how we can help each other.

You can download any of my other books too.

Look at these other books too:

Ebook Marketing
Make Money Online Fast
Self-Publishing for Beginners
Here is an excerpt of another book I
wrote, Ebook Marketing
Chapter 3: Kindle Advertising
I have heard stories from people who
did well through Facebook advertising,
Goodreads advertising, or Reddit
advertising. All of these websites charge
you per click, and if you know what
you’re doing, then it can pay off.
However, I am still convinced, also after seeing my own results, that if you
advertise on the main websites where
people go to buy things, you’ll end up
generating more revenue than if you
would advertise where people go to
chit-chat, share pictures, or post their
daily stuff. If you want to socialize or
show off your kids, go to Facebook; if
you want to read and talk about books,
go to Goodreads, but many people know
that Amazon has become one of the main
go-to websites to BUY stuff. That’s why
I only advertise through Amazon ads
now.
In this chapter, I will show you how
advertising through
www.kdp.amazon.com works. Once you
get it, you don’t have to worry about
major risks, it’s super simple, and it can
definitely increase your sales. You’ll
have to do it through trial and error, but
it’s worth it.

So here are the steps:


1 On the reports page in KDP, click on
“Ad Campaigns.”
2 Click on “Create Ad.”
3 Click “Product Display Ads.”
4 Type in the title of your book and
select it.
5 Select “By Interest” or “By Product.”
6 If you select “By Interest,” simply
choose which interest fits your book the
best. If you choose “By Product,” type in
keywords that relate to your product.
With this option, just select several
dozens of products that you think would
be similar to yours. Your ads will be
displayed when people are on those
product pages. Typically select products
that are priced higher than yours; this
way, you’ll be more likely to get sales. If
yours were to be more expensive, then
why would they click on yours, right?
7 Name your campaign, like “Ebook
Marketing 101 by interest.”
8 Amazon suggests a certain cost per
click, but what I found, is that this is
usually way higher than what you can get
away with. Amazon may suggest $0.33
per click, for instance, but your ad can
still do well at $0.06 or $0.10 per click.
Start low, and if your campaign isn’t
generating a lot of views, clicks, or
sales, bid higher. Selecting “By Interest”
is more competitive than “By Product.”
It makes sense, because there are only so
many interests you can choose from, but
there are thousands of products out there.
Amazon ads work by bidding. Whoever
bids the highest cost per click, gets
displayed. So when there is more
competition, you have to bid higher.
Makes sense, right? So typically, I type
in a higher cost per click when I put my
book in a certain interest.
9 The minimum budget you can set for a campaign is $100.00, but don’t worry.
Most times, you don’t spend that much,
or anything near it. I have never spent
that much actually, and if you see a
campaign that is making you lose money,
you can just terminate it anyway. So the
risk is very low.
10 I usually do campaigns for a month and run them as fast as possible.
11 Create a catchy headline, a catchy
text to promote your book, submit the
campaign for review, and you’re done.
Some Tips about Amazon Ads:
A: Keep an eye on your campaigns, so you can intervene by changing the cost per click or terminating them.
B: When you analyze the stats, take into
account that you made that much money
in revenue, not profit. If your book is
enrolled in KDP Select, and you chose
the 70% royalty option, then that’s how
much you get. So, for instance, it may
say that you made $2.99 revenue in the
ad campaigns, but you really made 70%
of that, which is a little over two
dollars. Deduct the “Spend” column, and
you’ll know your profit.
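To make the math concrete, here is a quick Python sketch of that calculation; the ad spend figure is hypothetical.

# Royalty math from the tip above; the spend figure is made up.
revenue = 2.99        # revenue reported by the ad campaign
royalty_rate = 0.70   # the 70% KDP Select royalty option
ad_spend = 0.50       # the "Spend" column

profit = revenue * royalty_rate - ad_spend
print(f"profit: ${profit:.2f}")  # $1.59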
C: Something that is annoying about Amazon Ads is that it doesn’t show the results until 3 days after approval. Know this, and don’t get frustrated or impatient.
D: Another thing you need to realize is that clicks can result in sales on Createspace or ACX. Someone may click on your ad but not get the Kindle version; he or she may get the paperback or audio version instead. Unfortunately, it doesn’t show how many of those sales are related to your ads. The best way to estimate this is to keep an eye on the physical and audio book sales of those particular books and see if they increased. Sometimes, for this reason, even if it shows I am making losses on Amazon Ads, I keep the campaign running, because I know I get sales from them that aren’t Kindle. Makes sense, right?
E: At the end of a campaign, analyze
your profits, and determine if you want
to run that campaign again. If you do, it’s
super easy to just click on “copy” in the
right column. This way, you’ll
automatically select the same products,
cost per click, etc.

DISCLAIMER: This information is provided “as is.” The author, publishers, and marketers of this information disclaim any loss or liability, either directly or indirectly, as a consequence of applying the information presented herein, or regarding the use and application of said information. No guarantee is given, either expressed or implied, regarding the merchantability, accuracy, or acceptability of the information. The pages within this e-book have been copyrighted.
