MGT7105E - Text and Web Mining Success Stories - MBA Report - BIAS

Uploaded by

Jefferson Voo

Successful Text and Web Mining Applications

Putra Business School MGT7105E

Business Intelligence and Analytics Systems (BIAS)

Executive Summary
This report presents recent case studies of successful text mining and web mining
applications, focusing on the methods employed and their impact on operational processes and
decision making. The exponential growth of unstructured data, combined with advances in
technologies that allow organizations to exploit such data for decision making, has created
opportunities for competitive advantage.

The cases span diverse industries and domains in which data mining and analysis
have been effectively implemented. These applications use methods such as natural
language processing (NLP), sentiment analysis, clustering, and association rule mining. This
report demonstrates the value of text and web mining in driving operational
excellence and data-driven decision making in organizations in today’s digital world.

Background Information
A huge amount of digital data is generated daily, and the volume is growing at
an increasing rate. An estimated 329 million terabytes of data were generated every day in
2023 (Duarte, 2023). With such volumes, now popularly known as Big Data,
techniques to extract useful information are rising in importance (SAĞLAR & KEFE, 2021).
An estimated 80% of global data is unstructured (Rydning, 2022). Two popular
techniques for extracting information from unstructured digital data are text mining and
web mining. With the application of deep learning algorithms, hidden relationships within
unstructured data can be discovered and explored (IBM, 2020).

Web mining aims to automatically extract data originating on the web and derive
information from it using data mining techniques (Velasquez, 2013). Web mining is generally
divided into three main types: web content mining, web structure mining, and web usage
mining (Jokar et al., 2016).

Text mining deals with the extraction of quantitative patterns and knowledge from
qualitative sources such as textual data (Senave et al., 2023). Text mining covers a
variety of methods, including keyword-based, statistical, and linguistic (NLP) approaches.

Success Story #1
Thematic is a provider of decision-focused analytics that developed its own
self-supervised NLP software. The founders started building an NLP solution to
gather unstructured feedback via API in 2015, and the product was officially launched in
2017 (Thematic, n.d.).

Atom Bank, founded in 2014, is the UK’s first app-only bank (Atom Bank, n.d.).
The company wanted to learn from customer feedback which parts of the
customer experience it could improve, with the goal of growing its customer base. It
built a customer analytics process that collects feedback through seven engagement channels:
product feedback, online and app-store reviews, customer complaints, call-center
agent notes, and three different surveys. Taken individually, these channels provided
independent and siloed views.

Thematic uses API integration to combine data from various sources and NLP-based AI to
cluster and classify themes (Thematic, n.d.). Atom Bank now combines unstructured feedback
from trusted third-party review platforms such as the App Store, Trustpilot, and Reevoo
with feedback from its internal sales and customer experience platform into a single view
of the customer.
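
The idea of merging siloed channels into a single view and grouping feedback by theme can be sketched as follows. This is a minimal illustration only: the records, channel names, and `THEME_KEYWORDS` are hypothetical, and simple keyword matching stands in for the NLP-based theme detection that Thematic actually provides.

```python
from collections import defaultdict

# Hypothetical feedback records from separate channels (illustrative data only).
app_store_reviews = [
    {"text": "Login with face ID keeps failing"},
    {"text": "Great savings rate and easy app"},
]
survey_responses = [
    {"text": "The mortgage application took too long"},
]
call_center_notes = [
    {"text": "Customer could not login after update"},
]

# Hypothetical theme keywords; a real system would learn themes from the data.
THEME_KEYWORDS = {
    "login": ["login", "face id", "password"],
    "savings": ["savings", "rate"],
    "mortgage": ["mortgage"],
}


def combine_channels(**channels):
    """Merge per-channel records into one list, tagging each with its source."""
    combined = []
    for source, records in channels.items():
        for record in records:
            combined.append({"source": source, "text": record["text"]})
    return combined


def tag_themes(text):
    """Return the sorted list of themes whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        theme
        for theme, keywords in THEME_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    )


def themes_by_source(feedback):
    """List, for each detected theme, the channels that mentioned it."""
    grouped = defaultdict(list)
    for item in feedback:
        for theme in tag_themes(item["text"]):
            grouped[theme].append(item["source"])
    return dict(grouped)


single_view = combine_channels(
    app_store=app_store_reviews,
    surveys=survey_responses,
    call_center=call_center_notes,
)
theme_sources = themes_by_source(single_view)
```

In this toy data the "login" theme appears in both the app-store reviews and the call-center notes, which is the kind of cross-channel pattern that siloed views would hide.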

This enabled Atom Bank to turn unstructured feedback from various sources into insights that
influence the product roadmap and improve operations, the app experience, and complaints
handling (Sherwood, 2017). As a result, the bank achieved a reduction of over 40% in calls
related to banking products and the apps while growing its customer base 110% year-over-year
(Thematic, n.d.).

Success Story #2
MonkeyLearn is a machine learning (ML) platform that allows users to detect topics or
sentiment expressed in unstructured text through ML models offered
as a service (Garreta, 2023). The models are organized into classification and extraction
models, and users can choose between pre-built and custom ML models.

ADAPT is a world-leading Science Foundation Ireland (SFI) Research Centre that
aims to pioneer AI-driven digital content technology for a balanced digital society, focusing
on human-centric AI, data governance, and breakthrough applications across domains
(Science Foundation Ireland, 2023).

ADAPT planned to analyze the psychological effects of COVID-19 on Irish citizens
with the objective of creating surveys that could help decision-makers and health professionals
formulate informed strategies to mitigate a national spike in serious mental health issues.
They did this by building a custom COVID-19 Emotions Classifier to recognize negative
psychological features. To build the model, over 50,000 social media posts were identified
with keyword extraction, and sentiment analysis was then performed to identify negative
feelings related to the pandemic. A second model, which detects the features or topics related
to each negative sentiment, was built using the same datasets with different tags
(MonkeyLearn, n.d.).
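
The staged workflow described above (keyword-based selection of posts, negative-sentiment detection, then topic tagging) can be sketched as below. This is a toy illustration under stated assumptions: all word lists and posts are hypothetical, and fixed lexicons stand in for the trained classifiers that ADAPT built on MonkeyLearn.

```python
import re

# All word lists and posts below are hypothetical, for illustration only.
PANDEMIC_KEYWORDS = {"covid", "lockdown", "pandemic", "quarantine"}
NEGATIVE_WORDS = {"anxious", "lonely", "worried", "scared", "stressed"}
TOPIC_KEYWORDS = {
    "isolation": {"lonely", "alone", "isolated"},
    "finances": {"job", "rent", "income"},
}


def tokenize(text):
    """Lowercase a post and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_relevant(tokens):
    """Selection step: keep only posts that mention the pandemic."""
    return bool(tokens & PANDEMIC_KEYWORDS)


def is_negative(tokens):
    """First model (stand-in): flag posts with a negative-emotion word."""
    return bool(tokens & NEGATIVE_WORDS)


def topics(tokens):
    """Second model (stand-in): tag the topics mentioned in a post."""
    return sorted(name for name, words in TOPIC_KEYWORDS.items() if tokens & words)


def analyze(posts):
    """Return pandemic-related negative posts together with their topics."""
    results = []
    for post in posts:
        tokens = tokenize(post)
        if is_relevant(tokens) and is_negative(tokens):
            results.append({"post": post, "topics": topics(tokens)})
    return results


posts = [
    "Feeling lonely and anxious during lockdown",
    "Worried about rent since the pandemic took my job",
    "Lovely weather for a walk today",
    "Lockdown baking is going well",
]
flagged = analyze(posts)
```

Only the first two posts are both pandemic-related and negative, so only they are passed on to the topic step.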

This enabled ADAPT team members from different backgrounds,
with varying experience in AI, ML, and data science, to examine the data in a unified
workflow. They built two large-scale risk factor analysis ML models in four weeks
(Jadvidnia, 2021). As a result, they could design surveys to guide decision-makers
and health professionals in a timely manner during the pandemic.

Success Story #3
Rossum is a cloud-native intelligent document processing solution that uses advanced
AI to “read” documents instead of relying on traditional techniques such as optical character
recognition (OCR) or templates to extract data (Rossum, n.d.). Its cognitive, ML-based
technology can adapt to layout changes and formatting cues and can even “understand”
terminology, addressing the primary weaknesses of traditional techniques.

Morton Salt is an American food company that specializes in the production and
distribution of salt products for both consumption and industrial purposes (Morton Salt, n.d.).
As a major industrial supplier, it receives a steady stream of documents from
suppliers and customers, and each partner uses its own formats and details.
Traditional techniques would require at least one template for each type of
document from each partner to extract useful information into a database.

With Rossum’s AI, Morton Salt achieved a 95% faster processing time than with the
previous system, at an average of 10 seconds per document, and a 71%
straight-through-processing (STP) rate (Rossum, 2022). The STP rate is the share of
transactions that pass through the system automatically, without manual intervention.
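
As a worked illustration of the STP metric, the sketch below computes an STP rate and an average processing time from a small batch of processed documents. The `ProcessedDocument` record and the sample figures are hypothetical and are not Morton Salt's data.

```python
from dataclasses import dataclass


@dataclass
class ProcessedDocument:
    """Hypothetical record of one document passing through the system."""
    doc_id: str
    seconds: float
    needed_manual_review: bool


def stp_rate(documents):
    """Share of documents processed without any manual intervention."""
    if not documents:
        return 0.0
    straight_through = sum(1 for d in documents if not d.needed_manual_review)
    return straight_through / len(documents)


def average_seconds(documents):
    """Mean processing time per document, in seconds."""
    if not documents:
        return 0.0
    return sum(d.seconds for d in documents) / len(documents)


# Made-up sample batch: 5 of 7 documents pass straight through.
batch = [
    ProcessedDocument("inv-001", 8.0, False),
    ProcessedDocument("inv-002", 12.0, False),
    ProcessedDocument("inv-003", 10.0, True),
    ProcessedDocument("po-004", 9.0, False),
    ProcessedDocument("po-005", 11.0, True),
    ProcessedDocument("inv-006", 10.0, False),
    ProcessedDocument("inv-007", 10.0, False),
]

rate = stp_rate(batch)
mean_time = average_seconds(batch)
print(f"STP rate: {rate:.0%}, average time: {mean_time:.1f} s")
```

A document counts toward the STP rate only if no person had to touch it, so the metric rewards extraction that is reliable enough to skip review, not merely fast.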

Success Story #4
Aylien specializes in extracting, analyzing, and understanding vast amounts of
human-generated content, focusing on global news outlets in real time (Aylien, 2023). It
employs NLP techniques integrated with AI.

ESG Plus GmbH is an Austrian company that aims to democratize sustainability
assessments of the financial markets for private investors (ESG Plus, 2023). Sustainability
is primarily assessed by analyzing official company policies; however,
organizations do not always comply with their own policies. News is therefore a critical
data source for assessing compliance, but finding relevant news is difficult due to the
volume and scale of global news.

ESG Plus now has access to about 1.2 million news articles per day, enriched with
NLP to transform the content into searchable and filterable data. In addition, text
classification and event clustering reduce noise by narrowing the stream down to articles
of interest without eliminating too much (Ebenstein, 2021). This enables the company to identify
events of interest and perform thorough investigations to validate the information used for
sustainability assessments, which increases customer confidence.
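
The combination of classification (keep only relevant articles) and event clustering (group articles about the same event) can be sketched as below. This is a simplified stand-in, not the AYLIEN News API: the keyword filter, the Jaccard-similarity clustering, the threshold, and the headlines are all assumptions made for illustration.

```python
import re

# Hypothetical word lists and headlines, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "for", "to", "and", "over"}
ESG_KEYWORDS = {"pollution", "emissions", "corruption", "lawsuit"}


def tokens(headline):
    """Lowercased content words of a headline, as a set."""
    words = re.findall(r"[a-z]+", headline.lower())
    return {w for w in words if w not in STOPWORDS}


def is_esg_relevant(headline):
    """Classification stand-in: keep headlines that mention an ESG keyword."""
    return bool(tokens(headline) & ESG_KEYWORDS)


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_headlines(headlines, threshold=0.4):
    """Greedy single-pass clustering: join the first sufficiently similar cluster."""
    clusters = []
    for headline in headlines:
        current = tokens(headline)
        for cluster in clusters:
            if jaccard(current, cluster["tokens"]) >= threshold:
                cluster["headlines"].append(headline)
                cluster["tokens"] |= current
                break
        else:
            clusters.append({"tokens": set(current), "headlines": [headline]})
    return [cluster["headlines"] for cluster in clusters]


headlines = [
    "Acme Mining fined over river pollution",
    "River pollution fine for Acme Mining",
    "Globex announces record quarterly profit",
    "Acme Mining river pollution lawsuit filed",
]
relevant = [h for h in headlines if is_esg_relevant(h)]
events = cluster_headlines(relevant)
```

Here the three Acme Mining headlines collapse into a single event, so an analyst reviews one event instead of three separate articles, while the unrelated profit announcement is filtered out.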

Success Story #5
SAS started as the Statistical Analysis System in the late 1960s. The company developed a
suite of statistical software and services encompassing the full analytics lifecycle, from data
access and management through advanced analytics, multivariate analysis, and business
intelligence to model deployment and monitoring (SAS, 2023).

Thousands of athletes and an estimated 500,000 spectators participated in the Special
Olympics World Games Abu Dhabi 2019 (Daley, 2019). This was the first time that the
Special Olympics was hosted in the Middle East and North Africa region. The vision was to
use data for good and to deliver excellence at the World Games. Three months prior to the
event, a team of volunteer data scientists across five continents was formed to combine 25
data sets covering venues, ticketing, registration, medical information, volunteer details,
travel, accommodation, catering, scheduling, and others (SAS, n.d.).

Their priority was keeping the athletes safe and healthy, as all Special Olympics
athletes have unique health conditions and requirements. Each athlete was tracked via a
smartwatch that streamed health and location data to a central dashboard observable by
medical professionals. Advanced analytics was applied to predict when health issues might occur
so that health personnel, equipment, and vehicles could be strategically positioned for quick
response times. A total of 1,529 medical incidents occurred during the event, including
serious incidents such as seizures and asthma attacks; however, no fatality
was recorded.

With real-time data and ML algorithms, the fan experience could be personalized with
event recommendations, ticket availability, and travel time to the next venue of interest. The
team also applied text analytics and sentiment analysis to fans’ social media posts
to improve communications during the event and to inform data-driven policies after the
event (SAS, n.d.).

Conclusion and Recommendations


Text mining and web mining can be applied to a wide range of industries and
businesses. Mining technologies can efficiently preprocess unstructured data from a wide
variety of sources to obtain structured data (Lingwal & Gupta, 2018). With a combination of
statistics, analysis techniques, and AI/ML technologies, trends and correlations in large
datasets can be found to uncover meaningful and actionable insights (Bringula et al., 2022).
The use cases of text and web mining in the case studies above include risk management,
knowledge management, customer care services, healthcare, contextual advertising, business
intelligence, and social media data analysis.

Many combinations of mining, extraction, and analysis techniques are available,
and the selection depends on the unique use cases of the
organization. Effective use of web or text mining together with business intelligence makes
it possible to analyze data at a finer granularity and steers an organization towards
decisions, strategic or operational, that are data-driven rather than based on intuition
(TM & G, 2022).

To maximize the potential success of unstructured data analysis, organizations need to
do what the organizations in the five case studies have done:

• Identify a clear problem to solve and define clear, business-focused goals.
• Ensure the technical parameters of the solution are aligned with business goals.
• Engage expert partners to identify current and potential data sources to use.
• Make change management part of the workflow to ensure adoption.
• Measure output against business metrics or goals.

References
Atom Bank. (n.d.). What's the story? Retrieved June 2023, from Atom Bank:
https://www.atombank.co.uk/about-us/

Aylien. (2023). What We Do. Retrieved from Aylien: A Quantexa Company:
https://aylien.com/company/about-us

Bringula, R., Ulfa, S., Miranda, J. P., & Atienza, F. A. (2022). Text mining analysis on
students' expectations and anxieties towards data analytics course. Cogent
Engineering, 9(2). https://doi.org/10.1080/23311916.2022.2127469

Daley, M. (2019, March 13). Special Olympics World Games Abu Dhabi 2019 Makes
History by Welcoming 200 Nations. Retrieved from Special Olympics Press Releases:
https://www.specialolympics.org/about/press-releases/special-olympics-world-games-abu-dhabi-2019-makes-history-by-welcoming-200-nations

Duarte, F. (2023, April 3). Amount of Data Created Daily. Retrieved from Exploding Topics:
https://explodingtopics.com/blog/data-generated-per-day

Ebenstein, T. (2021). How ESG Plus expanded their news coverage and filtered out noise
using AYLIEN News API. (AYLIEN, Interviewer)

ESG Plus. (2023). The ESG Plus GmbH. Retrieved from ESG Plus:
https://www.esgplus.com/ueber-uns/

Garreta, R. (2023). What is MonkeyLearn? Retrieved from MonkeyLearn:
https://help.monkeylearn.com/en/articles/2174206-what-is-monkeylearn

IBM. (2020). What is text mining? Retrieved from IBM Newsletter:
https://www.ibm.com/topics/text-mining

Jadvidnia, D. H. (2021). How ADAPT Designed COVID-19 Surveys Using AI.
(MonkeyLearn, Interviewer)

Jokar, N., Honarvar, A. R., Aghamirzadeh, S., & Esfandiari, K. (2016). Web mining and Web
usage mining techniques. Bulletin de la Societe des Sciences de Liege, 85, 321-328.

Lingwal, S., & Gupta, B. (2018). A Text Mining Approach for Automatic Classification of
Web Pages. Second IEEE International Conference on Advances in Electronics,
Electrical and Computer Engineering. Bangalore.
https://doi.org/10.3850/978-981-07-6935-2_52

MonkeyLearn. (n.d.). About ADAPT Centre. Retrieved July 2023, from Case Study: How
ADAPT Designed COVID-19 Surveys using AI:
https://monkeylearn.com/customers/adapt

Morton Salt. (n.d.). THE QUINTESSENTIAL AMERICAN BRAND. Retrieved July 2023,
from Morton Salt: https://www.mortonsalt.com/about-us/

Rossum. (2022). Customer Stories. Retrieved from Rossum:
https://rossum.ai/customer-stories/morton-salt/

Rossum. (n.d.). About Rossum. Retrieved July 2023, from Rossum:
https://rossum.ai/company/

Rydning, J. (2022). Worldwide Global DataSphere and Global StorageSphere Structured
and Unstructured Data Forecast, 2022-2026. International Data Corporation (IDC).

SAĞLAR, J., & KEFE, İ. (2021). A review on data mining methods used in internal audit and
external audit. EKEV AKADEMI DERGISI, 25(88).

SAS. (2023). About SAS. Retrieved from SAS:
https://www.sas.com/en-us/company-information/why-sas.html

SAS. (n.d.). Customer Success Stories. Retrieved from SAS:
https://www.sas.com/en_us/customers/special-olympics-world-games-abu-dhabi.html

Science Foundation Ireland. (2023). About ADAPT. Retrieved from ADAPT:
https://www.adaptcentre.ie/about

Senave, E., Jans, M. J., & Srivastava, R. P. (2023). The application of text mining in
accounting. International Journal of Accounting Information Systems, 50.
https://doi.org/10.1016/j.accinf.2023.100624

Sherwood, M. (2017). How insight fuel growth, trust, and efficiencies at UK's first app-only
bank. (Thematic, Interviewer)

Thematic. (n.d.). About us. Retrieved June 2023, from thematic:
https://getthematic.com/about/

Thematic. (n.d.). Case Studies. Retrieved June 2023, from thematic:
https://getthematic.com/case-studies/

TM, H., & G, K. (2022). Web Mining and Business Intelligence: A Key Factor for Success.
Technoarete Transactions on Intelligent Data Mining and Knowledge Discovery,
2(4). https://doi.org/10.36647/TTIDMKD/02.04.A004

Velasquez, J. D. (2013). Web mining and privacy concerns: Some important legal issues to
be consider before applying any data and information extraction technique in
web-based environments. Expert Systems with Applications, 40, 5228-5239.
https://doi.org/10.1016/j.eswa.2013.03.008
