MGT7105E - Text and Web Mining Success Stories - MBA Report - BIAS
Executive Summary
This report presents recent case studies of successful text mining and web mining
applications, focusing on the methods employed and their impact on operational processes and
decision making. The exponential growth of unstructured data, together with advances in
technologies that allow organizations to exploit such data for decision making, has created
new sources of competitive advantage.
The cases span diverse industries and domains in which data mining and analysis
have been implemented effectively. These applications use methods such as natural
language processing (NLP), sentiment analysis, clustering, and association rule mining. This
report demonstrates the value of text and web mining in driving operational
excellence and data-driven decision making in organizations in today’s digital world.
Background Information
A huge amount of digital data is generated daily, and the volume is growing at
an increasing rate. An estimated 329 million terabytes of data were generated every day in
2023 (Duarte, 2023). With such volumes of data, now popularly known as Big Data,
techniques for extracting useful information are rising in importance (SAĞLAR & KEFE, 2021).
An estimated 80% of global data is unstructured (Rydning, 2022). Two popular
techniques for extracting information from unstructured digital data are text mining and
web mining. With the application of deep learning algorithms, hidden relationships within
unstructured data can be discovered and explored (IBM, 2020).
Web mining aims to automatically extract data originating on the web and to derive
information from it using data mining techniques (Velasquez, 2013). Web mining is generally
divided into three main types according to the data analyzed: web content data, web
structure data, and web usage data (Jokar et al., 2016).
Text mining deals with the extraction of quantitative patterns and knowledge from
qualitative sources such as textual data (Senave et al., 2023). It comprises a
variety of methods, including keyword-based, statistical, and linguistic (NLP-based) approaches.
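To make the distinction between the first two families concrete, the following is a minimal sketch, not drawn from any of the cited sources. The stop-word list, sample comments, and function names are hypothetical and chosen only for illustration: a keyword-based check looks for predefined terms, while a statistical approach counts term frequencies across documents.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "is", "and", "to", "of", "was", "but", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_hits(text, keywords):
    """Keyword-based approach: report which target keywords appear in the text."""
    found = set(tokenize(text))
    return sorted(k for k in keywords if k in found)


def top_terms(documents, n=3):
    """Statistical approach: rank terms by frequency across all documents."""
    counts = Counter(
        tok
        for doc in documents
        for tok in tokenize(doc)
        if tok not in STOP_WORDS
    )
    return counts.most_common(n)


# Hypothetical customer comments.
docs = [
    "The app is fast and the login is easy",
    "Login failed twice but the app support was helpful",
    "Great app",
]

print(keyword_hits(docs[1], {"login", "fees"}))
print(top_terms(docs, n=2))
```

Linguistic or NLP-based methods go further by using grammar and context, which this sketch does not attempt.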
Success Story #1
Thematic is a provider of decision-focused analytics that developed its own
self-supervised NLP software. The founders started building an NLP solution to
gather unstructured feedback via API in 2015 and officially launched it in 2017 (Thematic,
n.d.).
Atom Bank, founded in 2014, is the UK’s first app-only bank (Atom Bank, n.d.).
The company aimed to learn from customer feedback which parts of the customer
experience it could improve, with the goal of growing its customer base. It
built a customer analytics process that collects feedback through seven engagement channels:
product feedback, online and app-store reviews, customer complaints, call-center
agent notes, and three different surveys. Individually, these channels provided independent,
siloed views.
Thematic uses API integration to combine data from various sources and AI-based NLP to
cluster and classify themes (Thematic, n.d.). Atom Bank now combines unstructured feedback
from trusted third-party review platforms such as the App Store, Trustpilot, and Reevoo with
feedback from its internal sales and customer experience platform into a single view
of the customer.
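The idea of a single view can be sketched as follows. This is only an illustration under stated assumptions: the channel names, comments, and theme dictionary are hypothetical, and simple keyword rules stand in for Thematic's NLP, whose internals the sources do not describe.

```python
from collections import defaultdict

# Hypothetical theme dictionary; a production system would learn themes from data.
THEMES = {
    "login": {"login", "password", "fingerprint"},
    "fees": {"fee", "fees", "charge"},
    "support": {"agent", "call", "support"},
}


def tag_themes(comment):
    """Return the sorted themes whose keywords appear in the comment."""
    words = set(comment.lower().replace(",", " ").replace(".", " ").split())
    return sorted(theme for theme, kws in THEMES.items() if words & kws)


def single_view(channels):
    """Merge per-channel comments into one map: theme -> [(channel, comment)]."""
    view = defaultdict(list)
    for channel, comments in channels.items():
        for comment in comments:
            for theme in tag_themes(comment):
                view[theme].append((channel, comment))
    return dict(view)


# Hypothetical feedback from three otherwise siloed channels.
channels = {
    "app_store": ["Fingerprint login keeps failing."],
    "survey": ["The support agent was helpful, but login is slow."],
    "complaints": ["I was surprised by a fee."],
}

view = single_view(channels)
for theme, items in sorted(view.items()):
    print(theme, len(items), sorted({channel for channel, _ in items}))
```

Grouping by theme rather than by channel is what lets a recurring issue, such as login problems reported in both reviews and surveys, surface as one item.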
This enabled Atom Bank to turn unstructured feedback from various sources into insights that
influence its product roadmap and improve operations, the app experience, and complaints
handling (Sherwood, 2017). As a result, it achieved a reduction of over 40% in calls related
to banking products and the app while growing its customer base 110% year over year
(Thematic, n.d.).
Success Story #2
MonkeyLearn is a machine learning (ML) platform that allows users to detect the topic or
sentiment expressed in unstructured text (Garreta, 2023). Its models are organized into
classification and extraction models, and users can choose between pre-built and custom
ML models.
ADAPT is the world-leading Science Foundation Ireland (SFI) Research Centre that
aims to pioneer AI-driven digital content technology for a balanced digital society, focusing
on human-centric AI, data governance, and breakthrough applications across domains
(Science Foundation Ireland, 2023).
Success Story #3
Rossum is a cloud-native intelligent document processing solution that uses advanced
AI to “read” documents instead of relying on traditional techniques such as optical character
recognition (OCR) or templates to extract data (Rossum, n.d.). Its cognitive OCR with ML
can adapt to layout changes and formatting cues and can even “understand”
terminology, the areas in which traditional techniques are weakest.
Morton Salt is an American food company that specializes in the production and
distribution of salt products for both consumption and industrial purposes (Morton Salt, n.d.).
As a major industrial supplier, it receives a steady stream of documents from
suppliers and customers, with formats and details that vary by partner.
Traditional techniques would require at least one template for each type of
document from each partner to extract useful information into a database.
With Rossum’s AI, Morton Salt achieved 95% faster processing than with its previous
system, at an average of 10 seconds per document, and a 71%
straight-through-processing (STP) rate (Rossum, 2022). The STP rate is the share of
transactions that pass through the system without manual intervention.
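The arithmetic behind these two figures can be sketched as below. The batch size and the number of manually handled documents are hypothetical, and reading “95% faster” as a 95% reduction in time per document is an assumption of this sketch; the source does not report the previous processing time.

```python
def stp_rate(total_documents, touched_manually):
    """Share of documents processed with no manual intervention."""
    if total_documents <= 0:
        raise ValueError("total_documents must be positive")
    return (total_documents - touched_manually) / total_documents


def implied_previous_seconds(current_seconds, reduction):
    """Previous time per document, ASSUMING 'reduction' is a cut in processing time."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return current_seconds / (1 - reduction)


# Hypothetical batch: 1,000 documents, of which 290 needed a human.
print(f"STP rate: {stp_rate(1000, 290):.0%}")

# Under the stated assumption, 10 seconds today implies roughly this many seconds before.
print(round(implied_previous_seconds(10, 0.95)))
```

Under that reading, 10 seconds per document would correspond to roughly 200 seconds per document previously, but this is an inference rather than a reported figure.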
Success Story #4
Aylien specializes in extracting, analyzing, and understanding vast amounts of
human-generated content, focusing on global news outlets in real time (Aylien, 2023). It
employs AI-based NLP techniques.
ESG Plus GmbH is an Austrian company that aims to democratize sustainability
assessments of the financial markets for private investors (ESG Plus, 2023). Sustainability
is assessed primarily by analyzing official company policies; however, organizations do not
always comply with them. News is therefore a critical data source for
assessing compliance, but finding relevant news is difficult given the volume and scale of
global news.
ESG Plus is now connected to about 1.2 million news articles per day, enriched with
NLP to transform their contents into searchable and filterable data. In addition, text
classification and event clustering reduce noise by narrowing the feed down to articles
of interest without eliminating too much (Ebenstein, 2021). This enables the company to
identify events of interest and investigate them thoroughly to validate the information used
in sustainability assessments, which increases customer confidence.
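A rough sketch of the two noise-reduction steps follows. It does not use or represent the AYLIEN News API: the headlines, topic terms, similarity threshold, and function names are all hypothetical, a keyword match stands in for text classification, and a greedy Jaccard-similarity grouping stands in for event clustering.

```python
def tokens(text):
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,").lower() for w in text.split()}


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_relevant(articles, topic_terms):
    """Stand-in for classification: keep articles mentioning any topic term."""
    return [a for a in articles if tokens(a) & topic_terms]


def cluster_events(articles, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed article is similar enough."""
    clusters = []
    for article in articles:
        for cluster in clusters:
            if jaccard(tokens(article), tokens(cluster[0])) >= threshold:
                cluster.append(article)
                break
        else:
            clusters.append([article])
    return clusters


# Hypothetical headlines; company names are invented.
headlines = [
    "Regulator fines Acme over river pollution",
    "Acme fined by regulator over river pollution",
    "Local team wins cup final",
    "Union reports labor violations at Beta plant",
]

relevant = filter_relevant(headlines, {"pollution", "violations", "fines"})
events = cluster_events(relevant)
print(len(relevant), "relevant articles in", len(events), "events")
```

Filtering removes off-topic articles, while clustering collapses multiple reports of the same event so that an analyst reviews each event once.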
Success Story #5
SAS started as the Statistical Analysis System in the late 1960s. The company developed a
suite of statistical software and services covering the full analytics lifecycle, from data
access and management through advanced analytics, multivariate analysis, and business
intelligence to model deployment and monitoring (SAS, 2023).
Their priority was keeping the athletes safe and healthy as all Special Olympics
athletes have unique health conditions and requirements. Each athlete is tracked via a
smartwatch that streams health and location data to a central dashboard that is observable by
medical professionals. Advanced analytics is applied to predict when health issues may occur
so that health personnel, equipment, and vehicles can be strategically positioned for quick
response. A total of 1,529 medical incidents occurred during the event, including
serious incidents such as seizures and asthma attacks, but no fatalities were
recorded.
With real-time data and ML algorithms, the fan experience could be personalized with
event recommendations, ticket availability, and travel time to the next venue of interest. They
also applied text analytics, including sentiment analysis, to fans’ social media posts
to improve communications during the event and to inform data-driven policies after the
event (SAS, n.d.).
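Sentiment analysis of this kind can be illustrated with a minimal lexicon-based sketch. The word lists and sample posts are hypothetical, and this simple word-counting approach is not a description of the SAS tooling used at the event.

```python
from collections import Counter

# Hypothetical sentiment lexicons, kept small for illustration.
POSITIVE = {"great", "amazing", "love", "helpful", "proud"}
NEGATIVE = {"late", "crowded", "confusing", "lost", "slow"}


def sentiment(post):
    """Label a post by comparing counts of positive and negative words."""
    words = [w.strip(".,!?").lower() for w in post.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical fan posts.
posts = [
    "Amazing opening ceremony, so proud!",
    "Shuttle was late and the signs were confusing.",
    "Arrived at the venue.",
]

summary = Counter(sentiment(p) for p in posts)
print(dict(summary))
```

Aggregated labels like these could indicate which topics, such as transport or signage, need clearer communication during an event.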
Identify a clear problem to solve and define clear, business-focused goals.
Ensure the technical parameters of the solution are aligned with business goals.
Engage expert partners to identify current and potential data sources to use.
References
Atom Bank. (n.d.). What's the story? Retrieved June 2023, from Atom Bank:
https://www.atombank.co.uk/about-us/
Bringula, R., Ulfa, S., Miranda, J. P., & Atienza, F. A. (2022). Text mining analysis on
students' expectations and anxieties towards data analytics course. Cogent
Engineering, 9(2). https://doi.org/10.1080/23311916.2022.2127469
Daley, M. (2019, March 13). Special Olympics World Games Abu Dhabi 2019 Makes
History by Welcoming 200 Nations. Retrieved from Special Olympics Press Releases:
https://www.specialolympics.org/about/press-releases/special-olympics-world-games-abu-dhabi-2019-makes-history-by-welcoming-200-nations#:~:text=Nine%20world-class%20venues%20in%20Abu%20Dhabi%20and%20Dubai,Games%2C%20while%203%2C000%20coaches%20will%20assi
Duarte, F. (2023, April 3). Amount of Data Created Daily. Retrieved from Exploding Topics:
https://explodingtopics.com/blog/data-generated-per-day
Ebenstein, T. (2021). How ESG Plus expanded their news coverage and filtered out noise
using AYLIEN News API. (AYLIEN, Interviewer)
ESG Plus. (2023). The ESG Plus GmbH. Retrieved from ESG Plus:
https://www.esgplus.com/ueber-uns/
Jokar, N., Honarvar, A. R., Aghamirzadeh, S., & Esfandiari, K. (2016). Web mining and Web
usage mining techniques. Bulletin de la Societe des Sciences de Liege, 85, 321-328.
Lingwal, S., & Gupta, B. (2018). A Text Mining Approach for Automatic Classification of
Web Pages. Second IEEE International Conference on Advances in Electronics,
Electrical and Computer Engineering. Bangalore.
https://doi.org/10.3850/978-981-07-6935-2_52
MonkeyLearn. (n.d.). About ADAPT Centre. Retrieved July 2023, from Case Study: How
ADAPT Designed COVID-19 Surveys using AI:
https://monkeylearn.com/customers/adapt
Morton Salt. (n.d.). THE QUINTESSENTIAL AMERICAN BRAND. Retrieved July 2023,
from Morton Salt: https://www.mortonsalt.com/about-us/
SAĞLAR, J., & KEFE, İ. (2021). A review on data mining methods used in internal audit and
external audit. EKEV AKADEMI DERGISI, 25(88).
Senave, E., Jans, M. J., & Srivastava, R. P. (2023). The application of text mining in
accounting. International Journal of Accounting Information Systems, 50.
https://doi.org/10.1016/j.accinf.2023.100624
Sherwood, M. (2017). How insight fuel growth, trust, and efficiencies at UK's first app-only
bank. (Thematic, Interviewer)
TM, H., & G, K. (2022). Web Mining and Business Intelligence: A Key Factor for Success.
Technoarete Transactions on Intelligent Data Mining and Knowledge Discovery,
2(4). https://doi.org/10.36647/TTIDMKD/02.04.A004
Velasquez, J. D. (2013). Web mining and privacy concerns: Some important legal issues to
be consider before applying any data and information extraction technique in
web-based environments. Expert Systems with Applications, 40, 5228-5239.
https://doi.org/10.1016/j.eswa.2013.03.008