Navya - Week 4 Assignment
Week 4 Assignment
Navya Gurijala
Business Intelligence
ITS-531-A01
05/29/2024
Chapter 7
Discussion Question
Question 1
Data mining, text mining, and sentiment analysis are closely related fields, since all three
are concerned with mining large volumes of data and extracting meaningful information.
Data mining is the rigorous analysis of structured data for patterns, trends, and knowledge
using statistical, logical, and machine learning methodologies (Amado et al., 2018). Text
mining is a type of data mining that extracts information from unstructured textual data,
including reports and messages, using natural language processing and computational
linguistics. Together, these techniques help business entities make sound decisions by
analyzing large amounts of data. Sentiment analysis is a branch of text mining that focuses
on identifying the sentiment expressed in textual material (Amado et al., 2018). It tries to
classify a text as positive, negative, or neutral by examining textual features such as the
words or phrases used. This technique is important for gauging customer sentiment and for
developing strategies that help a business review its marketing criteria from the point of
view of customer satisfaction (Amado et al., 2018). It is also important to understand how
these methodologies can be integrated in order to capitalize on big data in today’s
environment.
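The classification step described above can be illustrated with a minimal sketch. It assumes a
tiny hand-made word list (the `POSITIVE` and `NEGATIVE` sets and the `classify_sentiment`
function are hypothetical names chosen for illustration) and simply counts matching words.
Practical sentiment analysis tools rely on much larger lexicons or trained models.

```python
import re

# Hypothetical mini-lexicon for illustration only; real systems use far
# larger lexicons or trained models.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def classify_sentiment(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "Great product, I love it",
    "Terrible support and slow delivery",
    "The package arrived on Tuesday",
]
for review in reviews:
    print(classify_sentiment(review), "-", review)
```

A word-counting approach like this one ignores context such as negation, which is one reason
more advanced methods are used in practice.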
Question 2
Text mining can be defined as the process of extracting specific pieces of information
from large volumes of unstructured textual data with the help of techniques such as
natural language processing, machine learning, and statistical analysis. It refers to the conversion
of text data into a form that can be analyzed for patterns, trends, and relationships (Ye et al.,
2016). Text mining serves as a tool to manage and analyze voluminous textual data.
The most common uses of text mining include sentiment analysis, which involves assessing
the opinions people express in social media, surveys, and reviews to establish their
attitudes (Ye et al., 2016). Biomedicine is another major application of text mining, where
information is retrieved from scientific documents to assist in research and drug discovery.
Text mining is also used in customer feedback analysis, fraud detection, and market
research, which gives an idea of how crucial the subject is.
Question 3
Organizing text-based data refers to the process of turning text into a form that a computer
can easily sift through during data analysis. This involves arranging the text in a set
format, for instance tables or matrices, and assigning each segment of the text certain
properties or labels (Amado et al., 2018). Structured data makes it easier to understand and
analyze the text and to identify trends, which allows data mining techniques to be applied
to produce results.
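One common matrix format is a document-term matrix, in which each row is a document and each
column counts how often a term appears. The sketch below is only an illustration using the
Python standard library; the function name `document_term_matrix` and the sample sentences
are assumptions made for this example, not something taken from the cited sources.

```python
import re
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, rows) where rows[i][j] counts term j in document i."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    # The sorted vocabulary fixes the column order of the matrix.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


docs = ["Fast shipping, fast refund", "Refund was slow"]
vocab, matrix = document_term_matrix(docs)
print(vocab)
for row in matrix:
    print(row)
```

Once text is in this tabular form, standard data mining techniques that expect structured
input can be applied to it.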
There are several approaches to structuring text-based data. Among them, one of the most
used is tokenization, the process of breaking text into smaller units such as words or
phrases (Amado et al., 2018). Another approach is part-of-speech (POS) tagging, in which each
word is labeled with the grammatical part of speech it belongs to. Named entity recognition
identifies entities such as names, dates, and locations, among others. Other
methods, such as sentiment analysis and topic modeling, categorize text based on its
sentiment or themes, which allows for further organization and analysis.
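Two of the approaches named above, tokenization and the recognition of entities such as dates,
can be sketched with regular expressions. This is a simplified illustration: the function names
and the MM/DD/YYYY date pattern are assumptions made for the example, and real named entity
recognition relies on trained models rather than a single pattern.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


# Toy pattern for dates written as MM/DD/YYYY; an assumption for this example.
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def find_dates(text):
    """Return every substring that looks like an MM/DD/YYYY date."""
    return DATE_PATTERN.findall(text)


sentence = "The order shipped on 05/29/2024, and it arrived early."
print(tokenize("Text mining works."))
print(find_dates(sentence))
```

Note that this simple tokenizer splits the date itself into separate pieces, so entity patterns
are applied here to the raw text rather than to the token list.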
Question 4
Natural language processing (NLP) is an essential technique in text mining because it gives
a computer the ability to recognize, comprehend, and produce natural language. NLP methods
such as tokenization, POS tagging, and named entity recognition are one way to organize
text information and make it usable (Ye et al., 2016). They enable straightforward text
categorization, sentiment analysis, and information retrieval, all of which are critical in
text mining applications.
However, NLP also has certain drawbacks when it comes to text mining. The primary concern
is ambiguity in vocabulary and semantics, since natural languages are complex and can be
interpreted in various ways (Ye et al., 2016). In addition, NLP models may require
substantial amounts of annotated training data, which can be costly to obtain. NLP
techniques may also misinterpret contextual and subtle meanings, which remains a challenge
for the field.
Exercise
Question 3
The eBay Analytics case covers how eBay uses analytical approaches to improve its business
model and serve its users. eBay, one of the world’s largest online marketplaces, uses big
data analytics to understand customer behaviour, improve product search results, and make
selling and buying on its platform more efficient (Ye et al., 2016). With the help of big
data techniques such as data mining and machine learning, different trends can be
forecasted; users can receive personalized recommendations for items
they are interested in; and fraudulent activity can be detected to provide the safe
environment eBay users need. It can be ascertained from the case that eBay has integrated
analytics into its business.
However, relying on analytics this broadly presents difficulties. One challenge for eBay is
the scale of data management as the company grows across the different facets of the
organization, along with constant considerations of data privacy and protection, especially
concerning users’ personal and financial information, which hackers can exploit to commit
fraud (Ye et al., 2016). In addition, the prediction accuracy of models relies heavily on
the quality and variety of the data used, which makes developing and testing the models a
constant, iterative process. The dynamic nature of the online marketplace also means that
eBay has to adjust its analytical models from time to time to fit current needs.
Nevertheless, the eBay case shows how the company focuses on data analytics and how big
data can revolutionize the way businesses work.
Internet Exercise
Question 7
A review of the software section of kdnuggets.com reveals several notable tools for data
mining and text mining. Noteworthy is KNIME, a package built to give basic and advanced
users a smooth way to create data workflows, with a set of data management tools that can
be applied to various data mining and machine learning tasks (Chauhan et al., 2020).
Another strong tool is RapidMiner, valued for its ease of use and numerous features, which
let users prepare data, build models, and evaluate results on one platform. Another programme
that is used globally and is open source is Weka, a collection of machine learning
algorithms implemented for data mining in both research and practical use.
It is worth noting, however, that these tools are largely freemium and come with certain
limitations. Despite its active development and rich feature set, KNIME can experience
performance degradation when working with large data sets (Chauhan et al., 2020).
RapidMiner, while rather intuitive to use, may, at least in some of its versions, consume
intensive CPU/GPU time when performing elaborate calculations. Weka handles a task such as
filtering without difficulty, but I think it can be less efficient at storing and
processing big data than more modern tools. Nevertheless, these packages play a crucial
role in supporting advanced data mining and text mining and therefore help users gain
meaningful knowledge from data.
References
Amado, A., Cortez, P., Rita, P., & Moro, S. (2018). Research trends on Big Data in
Marketing: A text mining and topic modeling based literature analysis. European
https://doi.org/10.1016/j.iedeen.2017.06.002
Chauhan, P., Patel, N., Puri, V., & Narkhede, N. (2020). Data mining techniques for the
https://doi.org/10.1016/j.procs.2020.03.288
Ye, Z., Tafti, A. P., He, K. Y., Wang, K., & He, M. M. (2016). SparkText: Biomedical Text
https://doi.org/10.1371/journal.pone.0162721