Navya - Week 4 Assignment

Uploaded by

omairaomuse
Week 4 Assignment

Navya Gurijala

University of the Cumberlands

Business Intelligence

ITS-531-A01

Dr. Abiodun Adeleke (Abbey)

05/29/2024
Chapter 7

Discussion Question

Question 1
Data mining, text mining, and sentiment analysis are closely related fields: each is concerned with mining large volumes of data and extracting meaningful information. Data mining is the rigorous analysis of structured data for patterns, trends, and knowledge using statistical, logical, and machine learning methods (Amado et al., 2018). Text mining is a type of data mining that extracts information from unstructured textual data, such as reports and messages, using natural language processing and computational linguistics. Together, these techniques help businesses make sound decisions by analyzing large amounts of data.

Sentiment analysis, also called opinion mining, is a text mining application that focuses on identifying the sentiment expressed in textual material (Amado et al., 2018). It tries to classify text as positive, negative, or neutral by examining textual features such as the words and phrases used. This technique is important for gauging customer sentiment and for developing strategies that help a business review its marketing criteria and approach from a customer-satisfaction point of view (Amado et al., 2018). It is also important to understand how these methodologies can be integrated in order to capitalize on big data in today’s environment.
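
To make the positive/negative/neutral classification concrete, the following is a minimal sketch of a word-list (lexicon) approach in Python. The two word lists and the function name classify_sentiment are hypothetical and chosen only for illustration; they do not come from Amado et al. (2018), and practical systems typically rely on much larger lexicons or trained models.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "fast", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow", "broken"}


def classify_sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral by counting lexicon words."""
    # Lowercase the text and pull out alphabetic words.
    words = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ["Great product, fast shipping",
                   "Terrible quality and slow delivery",
                   "The package arrived on Tuesday"]:
        print(review, "->", classify_sentiment(review))
```

A simple count like this ignores negation and sarcasm, which is one reason more sophisticated NLP methods are usually preferred in practice.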

Question 2
Text mining is the process of extracting specific pieces of information from large volumes of unstructured textual data with the help of techniques such as natural language processing, machine learning, and statistical analysis. It converts text into data that can be analyzed for patterns, trends, and relationships (Ye et al., 2016). Text mining is used to manage and analyze voluminous textual data and to generate useful information that supports decision making across different domains.

Common uses of text mining include sentiment analysis, which assesses public opinion in social media, surveys, and reviews to establish people's attitudes (Ye et al., 2016). Biomedicine is another major application, where information is retrieved from scientific documents to assist research and drug discovery. Text mining is also used in customer feedback analysis, fraud detection, and market research, which shows how important it is for processing large text datasets.

Question 3
Organizing text-based data refers to turning text into a form that a computer can easily sift through during analysis. This involves putting the text into a set format, such as tables or matrices, and assigning each segment of the text certain properties (Amado et al., 2018). Structured data makes it easier to understand and analyze the text and to identify trends, which allows data mining techniques to produce results.
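
One common way to put text into a matrix format is a document-term matrix, in which each row is a document and each column counts one word. The sketch below is an illustrative assumption rather than a method taken from the cited sources; the function name build_term_matrix and the two sample sentences are hypothetical.

```python
import re
from collections import Counter


def build_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    # Split each document into lowercase alphabetic words.
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    # The vocabulary is every distinct word, sorted so column order is stable.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


if __name__ == "__main__":
    vocab, rows = build_term_matrix(["Shipping was fast", "Fast refund, fast reply"])
    print(vocab)
    for row in rows:
        print(row)
```

Once the text is in this tabular form, standard data mining techniques that expect rows and columns can be applied to it.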

There are several approaches to structuring text-based data. One of the most widely used is tokenization, the process of breaking text into components (tokens), each of which can be a word or a phrase (Amado et al., 2018). Another approach is part-of-speech (POS) tagging, in which each word is labeled with its grammatical part of speech. Named entity recognition identifies entities such as names, dates, and locations. Other methods, such as sentiment analysis and topic modeling, categorize text by its sentiment or themes, which allows for further organization and analysis.
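
Two of the steps above can be sketched with Python's standard library alone. The example is a simplified illustration under stated assumptions: tokenize splits on word characters and punctuation, and find_dates recognizes only dates written as MM/DD/YYYY, which is a narrow stand-in for full named entity recognition. Both function names are hypothetical. POS tagging is left out because it normally depends on a trained language model.

```python
import re

# Assumed date format for this sketch: two digits, two digits, four digits.
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def find_dates(text):
    """Return every substring that matches the assumed MM/DD/YYYY pattern."""
    return DATE_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Text mining works!"))
    print(find_dates("The order shipped on 05/29/2024."))
```

Pattern-based extraction of this kind only covers entities with a fixed written form; names and locations generally require statistical or dictionary-based recognizers.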

Question 4
Natural language processing (NLP) is an essential technique in text mining because it gives a computer the ability to recognize, comprehend, and produce natural language. NLP methods such as tokenization, POS tagging, and named entity recognition organize text information and make it usable (Ye et al., 2016). They enable the text categorization, sentiment analysis, and information retrieval that are critical in text mining applications.

However, NLP also has certain drawbacks when it comes to text mining. The primary concern is ambiguity in vocabulary and semantics, since natural languages are complex and can be interpreted in various ways (Ye et al., 2016). In addition, NLP models may require substantial amounts of annotated training data, which can be costly to produce. NLP techniques may also misread contextual and subtle meanings, which remains a challenge for the field.

Exercise

Question 3
The eBay Analytics case describes how eBay uses analytical approaches to improve its business model and serve its users. eBay, one of the world’s largest online trading websites, uses big data analytics to understand customer behaviour, improve product search results, and make buying and selling on its platform more efficient (Ye et al., 2016). With big data tools such as data mining and machine learning, trends can be forecast, users can receive personalized recommendations for items they are interested in, and fraudulent activity can be detected to provide the safe environment eBay users need. The case shows that integrating analytics into decision making is an example of how big data can improve business results.

However, using analytics on this scale presents difficulties. One internal challenge for eBay is the scope of data management as the company scales different facets of the organization, along with constant considerations of data privacy and protection, especially for users' personal and financial information, which hackers could exploit to commit fraud (Ye et al., 2016). In addition, the prediction accuracy of models relies heavily on the quality and variety of the data used, which makes developing and testing them a constant, iterative process. The dynamic nature of the online marketplace also means that eBay must periodically adjust its analytical models to fit current needs. Nevertheless, the eBay case shows the company's focus on data analytics and illustrates how big data can change the way businesses work and how consumers engage with their products and services.

Internet Exercise

Question 7
A review of the applications and software sections on kdnuggets.com reveals several notable tools for data mining and text mining. One is KNIME, a package built to give basic and advanced users a smooth way to create data workflows, with a set of data management tools that can be applied to various data mining and machine learning tasks (Chauhan et al., 2020). Another strong tool is RapidMiner, noted for its ease of use and numerous features, which let users prepare data, build models, and evaluate results on one platform. A third, Weka, is an open-source program used globally that provides a collection of machine learning algorithms for data mining in both research and practical use.

It is worth noting, however, that some of these tools are freemium and that all of them come with certain limitations. Despite its active development and rich feature set, KNIME can suffer performance degradation when working with large datasets (Chauhan et al., 2020). RapidMiner, while rather intuitive to use, at least in some of its versions, may consume intensive CPU/GPU time when performing elaborate calculations. For example, a task such as filtering is readily solvable in Weka, but I think Weka can be less efficient at storing and processing big data than more modern tools. Nevertheless, these packages play a crucial role in supporting advanced data mining and text mining, and they help users gain meaningful knowledge from different datasets and improve decision making.


References
Amado, A., Cortez, P., Rita, P., & Moro, S. (2018). Research trends on Big Data in Marketing: A text mining and topic modeling based literature analysis. European Research on Management and Business Economics, 24(1), 1–7. https://doi.org/10.1016/j.iedeen.2017.06.002

Chauhan, P., Patel, N., Puri, V., & Narkhede, N. (2020). Data mining techniques for the industrial internet of things: A review. Procedia Computer Science, 167, 448–457. https://doi.org/10.1016/j.procs.2020.03.288

Ye, Z., Tafti, A. P., He, K. Y., Wang, K., & He, M. M. (2016). SparkText: Biomedical Text Mining on Big Data Framework. PLoS ONE, 11(9), e0162721. https://doi.org/10.1371/journal.pone.0162721
