Data Mining-World Wide Web
Over the last few years, the World Wide Web has become a significant source of information
and, at the same time, a popular platform for business. Web mining can be defined as the use of
data mining techniques and algorithms to extract useful information directly from the
web, such as Web documents and services, hyperlinks, Web content, and server logs. The World
Wide Web contains a large amount of data that provides a rich source for data mining. The
objective of Web mining is to find patterns in Web data by collecting and examining that data in
order to gain insights.
Web content mining is used to extract useful data, information, and knowledge from the content
of web pages. In web content mining, each web page is treated as an individual document.
One can take advantage of the semi-structured nature of web pages, because HTML
describes not only a page's layout but also its logical structure. The primary
task of content mining is data extraction, in which structured data is extracted from
semi-structured or unstructured web pages. The goal is to make it easier to aggregate data across
many websites by using the extracted structured data. Web content mining can also be used to
identify topics on the web. For example, when a user searches for a specific topic on a search
engine, the user receives a list of relevant suggestions.
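The data-extraction step can be illustrated with a minimal sketch, assuming the page has already been downloaded as an HTML string. It uses Python's standard `html.parser` module to turn one page (treated as a single document) into a structured record. The class `PageExtractor`, the function `extract_record`, the chosen fields (title, headings, links), and the sample page are hypothetical illustrations, not a specific tool's method.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the <title>, h1-h3 headings, and link targets of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.links = []
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2", "h3"):
            self._current = tag
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return  # ignore whitespace and text outside the tags we care about
        if self._current == "title":
            self.title += text
        else:
            self.headings.append(text)


def extract_record(html):
    """Turn one semi-structured web page into a structured record (a dict)."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "headings": parser.headings, "links": parser.links}


page = """<html><head><title>Laptop Deals</title></head>
<body><h1>Today's offers</h1>
<a href="/laptops/x1">Model X1</a> <a href="/laptops/y2">Model Y2</a>
</body></html>"""

print(extract_record(page))
```

Records like this, extracted from many sites, can then be aggregated in a single table, which is the aggregation goal described above.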
Web structure mining is used to discover the link structure of hyperlinks. It identifies how
web pages are connected, either by direct links or through a larger network of links. In web
structure mining, the web is treated as a directed graph in which web pages are the vertices and
hyperlinks are the edges. The best-known application in this area is the Google search engine,
which originally ranked its results primarily with the PageRank algorithm. PageRank
considers a page highly relevant when it is frequently linked to by other highly
ranked pages. Structure mining and content mining techniques are usually combined. For example,
web structure mining can help organizations analyze the link network between two
commercial sites.
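The idea behind PageRank can be sketched with the textbook power-iteration formulation. This is an illustration of the published algorithm, not Google's production ranking system. The damping factor of 0.85 and the fixed iteration count are common default choices assumed here, and the four-page site is made up. Pages are vertices and hyperlinks are directed edges, matching the graph view described above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets the "random jump" share (1 - d) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank evenly along its outgoing hyperlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page site.
web = {
    "home": ["products", "about"],
    "products": ["home", "checkout"],
    "about": ["home"],
    "checkout": ["home"],
}

for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```

"home" ends up with the highest score because every other page links to it, which is the "linked to by other highly ranked pages" intuition in the text. A fixed iteration count keeps the sketch short; a real implementation would usually stop once the scores change by less than a tolerance.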
Web usage mining is used to extract useful data, information, and knowledge from web log
records, and it helps recognize users' access patterns for web pages. When mining the usage
of web resources, one examines the records of requests made by visitors to a website,
which are usually collected as web server logs. While the content and structure of a collection of
web pages reflect the intentions of the pages' authors, the individual requests
show how users actually interact with these pages. Web usage mining may therefore reveal
relationships that the creator of the pages never intended.
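Before any usage patterns can be found, raw server log lines have to be turned into records. A minimal sketch follows, assuming the server writes the NCSA/Apache Common Log Format (`host ident user [time] "request" status bytes`). Other formats, such as the Combined Log Format with referrer and user-agent fields, would need a wider pattern. The function name `parse_log_line` is a hypothetical choice.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "METHOD PATH PROTOCOL" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Return one request as a dict, or None if the line does not match the format."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # The byte count is "-" when no body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_log_line(line))
```

Lines that do not match are returned as `None` so that a caller can count or skip them during preprocessing instead of failing on the first malformed entry.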
Some of the methods used to identify and analyze web usage patterns are given below:
o Session analysis:
Preprocessed log data can be analyzed by session, taking into account visitor records, days,
times, sessions, and so on. This data can be used to analyze visitor behavior.
After this analysis, a report is created that lists the most frequently visited web
pages and the common entry and exit pages.
o OLAP (Online Analytical Processing):
OLAP can be performed on different parts of the log data over a specific period.
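The session analysis, the report of frequently visited and entry/exit pages, and a simple OLAP-style summary can be sketched as follows. This is a minimal illustration under stated assumptions. The log is assumed to be already parsed into `(visitor, timestamp, page)` tuples, with each visitor identified by a single ID. The 30-minute inactivity timeout is a common heuristic that the text does not specify. The "OLAP" part is only a single roll-up by day and page, standing in for a real OLAP cube. All function names and the sample log are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity gap that ends a session


def sessionize(requests):
    """Group (visitor, timestamp, page) tuples into per-visitor sessions."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in sorted(requests, key=lambda r: (r[0], r[1])):
        by_visitor[visitor].append((ts, page))
    sessions = []
    for visitor, hits in by_visitor.items():
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > SESSION_TIMEOUT:
                sessions.append((visitor, current))  # gap too long: close the session
                current = []
            current.append(hit)
        sessions.append((visitor, current))
    return sessions


def usage_report(sessions):
    """Count page visits plus the entry (first) and exit (last) page of each session."""
    visits, entries, exits = Counter(), Counter(), Counter()
    for _, hits in sessions:
        visits.update(page for _, page in hits)
        entries[hits[0][1]] += 1
        exits[hits[-1][1]] += 1
    return visits, entries, exits


def rollup_by_day(requests):
    """OLAP-style roll-up: number of requests in each (day, page) cell."""
    return Counter((ts.date(), page) for _, ts, page in requests)


t0 = datetime(2024, 1, 1, 9, 0)
log = [
    ("alice", t0, "/home"),
    ("alice", t0 + timedelta(minutes=2), "/products"),
    ("alice", t0 + timedelta(minutes=5), "/checkout"),
    ("alice", t0 + timedelta(hours=3), "/home"),
    ("bob", t0 + timedelta(minutes=1), "/home"),
    ("bob", t0 + timedelta(minutes=4), "/products"),
]

sessions = sessionize(log)
print(len(sessions))  # → 3 (alice's 3-hour gap starts a second session)
visits, entries, exits = usage_report(sessions)
print(visits.most_common(3), entries.most_common(1))
print(rollup_by_day(log))
```

Rolling up by a different key (for example, by hour or by visitor) only changes the tuple built in `rollup_by_day`, which is the kind of slicing along different parts of the log data that OLAP tools automate.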
Web mining also faces several challenges:
o Complexity of web pages:
Web pages do not have a unifying structure, and they are far more complex than
traditional text documents. The digital library of the web holds an enormous number of
documents, and these documents are not arranged in any particular order.
o Dynamic data:
Data on the internet is updated rapidly, for example news, weather, shopping, financial
news, and sports.
o Relevancy of data:
A given person is generally interested in only a small portion of the web,
while the rest of the web contains data that is unfamiliar to that user and
may lead to unwanted results.
o Size of the web:
The web is enormous and growing rapidly. It can seem too large for conventional
data warehousing and data mining.