3 types of Data Mining
Text Mining
Most previous studies of data mining have focused on structured data, such as
relational,
transactional, and data warehouse data. However, in reality, a substantial
portion of
the available information is stored in text databases (or document databases),
which
consist of large collections of documents from various sources, such as news
articles,
research papers, books, digital libraries, e-mail messages, and Web pages. Text
databases
are rapidly growing due to the increasing amount of information available in
electronic
form, such as electronic publications, various kinds of electronic documents, e-mail, and
the World Wide Web (which can also be viewed as a huge, interconnected,
dynamic text
database). Nowadays, most of the information in government, industry, business,
and
other institutions is stored electronically, in the form of text databases.
Data stored in most text databases are semistructured data in that they are
neither
completely unstructured nor completely structured. For example, a document
may
contain a few structured fields, such as title, authors, publication date, category,
and
so on, but also contain some largely unstructured text components, such as
abstract
and contents. There have been many studies on the modeling and
implementation of semistructured data in recent database research. Moreover,
information
retrieval techniques, such as text indexing methods, have been developed to
handle
unstructured documents.
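
As a rough illustration of the two ideas above, the following Python sketch models a semistructured document and builds a very simple inverted index over its unstructured text. It is only a minimal sketch under stated assumptions: the Document class, the tokenize and build_inverted_index helpers, and the sample records are all hypothetical and are not taken from any particular system; real text indexing methods are considerably more elaborate.

```python
from collections import defaultdict
from dataclasses import dataclass
import re


@dataclass
class Document:
    # Structured fields (hypothetical schema, for illustration only)
    title: str
    authors: list[str]
    publication_date: str
    category: str
    # Largely unstructured text components
    abstract: str
    contents: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: list[Document]) -> dict[str, set[int]]:
    """Map each term to the positions (in docs) of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        # Only the unstructured components are indexed here.
        for term in tokenize(doc.abstract + " " + doc.contents):
            index[term].add(doc_id)
    return index


# Made-up sample records, not real data.
docs = [
    Document("Mining Text", ["A. Author"], "2001-05-01", "research paper",
             "We study text mining.", "Text databases are growing rapidly."),
    Document("Daily News", ["B. Writer"], "2002-03-15", "news article",
             "Market summary.", "Stocks are growing slowly."),
]
index = build_inverted_index(docs)

# Combine a keyword lookup on the unstructured text with a filter
# on a structured field.
news_hits = [
    docs[i].title
    for i in sorted(index.get("growing", set()))
    if docs[i].category == "news article"
]
```

The last statement shows why the semistructured view is convenient: the keyword lookup uses the index built from the free text, while the category test uses a structured field directly.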
Traditional information retrieval techniques are becoming inadequate for the
increasingly
vast amounts of text data. Typically, only a small fraction of the many available
documents will be relevant to a given individual user. Without knowing what could be
in the