5 knowledge representation
Knowledge Representation
• Knowledge representation is the presentation of mined knowledge to the user in
visual form, such as trees, tables, rules, graphs, charts, and matrices.
• For example: histograms.
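As a small illustration of the histogram example, the sketch below bins a list of values into equal-width intervals and prints one text bar per bin. The function name `histogram`, the sample `ages` values, and the choice of four bins are assumptions made only for illustration.

```python
def histogram(values, bins):
    """Count how many values fall in each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # fall back to 1 when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


ages = [21, 22, 25, 31, 34, 35, 38, 42, 47, 58]  # hypothetical sample data
counts = histogram(ages, bins=4)
for i, count in enumerate(counts):
    print(f"bin {i}: {'#' * count}")
```

Each row of `#` characters plays the role of one histogram bar; a plotting library would draw the same counts as rectangles.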
Data Mining Task Primitives
A data mining task can be specified in the form of a data mining query, which is
input to the data mining system.
A data mining query is defined in terms of data mining task primitives. These
primitives allow the user to interactively communicate with the data mining system
during discovery in order to direct the mining process, or examine the findings from
different angles or depths.
The data mining primitives specify the following:
• The set of task-relevant data to be mined: This specifies the portions of the
database or the set of data in which the user is interested. This includes the
database attributes or data warehouse dimensions of interest (referred to as the
relevant attributes or dimensions).
• The kind of knowledge to be mined: This specifies the data mining functions to be
performed, such as characterization, discrimination, association or correlation
analysis, classification, prediction, clustering, outlier analysis, or evolution
analysis.
• The background knowledge to be used in the discovery process: This knowledge
about the domain to be mined is useful for guiding the knowledge discovery
process and for evaluating the patterns found. Concept hierarchies are a popular
form of background knowledge, which allow data to be mined at multiple levels of
abstraction.
• The interestingness measures and thresholds for pattern evaluation: They may be
used to guide the mining process or, after discovery, to evaluate the discovered
patterns. Different kinds of knowledge may have different interestingness
measures.
• The expected representation for visualizing the discovered patterns: This refers to
the form in which discovered patterns are to be displayed, which may include
rules, tables, charts, graphs, decision trees, and cubes.
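The five primitives above can be pictured as the fields of a single query object. The sketch below is only one possible way to hold them; the class name `MiningQuery`, its field names, and the sample values are hypothetical and do not belong to any particular data mining system or query language.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container for the five data mining task primitives."""
    relevant_attributes: list                                # task-relevant data
    knowledge_kind: str                                      # kind of knowledge to be mined
    concept_hierarchies: dict = field(default_factory=dict)  # background knowledge
    thresholds: dict = field(default_factory=dict)           # interestingness measures
    presentation: str = "table"                              # expected representation


query = MiningQuery(
    relevant_attributes=["age", "income", "item"],
    knowledge_kind="association",
    concept_hierarchies={"city": "country"},  # city values roll up to country
    thresholds={"min_support": 0.05, "min_confidence": 0.7},
    presentation="rules",
)
print(query)
```

Keeping the primitives together in one object mirrors the idea that a single query tells the system what data to use, what to look for, and how to report it.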
Data scientists decompose a business problem into subtasks. The solutions to the
subtasks can then be composed to solve the overall problem. Some of these
subtasks are unique to the particular business problem, but others are common data
mining tasks.
Data Mining Tasks
• Classification (class probability estimation)
• Clustering
• Regression
• Co-occurrence grouping (association rules)
• Data reduction
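To make co-occurrence grouping concrete, the sketch below computes support and confidence, two widely used interestingness measures for association rules. The `baskets` data and the function names are assumptions for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of both sides together divided by support of the antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


baskets = [  # hypothetical market-basket transactions
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(f"bread -> milk: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule would typically be reported only if both values exceed user-chosen thresholds.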
Task Relevant Data
Task-relevant data specifies where and how to retrieve the data to be used for mining.
i. Scatter-plot matrices
• A scatter-plot matrix consists of scatter plots of all possible pairs of variables in a dataset.
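The pairing of variables can be sketched without any plotting library. The code below lists one panel per unordered pair of variables; the dataset and the function name `scatter_matrix_panels` are hypothetical.

```python
from itertools import combinations


def scatter_matrix_panels(columns):
    """Return (x_name, y_name, points) for every unordered pair of variables."""
    panels = []
    for x_name, y_name in combinations(columns, 2):
        points = list(zip(columns[x_name], columns[y_name]))
        panels.append((x_name, y_name, points))
    return panels


data = {  # hypothetical dataset with three numeric variables
    "height": [150, 160, 170],
    "weight": [50, 60, 72],
    "age": [20, 35, 41],
}
panels = scatter_matrix_panels(data)
for x_name, y_name, points in panels:
    print(x_name, "vs", y_name, points)
```

For n variables this yields n(n-1)/2 distinct panels; a full n-by-n matrix display shows each pair twice, once on either side of the diagonal.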
i. Dimensional stacking
In dimensional stacking, the n-dimensional attribute space is partitioned into 2-dimensional
subspaces.
Attribute values are partitioned into classes.
Each element is a two-dimensional space displayed as an xy plot.
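One way to picture the partitioning is as an index calculation: once each attribute value has been assigned to a class (bin), the bin indices determine a cell in a single 2-D grid. The sketch below assumes attributes are ordered from outermost to innermost and alternate between the x and y axes; the function name `stack_position` is hypothetical.

```python
def stack_position(bin_indices, bin_counts):
    """Map binned attribute values to a cell (x, y) in the stacked 2-D grid.

    Attributes are listed from outermost to innermost. Those at even
    positions contribute to x, those at odd positions contribute to y.
    """
    x = y = 0
    for position, (index, count) in enumerate(zip(bin_indices, bin_counts)):
        if position % 2 == 0:
            x = x * count + index  # nest this attribute inside the previous x level
        else:
            y = y * count + index  # nest this attribute inside the previous y level
    return x, y


# Four attributes with 2, 3, 4 and 5 classes give a grid of 8 by 15 cells.
cell = stack_position(bin_indices=(1, 2, 3, 4), bin_counts=(2, 3, 4, 5))
print(cell)
```

The outer pair of attributes selects a block of the grid, and the inner pair selects a cell within that block.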
ii. Mosaic plot
A mosaic plot gives a graphical representation of successive decompositions.
Rectangles represent the counts of categorical data, and at every stage the
rectangles are split in parallel.
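The successive decomposition can be sketched for two categorical variables. The code below assumes a common layout in which the unit square is first split into columns by the first variable and each column is then split into rows by the second; the function name and the sample counts are hypothetical.

```python
def mosaic_rectangles(table):
    """Return {(a, b): (x, y, width, height)} inside the unit square.

    `table` maps each category of the first variable to a dict of counts
    for the second variable. Every row is assumed to have a nonzero total.
    """
    total = sum(sum(row.values()) for row in table.values())
    rects = {}
    x = 0.0
    for a, row in table.items():
        row_total = sum(row.values())
        width = row_total / total        # first split: share of the first variable
        y = 0.0
        for b, count in row.items():
            height = count / row_total   # second split: share within this column
            rects[(a, b)] = (x, y, width, height)
            y += height
        x += width
    return rects


counts = {  # hypothetical two-way contingency table
    "male": {"yes": 30, "no": 10},
    "female": {"yes": 20, "no": 40},
}
rects = mosaic_rectangles(counts)
for key, rect in rects.items():
    print(key, rect)
```

With this construction the area of each rectangle equals that cell's share of the total count.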
iii. Worlds within worlds
Worlds within worlds is useful for generating an interactive hierarchy of displays.
The innermost world must have a function and its two most important parameters.
Through this, N-vision of the data is possible using devices such as a data glove and
stereo displays, including rotation, scaling (inner), and translation (inner/outer).
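The nesting idea can be sketched numerically: the outer world fixes the less important parameters, and the inner world shows the function over the two most important ones. The four-variable function `f` and the helper `inner_world` below are hypothetical and only illustrate the principle, not any particular interactive system.

```python
def inner_world(f, outer_point, xs, ys):
    """Sample f over the inner (x, y) grid with the outer variables held fixed."""
    return [[f(x, y, *outer_point) for x in xs] for y in ys]


def f(x, y, z, w):
    """Hypothetical function of four variables."""
    return x + 2 * y + 10 * z + 100 * w


# Selecting the point (z, w) = (1, 2) in the outer world fixes the inner surface.
surface = inner_world(f, outer_point=(1, 2), xs=[0, 1, 2], ys=[0, 1])
for row in surface:
    print(row)
```

Moving the inner world to a different outer point simply means calling `inner_world` again with a new `outer_point`.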
iv. Tree maps
Tree map visualization techniques are well suited to displaying large amounts of hierarchically
structured data.
The visualization space is divided into multiple rectangles that are sized and ordered according
to a quantitative variable.
The levels in the hierarchy are shown as rectangles containing other rectangles.
Each set of rectangles on the same level in the hierarchy represents a category, a column, or an
expression in a data set.
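A minimal sketch of how nested rectangles can be computed is given below. It assumes the simple slice-and-dice layout, which alternates the split direction at each level of the hierarchy; the tree contents and the function names are hypothetical.

```python
def node_size(node):
    """Size of a leaf, or the summed size of all of its descendants."""
    if "children" in node:
        return sum(node_size(child) for child in node["children"])
    return node["size"]


def treemap(node, x, y, width, height, depth=0, out=None):
    """Slice-and-dice layout: returns a list of (name, x, y, width, height)."""
    if out is None:
        out = []
    out.append((node["name"], x, y, width, height))
    total = node_size(node)
    offset = 0.0
    for child in node.get("children", []):
        share = node_size(child) / total
        if depth % 2 == 0:  # even levels: split left to right
            treemap(child, x + offset * width, y, share * width, height, depth + 1, out)
        else:               # odd levels: split top to bottom
            treemap(child, x, y + offset * height, width, share * height, depth + 1, out)
        offset += share
    return out


tree = {  # hypothetical hierarchy with a quantitative "size" on each leaf
    "name": "root",
    "children": [
        {"name": "docs", "size": 2},
        {"name": "src", "children": [
            {"name": "a.py", "size": 3},
            {"name": "b.py", "size": 3},
        ]},
    ],
}
layout = treemap(tree, 0, 0, 8, 4)
for rect in layout:
    print(rect)
```

Each child rectangle lies inside its parent, and its area is proportional to its share of the parent's total size.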