5 knowledge representation

Uploaded by

saharsh0812
Copyright
© All Rights Reserved

KNOWLEDGE REPRESENTATION
Knowledge Representation
• Knowledge representation is the presentation of knowledge to the user for
visualization in terms of trees, tables, rules, graphs, charts, matrices, etc.
• For example: histograms.
Data Mining Task Primitives
A data mining task can be specified in the form of a data mining query, which is
input to the data mining system.

A data mining query is defined in terms of data mining task primitives. These
primitives allow the user to interactively communicate with the data mining system
during discovery in order to direct the mining process, or examine the findings from
different angles or depths.
The data mining primitives specify the following:
• The set of task-relevant data to be mined: This specifies the portions of the
database or the set of data in which the user is interested. This includes the
database attributes or data warehouse dimensions of interest (referred to as the
relevant attributes or dimensions).
• The kind of knowledge to be mined: This specifies the data mining functions to be
performed, such as characterization, discrimination, association or correlation
analysis, classification, prediction, clustering, outlier analysis, or evolution
analysis.
• The background knowledge to be used in the discovery process: This knowledge
about the domain to be mined is useful for guiding the knowledge discovery
process and for evaluating the patterns found. Concept hierarchies are a popular
form of background knowledge, which allow data to be mined at multiple levels of
abstraction.
• The interestingness measures and thresholds for pattern evaluation: They may be
used to guide the mining process or, after discovery, to evaluate the discovered
patterns. Different kinds of knowledge may have different interestingness
measures.

• The expected representation for visualizing the discovered patterns: This refers to
the form in which discovered patterns are to be displayed, which may include
rules, tables, charts, graphs, decision trees, and cubes.

• A data mining query language can be designed to incorporate these primitives,
allowing users to flexibly interact with data mining systems.
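The five primitives can be pictured as the fields of a single query object. The sketch below is only an illustration: the class name `MiningQuery`, its field names, and all the values are assumptions and do not belong to any real data mining query language.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container for the five data mining task primitives."""
    task_relevant_data: dict                                   # database, tables, attributes, condition
    knowledge_kind: str                                        # e.g. "association", "classification"
    background_knowledge: dict = field(default_factory=dict)   # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)        # measure name -> threshold
    presentation: list = field(default_factory=list)           # e.g. ["rules", "table"]

    def missing_primitives(self):
        """Return the names of primitives that were left empty."""
        return [name for name, value in vars(self).items() if not value]


query = MiningQuery(
    task_relevant_data={
        "database": "sales_db",            # illustrative name only
        "tables": ["transactions"],
        "attributes": ["item", "age", "city"],
        "condition": "year = 2023",
    },
    knowledge_kind="association",
    interestingness={"support": 0.05, "confidence": 0.7},
    presentation=["rules", "table"],
)

# No concept hierarchy was supplied, so that primitive is reported as missing.
print(query.missing_primitives())
```

Keeping the primitives in one structure makes it easy to check that a query is fully specified before it is handed to a mining system.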
Data Mining Tasks
Since the data mining process breaks up the overall task of finding patterns from
data into a set of well-defined subtasks, it is also useful for structuring discussions
about data science.

Data scientists decompose a business problem into subtasks. The solutions to the
subtasks can then be composed to solve the overall problem. Some of these
subtasks are unique to the particular business problem, but others are common data
mining tasks.
Data Mining Tasks
• Classification (class probability estimation)
• Clustering
• Regression
• Co-occurrence grouping (association rules)
• Data reduction
Task Relevant Data
Task relevant data: where and how to retrieve the data to be used for mining.

• Database or data warehouse name: where to find the data
• Database tables or data warehouse cubes
• Conditions for data selection, relevant attributes or dimensions, and data grouping
criteria: all of these are used in the SQL query that retrieves the data
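As a rough sketch of how such a specification could turn into an SQL query, the example below builds a SELECT statement from a small dictionary and runs it against an in-memory SQLite database. The `spec` layout, the `sales` table, and its rows are all invented for illustration.

```python
import sqlite3

# Hypothetical task-relevant data specification; every name here is illustrative.
spec = {
    "table": "sales",
    "attributes": ["city", "SUM(amount) AS total"],
    "condition": "amount >= ?",
    "params": (100,),
    "group_by": ["city"],
}


def build_query(spec):
    """Assemble a SELECT statement from the specification.

    Identifiers are interpolated directly, so this is only safe for a
    trusted specification; selection values go through placeholders.
    """
    grouping = ", ".join(spec["group_by"])
    return (
        f"SELECT {', '.join(spec['attributes'])} "
        f"FROM {spec['table']} "
        f"WHERE {spec['condition']} "
        f"GROUP BY {grouping} ORDER BY {grouping}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Delhi", 120), ("Delhi", 80), ("Pune", 300), ("Pune", 150)],
)

result = conn.execute(build_query(spec), spec["params"]).fetchall()
print(result)
conn.close()
```

The table, the selection condition, the relevant attributes, and the grouping criteria each map onto one clause of the query.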
Background knowledge: Concept hierarchies
• Schema hierarchy
• Set-grouping hierarchy
• Operation-derived hierarchy
• Rule-based hierarchy
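Two of these hierarchy types are easy to sketch in code. The example below assumes a schema hierarchy over location attributes and a set-grouping hierarchy over age ranges. The attribute names, range boundaries, and concept labels are invented for illustration.

```python
from collections import Counter

# Schema hierarchy: an ordering of attributes from most specific to most general.
SCHEMA_HIERARCHY = ["street", "city", "state", "country"]

# Set-grouping hierarchy: ranges of values grouped under higher-level concepts.
AGE_GROUPS = {
    "young": range(0, 30),
    "middle_aged": range(30, 60),
    "senior": range(60, 130),
}


def next_level(attribute):
    """Return the next more general attribute, or None at the top of the hierarchy."""
    index = SCHEMA_HIERARCHY.index(attribute)
    return SCHEMA_HIERARCHY[index + 1] if index + 1 < len(SCHEMA_HIERARCHY) else None


def age_concept(age):
    """Map a raw age onto its higher-level concept."""
    for concept, ages in AGE_GROUPS.items():
        if age in ages:
            return concept
    raise ValueError(f"age {age} is outside every group")


ages = [23, 35, 41, 67, 19]
rolled_up = Counter(age_concept(a) for a in ages)   # mine at a coarser level of abstraction
print(next_level("city"), dict(rolled_up))
```

Replacing raw values with their higher-level concepts is what lets the same data be mined at several levels of abstraction.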
Data Visualization
• It deals with the representation of data in a graphical or pictorial format.
• Patterns in the data are easier to spot when visualization techniques are used.

Why visualize data?


• Identifying problems:
– Histograms for nominal attributes: is the distribution consistent with
background knowledge?
– Graphs for numeric values: detecting outliers.

• Visualization shows dependencies between attributes.
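The two checks in the list above can be approximated without a plotting library. This sketch prints a text histogram for a nominal attribute and flags numeric outliers with the 1.5 × IQR rule of thumb. The rule, the function names, and the sample values are assumptions made for illustration.

```python
from collections import Counter
from statistics import quantiles


def nominal_histogram(values):
    """Return one bar of '#' characters per distinct nominal value."""
    return {value: "#" * count for value, count in Counter(values).items()}


def iqr_outliers(values):
    """Flag values more than 1.5 * IQR outside the quartiles (a common rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


colors = ["red", "blue", "red", "green", "red"]   # made-up nominal attribute
readings = [10, 12, 11, 13, 12, 11, 95]           # made-up numeric attribute

for value, bar in sorted(nominal_histogram(colors).items()):
    print(f"{value:<6}{bar}")
print(iqr_outliers(readings))
```

A bar that is far longer or shorter than background knowledge suggests, or a value far outside the quartiles, is a prompt to inspect the data before mining it.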


Visualization Techniques
1. Pixel-oriented visualization technique
2. Geometric projection visualization technique
3. Icon-based visualization techniques
4. Hierarchical visualization techniques
Pixel-oriented visualization technique
• In pixel-oriented visualization techniques, each attribute has its own sub-window,
and each attribute value is represented by one colored pixel.
• A tuple with m variables is represented by m colored pixels, one per variable,
each placed in that variable's sub-window.
• The color mapping of the pixels is decided on the basis of data characteristics and
visualization tasks.
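A minimal sketch of the idea, assuming a linear grey-scale mapping and a row-major pixel arrangement (neither is prescribed above): each attribute gets its own grid of intensities, and a given tuple occupies the same position in every grid. The records are invented.

```python
def to_intensity(values):
    """Map numeric values linearly onto grey levels 0..255."""
    low, high = min(values), max(values)
    span = (high - low) or 1   # avoid division by zero for a constant column
    return [round(255 * (v - low) / span) for v in values]


def sub_window(values, width):
    """Arrange one attribute's pixels row by row in a grid of the given width."""
    pixels = to_intensity(values)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Made-up records; each one contributes exactly one pixel to every sub-window.
records = [
    {"income": 20, "age": 70},
    {"income": 40, "age": 30},
    {"income": 80, "age": 60},
    {"income": 100, "age": 40},
]

windows = {
    attribute: sub_window([r[attribute] for r in records], width=2)
    for attribute in records[0]
}
for attribute, grid in windows.items():
    print(attribute, grid)
```

Because the pixel order is the same in every sub-window, comparing the same position across windows shows how the attributes of one tuple relate.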
Geometric projection visualization technique
Geometric projection techniques look for informative geometric transformations and
projections of multidimensional data. They include:

i. Scatter-plot matrices
• A scatter-plot matrix consists of scatter plots of all possible pairs of variables in a dataset.

ii. Hyperslice
• It is an extension of scatter-plot matrices. It represents a multidimensional
function as a matrix of orthogonal two-dimensional slices.

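iii. Parallel coordinates
• Each attribute gets its own axis, drawn as one of a set of parallel, evenly spaced
vertical lines. Each data item is drawn as a polyline that crosses every axis at the
item's value for that attribute.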
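The geometry behind the first and third techniques can be sketched without drawing anything. Below, the scatter-plot matrix is reduced to its list of attribute pairs, and each record becomes a parallel-coordinates polyline whose vertices are scaled to the range 0 to 1 on every axis. The attribute names, the records, and the min-max scaling are assumptions.

```python
from itertools import combinations

attributes = ["height", "weight", "age"]                  # made-up attributes
records = [(150, 50, 20), (180, 90, 40), (165, 70, 30)]   # made-up data items

# Scatter-plot matrix: one panel per unordered pair of attributes,
# so d attributes give d * (d - 1) / 2 distinct panels.
panels = list(combinations(attributes, 2))


def polyline(record, columns):
    """Parallel coordinates: one (axis index, scaled height) vertex per attribute."""
    points = []
    for axis, (value, column) in enumerate(zip(record, columns)):
        low, high = min(column), max(column)
        points.append((axis, (value - low) / ((high - low) or 1)))
    return points


columns = list(zip(*records))   # one tuple of values per attribute
lines = [polyline(record, columns) for record in records]

print(panels)
print(lines[2])
```

A plotting library would draw each pair in `panels` as one scatter plot and each entry of `lines` as one line across the vertical axes.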
Icon-based visualization techniques
• Icon-based visualization techniques are also known as iconic display techniques.
• Each multidimensional data item is mapped to an icon.
• This technique allows visualization of large amounts of data.
• The most commonly used technique is Chernoff faces.
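The mapping step behind Chernoff faces can be sketched as follows: each attribute of a data item is scaled to the range 0 to 1 and assigned to one facial feature. The feature list, the attribute names, and the value ranges here are hypothetical, and the face itself is not drawn.

```python
# Hypothetical set of facial features, one per data attribute.
FEATURES = ["face_width", "eye_size", "mouth_curvature", "nose_length"]


def to_face(record, ranges):
    """Map each attribute of one data item onto a facial feature in [0, 1]."""
    face = {}
    # zip stops at the shorter sequence, so unused features are simply left out.
    for feature, (name, value) in zip(FEATURES, record.items()):
        low, high = ranges[name]
        face[feature] = (value - low) / (high - low)
    return face


ranges = {"income": (0, 100), "age": (20, 60), "debt": (0, 40)}   # made-up value ranges
customer = {"income": 50, "age": 40, "debt": 10}                  # made-up data item

face = to_face(customer, ranges)
print(face)
```

A renderer would then draw one face per data item, so that items with similar attribute values produce similar-looking faces.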
Hierarchical visualization techniques
• Hierarchical visualization techniques partition the dimensions into subsets.
• These subsets are visualized in hierarchical manner.

i. Dimensional stacking
In dimensional stacking, the n-dimensional attribute space is partitioned into 2-dimensional
subspaces.
Attribute values are partitioned into various classes.
Each element is a two-dimensional space in the form of an xy plot.
ii. Mosaic plot
Mosaic plot gives the graphical representation of
successive decompositions.
Rectangles are used to represent the counts of categorical
data, and at every stage the rectangles are split parallel to one of the axes.
iii. Worlds within worlds
Worlds within worlds is useful for generating an interactive hierarchy of displays.
The innermost world must have a function and its two most important parameters.
The n-Vision system uses this technique, with dynamic interaction through a data glove and
stereo displays, including rotation, scaling (inner) and translation (inner/outer).
iv. Tree maps
Tree map visualization techniques are well suited for displaying large amounts of hierarchically
structured data.
The visualization space is divided into multiple rectangles that are sized and ordered according to a
quantitative variable.
The levels in the hierarchy are seen as rectangles containing other rectangles.
Each set of rectangles on the same level in the hierarchy represents a category, a column or an
expression in a data set.
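One simple way to compute such a layout is the slice-and-dice scheme, which splits each rectangle among its children in proportion to their values and alternates the split direction at every level. The text above does not name a layout algorithm, so treat this choice, the node format, and the sample tree as assumptions.

```python
def size(node):
    """Quantitative value of a node: its own value, or the sum over its children."""
    children = node.get("children")
    return sum(size(child) for child in children) if children else node["value"]


def layout(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout: returns {name: (x, y, width, height)} for every node."""
    out = {} if out is None else out
    out[node["name"]] = (x, y, w, h)
    total = size(node)
    offset = 0.0
    # Children are ordered by the quantitative variable, largest first.
    for child in sorted(node.get("children", []), key=size, reverse=True):
        share = size(child) / total
        if depth % 2 == 0:   # even depth: place children side by side along x
            layout(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:                # odd depth: stack children along y
            layout(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Made-up hierarchy; leaf values are the quantitative variable.
tree = {
    "name": "all",
    "children": [
        {"name": "A", "value": 6},
        {"name": "B", "children": [{"name": "B1", "value": 1}, {"name": "B2", "value": 1}]},
    ],
}

rects = layout(tree, 0, 0, 100, 50)
for name, rect in rects.items():
    print(name, rect)
```

Each child rectangle lies inside its parent, and its area is proportional to its value, which is what lets a tree map show both the hierarchy and the quantities at once.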
