01. UNIT-I(DMWH6EM)
ESSAY QUESTIONS
large database (for example, finding linked products in
gigabytes of store scanner data) and mining a mountain for a
vein of valuable ore. Both processes require either sifting
through an immense amount of material, or intelligently
probing it to find exactly where the value resides. Given
databases of sufficient size and quality, data mining
technology can generate new business opportunities by
providing these capabilities:
1. Automated prediction of trends and behaviors: Data
mining automates the process of finding predictive
information in large databases. Questions that traditionally
required extensive hands-on analysis can now be answered
directly from the data quickly. A typical example of a
predictive problem is targeted marketing. Data mining uses
data on past promotional mailings to identify the targets most
likely to maximize return on investment in future mailings.
Other predictive problems include forecasting bankruptcy and
other forms of default, and identifying segments of a
population likely to respond similarly to given events.
2. Automated discovery of previously unknown
patterns: Data mining tools sweep through databases and
identify previously hidden patterns in one step. An example of
pattern discovery is the analysis of retail sales data to identify
seemingly unrelated products that are often purchased
together. Other pattern discovery problems include detecting
fraudulent credit card transactions and identifying anomalous
data that could represent data entry keying errors.
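The idea of flagging anomalous values that may be keying errors can be sketched in Python as follows. This is only an illustration, not a method prescribed by these notes: the interquartile-range rule, the helper name iqr_outliers, the multiplier k = 1.5 and the sample ages are all assumptions chosen for the example.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    Hypothetical helper: the IQR rule is one common heuristic,
    assumed here purely for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up ages; 250 looks like a keying error (perhaps 25 with an extra 0).
ages = [23, 25, 31, 28, 27, 30, 26, 29, 250, 24]
suspects = iqr_outliers(ages)
print(suspects)
```

A flagged value is only a candidate for review: it may be a genuine but unusual record rather than an error.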
Tasks of Data Mining
Data mining involves six common classes of tasks:
1. Anomaly detection (Outlier/change/deviation
detection): The identification of unusual data records that
might be interesting, or data errors that require further
investigation.
2. Association rule learning (Dependency modelling):
Searches for relationships between variables. For example, a
supermarket might gather data on customer purchasing habits.
Using association rule learning, the supermarket can
determine which products are frequently bought together and
use this information for marketing purposes. This is
sometimes referred to as market basket analysis.
3. Clustering: It is the task of discovering groups and
structures in the data that are in some way or another
"similar", without using known structures in the data.
4. Classification: It is the task of generalizing known
structure to apply to new data. For example, an e-mail
program might attempt to classify an e-mail as "legitimate" or
as "spam".
5. Regression: It attempts to find a function which
models the data with the least error.
6. Summarization: Providing a more compact
representation of the data set, including visualization and
report generation.
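The market basket analysis mentioned under association rule learning can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a full rule-mining algorithm: the helper names pair_support and confidence and the sample baskets are hypothetical, and only pairs of items are counted.

```python
from collections import Counter
from itertools import combinations

def pair_support(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) gives each pair one canonical order, counted once per basket
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

def confidence(transactions, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_a = [b for b in transactions if antecedent in b]
    if not with_a:
        return 0.0
    return sum(consequent in b for b in with_a) / len(with_a)

# Made-up supermarket baskets, for illustration only.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
support = pair_support(baskets)
print(support.most_common(1))
print(confidence(baskets, "bread", "butter"))
```

In this toy data, bread and butter appear together in three of the four baskets, and every basket containing bread also contains butter, which is the kind of relationship a supermarket could use for marketing purposes.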
Example:
Attribute          Values
Colours            Black, Brown, White
Categorical Data   Lecturer, Professor, Assistant Professor
2. Binary Attributes: Binary data has only 2
values/states, for example yes or no, affected or unaffected,
true or false.
i) Symmetric: Both values are equally important
(Gender).
ii) Asymmetric: Both values are not equally important
(Result).
Quantitative Attributes
1. Numeric: A numeric attribute is quantitative because
it is a measurable quantity, represented in integer or real
values. Numeric attributes are of 2 types: interval and ratio.
i) An interval-scaled attribute: It has values whose
differences are interpretable, but it has no true reference
point, or zero point. Interval-scaled data can be added and
subtracted but cannot meaningfully be multiplied or divided.
Consider temperature in degrees Centigrade: if one day's
temperature is numerically twice that of another day, we
cannot say that the first day is twice as hot as the other.
ii) A ratio-scaled attribute: It is a numeric attribute with a
fixed zero point. If a measurement is ratio-scaled, we can
speak of a value as being a multiple (or ratio) of another
value. The values are ordered, we can compute the difference
between values, and the mean, median, mode, quantile range
and five-number summary can be given.
2. Discrete: Discrete data can be numerical or
categorical. A discrete attribute has a finite or countably
infinite set of values.
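The difference between interval-scaled and ratio-scaled attributes described above can be checked with a small Python sketch. The two sample temperatures are made up, and the Kelvin scale is brought in here as an assumed example of a temperature scale with a true zero point; it does not appear in these notes.

```python
def celsius_to_kelvin(c):
    """Convert degrees Centigrade to kelvin (a scale with a true zero point)."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0  # made-up temperatures for two days

# Differences are meaningful on an interval scale: same gap on either scale.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: "twice" in Centigrade is only about 3.5% more in kelvin.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:.2f} C, {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} in C, {ratio_k:.3f} in K")
```

The difference between the two days is the same on both scales, but the ratio changes when the zero point moves, which is why "twice as hot" is not a valid statement for an interval-scaled attribute.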
Example