Data Mining
Data mining refers to extracting, or "mining", knowledge from large amounts of data, much as gold is mined from rock and sand. A more appropriate name would therefore be "knowledge mining from data". Mining is a process that finds a small set of valuable items in a great deal of raw material. Other terms used for data mining include knowledge mining from databases, knowledge extraction, data/pattern analysis, and data dredging. It is also sometimes referred to as KDD (knowledge discovery in databases).
Data integration
- Multiple data sources may be combined.
Data selection
- Data relevant to the analysis task are retrieved from the database.
Data transformation
- Data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
Data mining
- An essential process in which intelligent methods are applied in order to extract data patterns.
Pattern evaluation
- The truly interesting patterns representing knowledge are identified, based on some interestingness measures.
Knowledge presentation
- Visualization and knowledge representation techniques are used to present the mined knowledge.
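The steps above can be sketched as a chain of small functions. This is only an illustrative toy, not a real mining system: the two store tables, the item names, the set of relevant items, and the `min_customers` threshold are all assumptions made up for the example, and the "mining" step is a plain frequency count.

```python
from collections import Counter, defaultdict

# Hypothetical sources: (customer, item) purchase records from two stores.
store_a = [("c1", "bread"), ("c1", "milk"), ("c2", "milk")]
store_b = [("c3", "bread"), ("c3", "milk"), ("c3", "pen"), ("c4", "gift card")]


def integrate(*sources):
    """Data integration: combine multiple data sources."""
    return [row for source in sources for row in source]


def select(rows, relevant_items):
    """Data selection: keep only rows relevant to the analysis task."""
    return [(cust, item) for cust, item in rows if item in relevant_items]


def transform(rows):
    """Data transformation: consolidate rows into one basket per customer."""
    baskets = defaultdict(set)
    for cust, item in rows:
        baskets[cust].add(item)
    return dict(baskets)


def mine(baskets):
    """Data mining (toy): count how many customers bought each item."""
    return Counter(item for basket in baskets.values() for item in basket)


def evaluate(counts, min_customers):
    """Pattern evaluation: keep patterns that pass an interestingness threshold."""
    return {item: n for item, n in counts.items() if n >= min_customers}


def present(patterns):
    """Knowledge presentation: render the patterns as readable lines."""
    return [f"{item}: bought by {n} customers" for item, n in sorted(patterns.items())]


rows = integrate(store_a, store_b)
relevant = select(rows, {"bread", "milk", "pen"})
baskets = transform(relevant)
counts = mine(baskets)
patterns = evaluate(counts, min_customers=2)
report = present(patterns)

for line in report:
    print(line)
```

Each function corresponds to one step, so a step can be swapped out (for example, replacing the frequency count with a real mining algorithm) without touching the others.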
Database
Data warehouse
Concept/class description: characterisation and discrimination
- Data can be associated with classes or concepts. It can be useful to describe individual classes and concepts in summarised, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions. These descriptions can be derived via (1) data characterisation, by summarising the data of the class under study in general terms, or (2) data discrimination, by comparing the target class with one or a set of comparative classes.
Association analysis
- The discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. Association analysis is widely used for transaction analysis.
Classification and prediction
- Classification is the process of finding a set of models (or functions) that describe and distinguish data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown.
Cluster analysis
- Analyzes data objects without consulting a known class label.
Outlier analysis
- Outliers are data objects in a database that do not comply with the general behavior or model of the data. Outlier analysis has wide application. For example, it can be used in fraud detection, by detecting unusual usage of credit cards or telecommunication services.
Evolution analysis
- Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
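As a small illustration of outlier analysis in the fraud-detection setting mentioned above, the sketch below flags card transactions whose amount lies far from the typical amount. The technique used here (distance from the median, measured in units of the median absolute deviation) is just one simple choice made for this example, and the function name `find_outliers`, the cut-off `k`, and the amounts are all hypothetical.

```python
from statistics import median


def find_outliers(values, k=3.0):
    """Return values whose distance from the median exceeds k times the
    median absolute deviation (MAD). k is an assumed, tunable cut-off."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (or most of them) are identical; nothing can be ranked.
        return []
    return [v for v in values if abs(v - med) / mad > k]


# Hypothetical card transaction amounts; one is far from the usual range.
amounts = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 950.0]
suspicious = find_outliers(amounts)
print(suspicious)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value distorts them much less.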
Background knowledge
- Users can specify background knowledge, or knowledge about the domain to be mined. This knowledge is useful for guiding the knowledge discovery process and for evaluating the patterns found. There are several kinds of background knowledge, for example concept hierarchies and user beliefs regarding relationships in the data.
Interestingness measures
- These functions are used to separate uninteresting patterns from knowledge. They may be used to guide the mining process or, after discovery, to evaluate the discovered patterns. Different kinds of knowledge may have different interestingness measures. For example, interestingness measures for association rules include support (the percentage of task-relevant data tuples for which the rule pattern appears) and confidence (the strength of the implication of the rule). Rules whose support and confidence values are below user-specified thresholds are considered uninteresting.
Presentation and visualization of discovered patterns
- This refers to the form in which discovered patterns are to be displayed. Users can choose from different forms for knowledge presentation, such as rules, tables, charts, graphs, decision trees, and cubes.
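The support and confidence measures for association rules can be made concrete with a short sketch. It assumes each tuple is a set of items and computes confidence as support(A and B) / support(A), the usual way of quantifying the strength of the implication A => B. The transactions, function names, and threshold values are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Strength of the rule antecedent => consequent:
    support(antecedent and consequent) / support(antecedent)."""
    ante = support(transactions, antecedent)
    if ante == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante


def is_interesting(transactions, antecedent, consequent,
                   min_support, min_confidence):
    """A rule is kept only if it meets both user-specified thresholds."""
    sup = support(transactions, set(antecedent) | set(consequent))
    conf = confidence(transactions, antecedent, consequent)
    return sup >= min_support and conf >= min_confidence


# Hypothetical task-relevant transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

sup = support(transactions, {"bread", "milk"})        # 2 of 4 transactions
conf = confidence(transactions, {"bread"}, {"milk"})  # 2 of the 3 bread transactions
print(f"support={sup:.2f} confidence={conf:.2f}")
```

With assumed thresholds of `min_support=0.4` and `min_confidence=0.6`, the rule bread => milk is kept, while milk => butter is discarded because only one of the four transactions contains both items.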