Uploaded by Shruti Gupta
Copyright © Attribution Non-Commercial (BY-NC)

Data Mining

Data mining refers to extracting, or "mining", knowledge from large amounts of data, much as gold is mined from rocks and sand. By this analogy, a more appropriate name would be "knowledge mining from data", since mining is a process that finds a small set of valuable material in a great deal of raw material. Other terms used for data mining include knowledge mining from databases, knowledge extraction, data/pattern analysis, and data dredging. Data mining is also sometimes referred to as KDD (knowledge discovery in databases).

Viewed as a knowledge discovery process, data mining involves the following steps:

1. Data cleaning - to remove noise and inconsistent data.
2. Data integration - where multiple data sources may be combined.
3. Data selection - where data relevant to the analysis task are retrieved from the database.
4. Data transformation - where data are transformed or consolidated into forms appropriate for mining, for example by performing summary or aggregation operations.
5. Data mining - an essential process where intelligent methods are applied in order to extract data patterns.
6. Pattern evaluation - to identify the truly interesting patterns representing knowledge, based on some interestingness measures.
7. Knowledge presentation - where visualization and knowledge representation techniques are used to present the mined knowledge to the user.
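The seven steps can be illustrated end to end with a toy pipeline. The following Python sketch is illustrative only. The record fields (`customer`, `region`, `item`, `amount`), the helper names and the data are all hypothetical. The "mining" step is a trivial ranking that stands in for a real mining algorithm.

```python
# Two hypothetical data sources (illustrative records, not from the text).
store_a = [
    {"customer": "c1", "region": "north", "item": "milk", "amount": 3.0},
    {"customer": "c2", "region": "north", "item": "bread", "amount": None},  # noisy: missing value
    {"customer": "c3", "region": "north", "item": "milk", "amount": 2.5},
]
store_b = [
    {"customer": "c4", "region": "north", "item": "bread", "amount": 1.5},
    {"customer": "c4", "region": "north", "item": "bread", "amount": 1.5},  # inconsistent: duplicate
    {"customer": "c5", "region": "south", "item": "eggs", "amount": 4.0},
]

def clean(records):
    """Data cleaning: drop rows with a missing amount and exact duplicates."""
    seen, out = set(), []
    for r in records:
        if r["amount"] is None:
            continue
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def integrate(*sources):
    """Data integration: combine several sources into one list of records."""
    return [r for src in sources for r in src]

def select(records, region):
    """Data selection: keep only the rows relevant to the analysis task."""
    return [r for r in records if r["region"] == region]

def transform(records):
    """Data transformation: aggregate the total amount sold per item."""
    totals = {}
    for r in records:
        totals[r["item"]] = totals.get(r["item"], 0.0) + r["amount"]
    return totals

def mine(totals):
    """Data mining (trivial stand-in): rank items by total sales."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def evaluate(patterns, min_total):
    """Pattern evaluation: keep patterns that pass an interestingness threshold."""
    return [(item, total) for item, total in patterns if total >= min_total]

def present(patterns):
    """Knowledge presentation: format the patterns as readable lines."""
    return [f"{item}: {total:.2f}" for item, total in patterns]

report = present(evaluate(mine(transform(select(
    integrate(clean(store_a), clean(store_b)), region="north"))), min_total=2.0))
print(report)  # ['milk: 5.50']
```

Each step is written as a plain function so that the order of the KDD process is visible in the final nested call. A real system would replace each function with far more elaborate logic.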

[Figure: Data mining as a step in the process of knowledge discovery]

Components of a Data Mining System


1. Database, data warehouse, or other information repository: This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on these data.
2. Database or data warehouse server: The database or data warehouse server is responsible for fetching the relevant data, based on the user's data mining request.
3. Knowledge base: This is the domain knowledge that is used to guide the search or to evaluate the interestingness of resulting patterns.
4. Data mining engine: This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterisation, classification, cluster analysis, and evolution and deviation analysis.
5. Pattern evaluation module: This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search towards interesting patterns.
6. Graphical user interface: This module communicates between the user and the data mining system. It allows the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on intermediate data mining results.
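The following Python sketch gives a rough picture of how these components cooperate, modelling each one as a small class or function. All class names, fields and data are hypothetical. The knowledge base here holds only a city-to-country concept hierarchy and a minimum-count threshold, and the engine contains a single counting module.

```python
from dataclasses import dataclass, field
from typing import Callable

# 1. Information repository: an in-memory list of rows (hypothetical data).
REPOSITORY = [
    {"city": "Toronto", "item": "laptop"},
    {"city": "Toronto", "item": "printer"},
    {"city": "Vancouver", "item": "laptop"},
]

# 2. Database server: fetches only the rows relevant to a request.
class DatabaseServer:
    def __init__(self, rows):
        self.rows = rows

    def fetch(self, predicate: Callable[[dict], bool]) -> list:
        return [r for r in self.rows if predicate(r)]

# 3. Knowledge base: domain knowledge used to guide the search and judge results.
@dataclass
class KnowledgeBase:
    city_to_country: dict = field(
        default_factory=lambda: {"Toronto": "Canada", "Vancouver": "Canada"})
    min_count: int = 2  # interestingness threshold

# 4. Data mining engine: a set of functional modules (only one shown here).
class MiningEngine:
    def count_items(self, rows):
        counts = {}
        for r in rows:
            counts[r["item"]] = counts.get(r["item"], 0) + 1
        return counts

# 5. Pattern evaluation module: filters results using the knowledge base.
class PatternEvaluator:
    def __init__(self, kb: KnowledgeBase):
        self.kb = kb

    def interesting(self, counts):
        return {k: v for k, v in counts.items() if v >= self.kb.min_count}

# 6. User interface: accepts a request (a country) and returns the mined patterns.
def run_query(country: str) -> dict:
    kb = KnowledgeBase()
    server = DatabaseServer(REPOSITORY)
    # The concept hierarchy in the knowledge base guides which rows are fetched.
    rows = server.fetch(lambda r: kb.city_to_country.get(r["city"]) == country)
    return PatternEvaluator(kb).interesting(MiningEngine().count_items(rows))

print(run_query("Canada"))  # {'laptop': 2}
```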

[Figure: Architecture of a typical data mining system (graphical user interface; pattern evaluation; knowledge base; data mining engine; database or data warehouse server; data cleaning, data integration and filtering; database; data warehouse)]

Data Mining Functionalities: What Kinds of Patterns Can Be Mined?


1. Concept/class description: characterisation and discrimination
2. Association analysis
3. Classification and prediction
4. Cluster analysis
5. Outlier analysis
6. Evolution analysis

Concept/class description: characterisation and discrimination. Data can be associated with classes or concepts. It can be useful to describe individual classes and concepts in summarised, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions. These descriptions can be derived in two ways: (1) data characterisation, which summarises the data of the class under study in general terms; or (2) data discrimination, which compares the target class with one or a set of comparative classes.

Association analysis: This is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. Association analysis is widely used for transaction data analysis.

Classification and prediction: Classification is the process of finding a set of models (or functions) that describe and distinguish data classes or concepts. The model can then be used to predict the class of objects whose class label is unknown. In this way, classification can be used to predict the class label of data objects.

Cluster analysis: Unlike classification, cluster analysis analyses data objects without consulting a known class label. Objects are grouped so that objects within a cluster are highly similar to one another but dissimilar to objects in other clusters.

Outlier analysis: Outliers are data objects in a database that do not comply with the general behavior or model of the data. Outlier analysis has wide application. For example, it can be used in fraud detection by detecting unusual usage of credit cards or telecommunication services.

Evolution analysis: Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
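Two of these functionalities can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions:

- The customer records and class labels are hypothetical.
- Characterisation is reduced to simple means.
- Outlier analysis uses a z-score rule, flagging values more than two population standard deviations from the mean. This is only one of many possible outlier tests.

```python
from statistics import mean, pstdev

# Hypothetical records: (class label, age, yearly spend).
customers = [
    ("big_spender", 45, 9000), ("big_spender", 51, 8500), ("big_spender", 39, 9500),
    ("budget", 22, 1200), ("budget", 26, 1500), ("budget", 30, 1200),
]

def characterise(rows, label):
    """Data characterisation: summarise one class in general terms."""
    target = [r for r in rows if r[0] == label]
    return {"count": len(target),
            "mean_age": mean(r[1] for r in target),
            "mean_spend": mean(r[2] for r in target)}

def discriminate(rows, target, contrast):
    """Data discrimination: compare the target class with a comparative class."""
    t, c = characterise(rows, target), characterise(rows, contrast)
    return {k: t[k] - c[k] for k in ("mean_age", "mean_spend")}

def outliers(values, threshold=2.0):
    """Outlier analysis: flag values more than `threshold` population
    standard deviations away from the mean (a simple z-score rule)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

print(discriminate(customers, "big_spender", "budget"))
# Hypothetical card transaction amounts; the last one is unusually large.
print(outliers([20, 25, 22, 18, 24, 21, 23, 19, 500]))  # [500]
```

The z-score rule is easy to state but sensitive to the outlier itself, since a single extreme value inflates the standard deviation. Robust alternatives (for example, median-based rules) are often preferred in practice.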

Data mining primitives


Each user will have a data mining task in mind, that is, some form of data analysis that she would like to have performed. A data mining task can be specified in the form of a data mining query, which is input to the data mining system.

A data mining query is defined in terms of the following primitives:


Task-relevant data: This is the portion of the database to be investigated. For example, if a person is in charge of sales for a region, he needs to study only the buying habits of customers in that region rather than in the entire country.

The kinds of knowledge to be mined: This specifies the data mining functions to be performed, such as characterization, discrimination, association, classification, clustering, or evolution analysis. For instance, if studying the buying habits of customers in Canada, you may choose to mine associations between customer profiles and the items that these customers like to buy.
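A data mining query can be thought of as a record holding these primitives. The following Python sketch shows how the task-relevant data and the kind of knowledge might drive a query. The `MiningQuery` class, the `execute` function and the sales data are all hypothetical. The task-relevant data is a region filter, and the only kind of knowledge supported is `"characterization"`. This is an illustration only, not the syntax of any real data mining query language.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical sales records.
SALES = [
    {"customer": "c1", "region": "Ontario", "item": "laptop"},
    {"customer": "c2", "region": "Ontario", "item": "printer"},
    {"customer": "c1", "region": "Ontario", "item": "printer"},
    {"customer": "c3", "region": "Quebec", "item": "laptop"},
]

@dataclass
class MiningQuery:
    region: str  # task-relevant data: which portion of the database to study
    kind: str    # kind of knowledge to be mined

def characterize(rows):
    """Summarise the task-relevant data in general terms."""
    return {"transactions": len(rows),
            "customers": len({r["customer"] for r in rows}),
            "top_item": Counter(r["item"] for r in rows).most_common(1)[0][0]}

# Maps each kind of knowledge to the function that mines it (only one here).
MINERS = {"characterization": characterize}

def execute(query, rows):
    relevant = [r for r in rows if r["region"] == query.region]  # task-relevant data
    return MINERS[query.kind](relevant)                          # kind of knowledge

print(execute(MiningQuery("Ontario", "characterization"), SALES))
# {'transactions': 3, 'customers': 2, 'top_item': 'printer'}
```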

Background knowledge: Users can specify background knowledge, or knowledge about the domain to be mined. This knowledge is useful for guiding the knowledge discovery process and for evaluating the patterns found. There are several kinds of background knowledge, for example, concept hierarchies and user beliefs regarding relationships in the data.

Interestingness measures: These functions are used to separate uninteresting patterns from knowledge. They may be used to guide the mining process or, after discovery, to evaluate the discovered patterns. Different kinds of knowledge may have different interestingness measures. For example, interestingness measures for association rules include support and confidence. Support is the percentage of task-relevant data tuples for which the rule pattern appears. Confidence measures the strength of the implication of the rule, that is, the fraction of tuples containing the rule's left-hand side that also contain its right-hand side. Rules whose support and confidence values are below user-specified thresholds are considered uninteresting.

Presentation and visualization of discovered patterns: This refers to the form in which discovered patterns are to be displayed. Users can choose from different forms for knowledge presentation, such as rules, tables, charts, graphs, decision trees, and cubes.
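Support and confidence can be computed directly from transaction data. The following Python sketch makes several assumptions:

- The transactions and the thresholds (`min_sup`, `min_conf`) are hypothetical.
- Support is expressed as a fraction rather than a percentage.
- Only rules with a single item on each side are checked.

A real association miner, such as Apriori, would search larger itemsets efficiently.

```python
from itertools import combinations

# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer", "printer"},
    {"software"},
    {"computer", "software"},
]

def count(itemset, txns):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in txns if itemset <= t)

def support(itemset, txns):
    """Fraction of transactions in which the pattern appears."""
    return count(itemset, txns) / len(txns)

def confidence(lhs, rhs, txns):
    """Strength of lhs -> rhs: support(lhs | rhs) / support(lhs), from raw counts."""
    return count(lhs | rhs, txns) / count(lhs, txns)

def interesting_rules(txns, min_sup, min_conf):
    """Keep single-item rules whose support and confidence meet both thresholds."""
    items = sorted(set().union(*txns))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in (({a}, {b}), ({b}, {a})):
            s = support(lhs | rhs, txns)
            if s >= min_sup:
                c = confidence(lhs, rhs, txns)
                if c >= min_conf:
                    rules.append((lhs, rhs, s, c))
    return rules

for lhs, rhs, s, c in interesting_rules(transactions, min_sup=0.5, min_conf=0.7):
    print(f"{sorted(lhs)} -> {sorted(rhs)} support={s:.0%} confidence={c:.0%}")
# ['computer'] -> ['software'] support=60% confidence=75%
# ['software'] -> ['computer'] support=60% confidence=75%
```

Rules such as computer -> printer are dropped here because their support (40%) falls below the 50% threshold, even though they occur in the data.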
