Data mining can be applied to various forms of data, including relational databases, data warehouses, and multimedia data. Relational databases are structured collections of tables that can be accessed using SQL, while data warehouses integrate information from multiple sources to support decision-making. Key components of a data warehouse include a central database, ETL tools, metadata, and access tools, and they provide a multidimensional view of data through structures like data cubes.
Mining Kind of data
WHAT KINDS OF DATA CAN BE MINED?

As a general technology, data mining can be applied to any kind of data as long as the data are meaningful for a target application. The most basic forms of data for mining applications are relational databases, object-relational databases and object-oriented databases, data warehouse data, and transactional data. Data mining can also be applied to other forms of data, e.g., data streams, ordered/sequence data, graph or networked data, spatial data, text data, multimedia data, and the WWW.

Relational Database

A relational database is a collection of data organized in tables with rows and columns. Each table is assigned a unique name and consists of columns and rows, where columns represent attributes and rows (records) represent tuples. Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.

Database Schema

A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the data are organized and how the relations among them are associated, and it formulates all the constraints that are to be applied to the data. A database schema defines its entities and the relationships among them, and gives a descriptive account of the database.

Figure: Example of a relational schema for a relational database
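A relational schema of this kind can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (Category, Items), the column names, and the sample rows are hypothetical and do not come from the original slides. It shows tables with unique names, tuples identified by a unique key, and a constraint enforced by the schema.

```python
import sqlite3

# In-memory database; all table names, column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")

# The schema: each table has a unique name, named attributes (columns),
# a key that identifies each tuple, and a relationship between the tables.
conn.executescript("""
CREATE TABLE Category (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE Items (
    item_id     INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    type        TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES Category(category_id)
);
""")

# Each row is a tuple described by a set of attribute values.
conn.executemany("INSERT INTO Category VALUES (?, ?)",
                 [(1, "Comedy"), (2, "Drama")])
conn.executemany("INSERT INTO Items VALUES (?, ?, ?, ?)",
                 [(10, "Film A", "video", 1),
                  (11, "Film B", "video", 2),
                  (12, "Album C", "audio", 1)])

# A tuple is retrieved through its unique key.
row = conn.execute(
    "SELECT item_id, title, type, category_id FROM Items WHERE item_id = ?",
    (11,),
).fetchone()
print(row)

# The key constraint in the schema rejects a second tuple with the same key.
try:
    conn.execute("INSERT INTO Items VALUES (10, 'Duplicate', 'video', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)

# The schema itself can be inspected: list the table names it defines.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The constraints live in the schema rather than in application code, so every program that writes to these tables is subject to the same rules.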
A database schema can be divided into two categories. The physical schema describes how the data are actually stored, for example in files and indexes. The logical schema describes the logical design: the tables, their attributes, and the relationships and constraints among them.

E-R diagram

A semantic data model, such as an entity-relationship (ER) data model, is often constructed for relational databases. An ER data model represents the database as a set of entities and their relationships.

SQL

Relational data can be accessed by database queries written in a relational query language called SQL, the standard and most commonly used query language for relational databases. SQL allows retrieval and manipulation of the data stored in the tables, as well as the calculation of aggregate functions such as average, sum, min, max and count. For instance, an SQL query to count the videos in each category would be: SELECT category, COUNT(*) FROM Items WHERE type = 'video' GROUP BY category;

Data mining algorithms using relational databases can be more versatile, since they can take advantage of the structure inherent to relational databases. While data mining can benefit from SQL for data selection, transformation and consolidation, it goes beyond what SQL can provide, with tasks such as predicting, comparing, and detecting deviations.

Applications: data mining, the ROLAP model, etc.

Relational databases are one of the most commonly available and richest information repositories, and thus they are a major data form in the study of data mining.

Data warehouse
A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. It is an integrated, subject-oriented and time-variant repository of information in support of management's decision-making process.

Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing. In other words, data from the different stores are loaded, cleaned, transformed and integrated together. To facilitate decision-making and multi-dimensional views, data warehouses are usually modelled by a multi-dimensional data structure.

Characteristics of Data warehouse
Subject-oriented: A data warehouse is subject-oriented since it provides topic-wise information rather than information about the overall processes of a business.
Integrated: A data warehouse is developed by integrating data from varied sources into a consistent format.
Time-variant: The data in a data warehouse are associated with a specific time period.
Non-volatile: Data once entered into a data warehouse must remain unchanged. All data are read-only, and previous data are not erased when current data are entered.

Figure: Structure of data warehouse system

Three types of Data warehouse
Enterprise data warehouse (EDW): a system for structuring and storing all of a company's business data for analytics, querying and reporting.
Data mart: a smaller form of data warehouse, which serves some specific data analysis needs.
Virtual warehouse: a set of views over an operational database for efficient query processing.

DW Architecture

Key components of a data warehouse

A typical data warehouse has four main components: a central database, ETL (extract, transform, load) tools, metadata, and access tools.

The architecture involves extracting information from the source systems by using an ETL process and then storing the information in a staging database. The daily changes also come to the staging area. Another ETL process transforms information from the staging area to populate the operational data store (ODS). The ODS then supplies information via another ETL process to the data warehouse, which in turn feeds a number of data marts that generate the reports required by management.

The data in a data warehouse are organized around major subjects to facilitate decision making. The data are stored to provide information from a historical perspective and are typically summarized.

Data Cube
A data warehouse is usually modelled by a multidimensional data structure, called a data cube, in which each dimension corresponds to an attribute or a set of attributes in the schema, and each cell stores the value of some aggregate measure such as count or sum. A data cube provides a multidimensional view of data and allows the precomputation and fast access of summarized data.

Example: a cube with the dimensions Country x Degree x Semester.
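The Country x Degree x Semester cube can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the student records are made-up sample data, and using "*" to mark a dimension that has been summed over is an assumption of this sketch. Each cell stores a count, and all summarized cells are precomputed once so that later lookups are plain dictionary accesses.

```python
from collections import defaultdict
from itertools import combinations

# Dimension order used for every cell key in this sketch.
DIMENSIONS = ("country", "degree", "semester")

# Hypothetical fact records: one (country, degree, semester) triple per student.
facts = [
    ("India", "BSc", "S1"),
    ("India", "BSc", "S1"),
    ("India", "MSc", "S2"),
    ("Nepal", "BSc", "S1"),
    ("Nepal", "MSc", "S1"),
]

ALL = "*"  # assumed marker for a dimension that has been aggregated away


def build_cube(records):
    """Precompute the count measure for every subset of the dimensions."""
    cube = defaultdict(int)
    n = len(DIMENSIONS)
    for record in records:
        # Every record contributes to one cell per subset of kept dimensions.
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                cell = tuple(record[i] if i in kept else ALL for i in range(n))
                cube[cell] += 1
    return dict(cube)


cube = build_cube(facts)

# Fast access to detailed and summarized cells without rescanning the facts.
print(cube[("India", "BSc", "S1")])  # students from India, BSc, semester S1
print(cube[("India", ALL, ALL)])     # all students from India
print(cube[(ALL, "BSc", "S1")])      # all BSc students in semester S1
print(cube[(ALL, ALL, ALL)])         # grand total
```

Precomputing every combination trades storage for lookup speed; with many dimensions a real system would usually materialize only some of these summaries.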
Data Cube Operations

Roll-up: zooming out on the data cube, i.e. summarizing the data at a coarser level.
Drill-down: zooming in on the data; it is therefore the reverse of roll-up.
Slice and dice: operations for browsing the data in the cube. A slice is a subset of the cube corresponding to a single value of one dimension. A dice is obtained by performing a selection on two or more dimensions.
Pivot: used when the user wishes to re-orient the view of the data cube. It may involve swapping the rows and columns.

Benefits of data warehouse

Provides a single version of truth about enterprise information.
Speeds up ad hoc reports and queries.