
Mining Kind of data

Data mining can be applied to various forms of data, including relational databases, data warehouses, and multimedia data. Relational databases are structured collections of tables that can be accessed using SQL, while data warehouses integrate information from multiple sources to support decision-making. Key components of a data warehouse include a central database, ETL tools, metadata, and access tools, and they provide a multidimensional view of data through structures like data cubes.

Uploaded by

virat18kohli360
Copyright © All Rights Reserved

WHAT KINDS OF DATA CAN BE MINED?
 As a general technology, data mining can be applied to
any kind of data as long as the data are meaningful for
a target application.
 The most basic forms of data for mining applications are relational databases, object-relational databases and object-oriented databases, data warehouse data, and transactional data.
 Data mining can also be applied to other forms of data, e.g., data streams, ordered/sequence data, graph or networked data, spatial data, text data, multimedia data, and the WWW.
Relational Database
 A relational database is a collection of data organized in tables with rows and columns.
 Briefly, a relational database is a collection of tables, each of which is assigned a unique name.
 Each table consists of columns and rows: columns represent attributes, and rows (also called records) represent tuples.
 Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.
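To make the terms table, attribute, tuple, and key concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name `customer` and its three columns are illustrative assumptions, not part of the original slides.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A relation (table) with a unique name; each column is an attribute.
conn.execute("""
    CREATE TABLE customer (
        cust_ID INTEGER PRIMARY KEY,  -- unique key identifying each tuple
        name    TEXT NOT NULL,
        age     INTEGER
    )
""")

# Each inserted row is one tuple: an object described by attribute values.
conn.executemany(
    "INSERT INTO customer (cust_ID, name, age) VALUES (?, ?, ?)",
    [(1, "Alice", 34), (2, "Bob", 27)],
)

# Look up a single tuple by its key.
row = conn.execute("SELECT * FROM customer WHERE cust_ID = ?", (2,)).fetchone()
print(row)  # (2, 'Bob', 27)

# A second tuple with the same key violates the uniqueness of the key.
try:
    conn.execute("INSERT INTO customer VALUES (2, 'Carol', 41)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```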
Database Schema
 A database schema is the skeleton structure that represents
the logical view of the entire database.
 It defines how the data is organized and how the relations
among them are associated.
 It specifies all the constraints that are to be applied to the data.
 A database schema defines its entities and the relationships among them.
 It is a description of the database, not the data itself.
Example of a relational schema for a relational database
 customer (cust_ID, name, address, age, occupation, annual_income, credit_information, category, . . .)
 item (item_ID, brand, category, type, price, place_made, supplier, cost, . . .)
 employee (empl_ID, name, category, group, salary, commission, . . .)
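The example schema can be written as SQL data-definition statements. The sketch below uses Python's sqlite3 and keeps only the attributes listed on the slide, because the elided ". . ." attributes are unknown. Spaces in attribute names are written as underscores, and all column types are illustrative guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One CREATE TABLE per relation in the example schema.
# Column types are assumptions; the slide does not specify them.
conn.executescript("""
CREATE TABLE customer (
    cust_ID            INTEGER PRIMARY KEY,
    name               TEXT,
    address            TEXT,
    age                INTEGER,
    occupation         TEXT,
    annual_income      REAL,
    credit_information TEXT,
    category           TEXT
);
CREATE TABLE item (
    item_ID    INTEGER PRIMARY KEY,
    brand      TEXT,
    category   TEXT,
    type       TEXT,
    price      REAL,
    place_made TEXT,
    supplier   TEXT,
    cost       REAL
);
CREATE TABLE employee (
    empl_ID    INTEGER PRIMARY KEY,
    name       TEXT,
    category   TEXT,
    "group"    TEXT,  -- GROUP is an SQL keyword, so it must be quoted
    salary     REAL,
    commission REAL
);
""")

# The schema is itself stored as data and can be inspected.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['customer', 'employee', 'item']

# PRAGMA table_info returns one row per column; index 1 is the column name.
employee_cols = [r[1] for r in conn.execute("PRAGMA table_info(employee)")]
print(employee_cols)
```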
Categories of Database schema

 A database schema can be divided broadly into two categories:
 Physical schema: describes how the data are actually stored, e.g., the files, storage structures, and indexes.
 Logical schema: describes the logical structure of the data, i.e., the tables, their attributes, and the relationships and constraints among them.
E-R diagram
 A semantic data model, such as an entity-relationship (ER) data model, is often constructed for relational databases.
 An ER data model represents the database as a set of entities and their relationships.
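Consider an ER model in which customers purchase items. It has two entity sets and one relationship set. One common way to map it to relational tables, offered here as an assumption rather than something the slides state, is to give each entity set its own table. The many-to-many relationship then becomes a table whose foreign keys reference both entities. The `purchases` relationship and its `quantity` attribute are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
-- Entity sets become tables.
CREATE TABLE customer (cust_ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item     (item_ID INTEGER PRIMARY KEY, brand TEXT);

-- Hypothetical many-to-many relationship set "purchases":
-- its key combines the keys of the participating entities.
CREATE TABLE purchases (
    cust_ID  INTEGER REFERENCES customer(cust_ID),
    item_ID  INTEGER REFERENCES item(item_ID),
    quantity INTEGER,
    PRIMARY KEY (cust_ID, item_ID)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.execute("INSERT INTO item VALUES (10, 'Acme')")
conn.execute("INSERT INTO purchases VALUES (1, 10, 3)")

# A relationship instance must refer to existing entities.
try:
    conn.execute("INSERT INTO purchases VALUES (99, 10, 1)")  # no customer 99
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# Following the relationship is a join.
rows = conn.execute("""
    SELECT c.name, i.brand, p.quantity
    FROM purchases p
    JOIN customer c ON c.cust_ID = p.cust_ID
    JOIN item i ON i.item_ID = p.item_ID
""").fetchall()
print(rows)  # [('Alice', 'Acme', 3)]
```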
SQL
 Relational data can be accessed by database queries
written in a relational query language called SQL.
 SQL is the standard query language and interface (API) for relational databases.
 The most commonly used query language for relational
database is SQL, which allows retrieval and manipulation
of the data stored in the tables, as well as the calculation
of aggregate functions such as average, sum, min, max
and count.
 For instance, an SQL query that counts the videos in each category would be:
SELECT category, COUNT(*) FROM Items WHERE type = 'video' GROUP BY category;
 Data mining algorithms using relational databases can
be more versatile, since they can take advantage of
the structure inherent to relational databases.
 While data mining can benefit from SQL for data selection, transformation, and consolidation, it goes beyond what SQL can provide, with tasks such as prediction, comparison, and deviation detection.
 Applications: data mining, the ROLAP (relational OLAP) model, etc.
 Relational databases are one of the most commonly
available and richest information repositories, and thus
they are a major data form in the study of data mining.
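The category-count query can be tried with Python's built-in sqlite3 module. The sketch below runs it against a toy `Items` table whose rows and `price` column are invented. It also computes the aggregate functions listed in the slide. It then goes one small step beyond SQL by flagging a price that deviates from the rest, using a 2-standard-deviation rule. That rule is a deliberately simple assumption, not a method prescribed by the slides. In SQL the literal `'video'` must be quoted, and listing `category` in the SELECT clause labels each count.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Items (item_ID INTEGER PRIMARY KEY, type TEXT, category TEXT, price REAL)"
)
# Toy rows, made up purely for illustration.
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?, ?)",
    [
        (1, "video", "comedy", 10.0), (2, "video", "comedy", 11.0),
        (3, "video", "drama", 9.0),   (4, "video", "drama", 10.0),
        (5, "video", "comedy", 12.0), (6, "video", "drama", 10.0),
        (7, "video", "drama", 40.0),  (8, "book", "drama", 20.0),
    ],
)

# SQL: count the videos in each category ('video' is a quoted string literal).
per_category = conn.execute("""
    SELECT category, COUNT(*) FROM Items
    WHERE type = 'video'
    GROUP BY category
    ORDER BY category
""").fetchall()
print(per_category)  # [('comedy', 3), ('drama', 4)]

# SQL: the aggregate functions average, sum, min, max, and count.
count, low, high, total, average = conn.execute("""
    SELECT COUNT(*), MIN(price), MAX(price), SUM(price), AVG(price)
    FROM Items WHERE type = 'video'
""").fetchone()

# Beyond SQL (sketch): flag videos whose price lies more than
# 2 sample standard deviations from the mean price.
rows = conn.execute(
    "SELECT item_ID, price FROM Items WHERE type = 'video' ORDER BY item_ID"
).fetchall()
prices = [price for _, price in rows]
mean, stdev = statistics.mean(prices), statistics.stdev(prices)
deviating = [item for item, price in rows if abs(price - mean) / stdev > 2]
print(deviating)  # [7]
```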
Data warehouse

 A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
 A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile repository of information in support of management’s decision-making process.


 Data warehouses are constructed via a process of data
cleaning, data integration, data transformation, data
loading, and periodic data refreshing.
 In other words, data from the different sources are cleaned, transformed, integrated, and loaded into the warehouse.
 To facilitate decision-making and multi-dimensional
views, data warehouses are usually modelled by a multi-
dimensional data structure.
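The cleaning, transformation, integration, and loading steps can be sketched in a few lines of Python. The two "branch" sources, their formats, and the `dim_customer` table are hypothetical. Keeping the first version of a duplicated customer is just one simple integration rule, assumed here for illustration.

```python
import sqlite3

# Two hypothetical source systems describing customers in different formats.
branch_a = [
    {"id": "1", "name": " alice ", "income": "52000"},
    {"id": "2", "name": "BOB", "income": ""},  # missing income
]
branch_b = [
    ("C-3", "Carol", 61.5),  # income recorded in thousands
    ("C-1", "Alice", 52.0),  # same customer as branch A's id 1
]

def clean_a(rec):
    """Cleaning: trim and normalize names, convert types, mark missing values."""
    income = float(rec["income"]) if rec["income"] else None
    return (int(rec["id"]), rec["name"].strip().title(), income)

def clean_b(rec):
    """Transformation into the unified schema: strip the ID prefix, rescale income."""
    cid, name, income_k = rec
    return (int(cid.split("-", 1)[1]), name.strip().title(), income_k * 1000)

# Integration: merge both sources under one schema, one row per customer key.
unified = {}
for row in [clean_a(r) for r in branch_a] + [clean_b(r) for r in branch_b]:
    unified.setdefault(row[0], row)  # keep the first version seen for each key

# Loading: write the integrated rows into a warehouse table.
dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE dim_customer (cust_ID INTEGER PRIMARY KEY, name TEXT, income REAL)"
)
dw.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", sorted(unified.values()))
dw.commit()

loaded = dw.execute("SELECT * FROM dim_customer ORDER BY cust_ID").fetchall()
print(loaded)  # [(1, 'Alice', 52000.0), (2, 'Bob', None), (3, 'Carol', 61500.0)]
```

Periodic refreshing would rerun the same steps on new source data.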
Characteristics of Data warehouse

 Subject-Oriented: A data warehouse is organized around major subjects (e.g., customer, product, sales) and provides subject-wise information rather than information about the day-to-day operations of a business.
 Integrated: A data warehouse is developed by integrating data from varied sources into a consistent format.
 Time-variant: Data in the warehouse are associated with specific time periods and provide information from a historical perspective (e.g., the past 5 to 10 years).
 Non-volatile: Data once entered into a data warehouse
must remain unchanged. All data is read-only. Previous
data is not erased when current data is entered.
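The time-variant and non-volatile properties can be illustrated with a hypothetical `balance_history` table. Each periodic load appends rows stamped with a load date instead of overwriting old ones, so earlier states stay queryable. Blocking updates and deletions with SQLite triggers is one possible way to enforce read-only behaviour. It is assumed here for illustration, not a requirement stated in the slides, and the dates and balances are invented.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
CREATE TABLE balance_history (
    load_date TEXT,  -- time element: every row belongs to a period
    account   TEXT,
    balance   REAL
);
-- Non-volatile: reject in-place changes and deletions.
CREATE TRIGGER no_update BEFORE UPDATE ON balance_history
BEGIN SELECT RAISE(ABORT, 'warehouse data are read-only'); END;
CREATE TRIGGER no_delete BEFORE DELETE ON balance_history
BEGIN SELECT RAISE(ABORT, 'warehouse data are read-only'); END;
""")

def load_snapshot(date, balances):
    """Periodic refresh: append a new dated snapshot; never overwrite old rows."""
    dw.executemany(
        "INSERT INTO balance_history VALUES (?, ?, ?)",
        [(date, acct, bal) for acct, bal in balances.items()],
    )

load_snapshot("2024-01-31", {"A1": 500.0})
load_snapshot("2024-02-29", {"A1": 650.0})

# History is preserved: both periods are still visible.
history = dw.execute(
    "SELECT load_date, balance FROM balance_history "
    "WHERE account = 'A1' ORDER BY load_date"
).fetchall()
print(history)  # [('2024-01-31', 500.0), ('2024-02-29', 650.0)]

# Attempts to change stored data are rejected by the trigger.
try:
    dw.execute("UPDATE balance_history SET balance = 0")
    update_blocked = False
except sqlite3.DatabaseError:
    update_blocked = True
```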
Structure of data warehouse system
Three types of data warehouse
 Enterprise data warehouse (EDW): a system for structuring and storing all of a company's business data for analytical querying and reporting.
 Data mart: a smaller form of data warehouse that serves the specific data analysis needs of a particular group of users, such as a single department.
 Virtual warehouse: a set of views over an operational database for efficient query processing.
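A virtual warehouse can be sketched as an SQL view over an operational table. Queries run against the view, but no data are copied. By contrast, a data mart typically holds its own copy of a subset of the data; the `CREATE TABLE ... AS SELECT` below stands in for that. The `orders` table, the view name, and the mart name are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_ID INTEGER, region TEXT, amount REAL)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 120.0), (2, "west", 80.0), (3, "east", 40.0)],
)

# Virtual warehouse: a view stores only the query definition, not the data.
db.execute("""
    CREATE VIEW v_sales_by_region AS
    SELECT region, SUM(amount) AS total FROM orders GROUP BY region
""")

# Data mart (sketch): a physical copy of the subset one department needs.
db.execute("CREATE TABLE mart_east AS SELECT * FROM orders WHERE region = 'east'")

# New operational data show up in the view at once, but not in the copied mart.
db.execute("INSERT INTO orders VALUES (4, 'east', 10.0)")
view_rows = db.execute("SELECT * FROM v_sales_by_region ORDER BY region").fetchall()
mart_count = db.execute("SELECT COUNT(*) FROM mart_east").fetchone()[0]
print(view_rows)   # [('east', 170.0), ('west', 80.0)]
print(mart_count)  # 2
```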
DW Architecture
Key components of a data warehouse
 A typical data warehouse has four main components:
 Central database
 ETL (extract, transform, load) tools
 Metadata
 Access tools
 Information is extracted from the source systems by an ETL process and then stored in a staging database. The daily changes also come to the staging area.
 Another ETL process transforms the information in the staging area to populate the operational data store (ODS).
 The ODS then supplies information, via another ETL process, to the data warehouse, which in turn feeds a number of data marts that generate the reports required by management.
 The data in a data warehouse are organized
around major subjects to facilitate decision
making.
 The data are stored to provide information from
a historical perspective and are typically
summarized.
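The staging, ODS, warehouse, and data mart flow can be sketched as a chain of plain Python functions, one per hop. Every part of it is a hypothetical simplification meant only to show the order of the stages: the record fields, the daily change batch, the load date, and the per-region summaries standing in for data marts.

```python
from collections import defaultdict

def extract_to_staging(source_rows, daily_changes):
    """ETL 1: copy source rows plus the day's changes into a staging area."""
    return list(source_rows) + list(daily_changes)

def staging_to_ods(staging):
    """ETL 2: transform staged rows into an ODS holding the latest value per key."""
    ods = {}
    for row in staging:  # later rows (the daily changes) overwrite earlier ones
        ods[row["id"]] = {"id": row["id"], "region": row["region"].lower(),
                          "amount": float(row["amount"])}
    return ods

def ods_to_warehouse(ods, load_date):
    """ETL 3: append a dated, integrated snapshot to the warehouse."""
    return [dict(rec, load_date=load_date) for rec in ods.values()]

def warehouse_to_marts(warehouse):
    """Feed per-region summaries (stand-ins for data marts) used for reporting."""
    marts = defaultdict(float)
    for rec in warehouse:
        marts[rec["region"]] += rec["amount"]
    return dict(marts)

source = [{"id": 1, "region": "East", "amount": "100"},
          {"id": 2, "region": "West", "amount": "50"}]
changes = [{"id": 2, "region": "West", "amount": "75"}]  # today's update

staging = extract_to_staging(source, changes)
ods = staging_to_ods(staging)
warehouse = ods_to_warehouse(ods, "2024-03-01")
marts = warehouse_to_marts(warehouse)
print(marts)  # {'east': 100.0, 'west': 75.0}
```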
Data Cube

 A data warehouse is usually modelled by a multidimensional data structure, called a data cube, in which each dimension corresponds to an attribute or a set of attributes in the schema, and each cell stores the value of some aggregate measure such as count or sum.
 A data cube provides a multidimensional view of data and allows the precomputation of, and fast access to, summarized data.
Example: a cube with the three dimensions Country × Degree × Semester
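To illustrate precomputation, the sketch below builds every cuboid of a tiny Country × Degree × Semester cube, with count as the measure. The student records are invented. Storing each cuboid as a Python `Counter` is an assumption of this sketch, not how a production warehouse stores cubes.

```python
from collections import Counter
from itertools import combinations

DIMENSIONS = ("country", "degree", "semester")

# Hypothetical base records: one row per enrolled student.
records = [
    {"country": "India", "degree": "BSc", "semester": "Fall"},
    {"country": "India", "degree": "MSc", "semester": "Fall"},
    {"country": "India", "degree": "BSc", "semester": "Spring"},
    {"country": "USA",   "degree": "BSc", "semester": "Fall"},
]

def build_cube(rows, dims):
    """Precompute the count measure for every subset of dimensions (cuboid).

    With n dimensions there are 2**n cuboids, from the base cuboid
    (all dimensions) down to the apex cuboid () holding the grand total.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            cube[group] = Counter(tuple(r[d] for d in group) for r in rows)
    return cube

cube = build_cube(records, DIMENSIONS)

# Fast access to summarized data is then a dictionary lookup.
print(len(cube))                                      # 8
print(cube[()][()])                                   # 4
print(cube[("country",)][("India",)])                 # 3
print(cube[("degree", "semester")][("BSc", "Fall")])  # 2
```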


Data Cube Operations
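 Roll-up: zooming out on the data cube, i.e., aggregating data by climbing up a concept hierarchy (e.g., from city to country) or by removing dimensions.
 Drill-down: zooming in on the data, i.e., navigating from less detailed to more detailed data (e.g., from quarter to month, or by adding a dimension); it is therefore the reverse of roll-up.
 Slice and dice: Slice and dice are operations for browsing the data in the cube.
 A slice performs a selection on one dimension of the cube (fixing it to a single value), resulting in a subcube.
 A dice defines a subcube by performing a selection on two or more dimensions.
 Pivot: The pivot operation is used when the user wishes to re-orient the view of the data cube.
 It may involve swapping the rows and columns of a two-dimensional view.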
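The five operations can be sketched on a handful of hypothetical sales facts with a city → country concept hierarchy. Using plain dictionaries and list filters is an illustrative assumption; OLAP servers implement these operations over stored cubes.

```python
from collections import defaultdict

# Hypothetical sales facts; location has the concept hierarchy city -> country.
facts = [
    {"city": "Delhi",  "country": "India", "quarter": "Q1", "item": "TV",    "sales": 10},
    {"city": "Mumbai", "country": "India", "quarter": "Q1", "item": "TV",    "sales": 5},
    {"city": "Mumbai", "country": "India", "quarter": "Q2", "item": "Phone", "sales": 7},
    {"city": "Boston", "country": "USA",   "quarter": "Q1", "item": "Phone", "sales": 4},
]

def aggregate(rows, dims):
    """Sum the sales measure grouped by the given dimensions (one cuboid)."""
    out = defaultdict(int)
    for r in rows:
        out[tuple(r[d] for d in dims)] += r["sales"]
    return dict(out)

# Starting view: city x quarter.
by_city = aggregate(facts, ["city", "quarter"])

# Roll-up: climb the location hierarchy from city to country (zoom out).
by_country = aggregate(facts, ["country", "quarter"])

# Drill-down: the reverse, adding detail; here the item dimension is added.
by_city_item = aggregate(facts, ["city", "quarter", "item"])

# Slice: select a single value on one dimension (quarter = Q1).
q1_slice = [r for r in facts if r["quarter"] == "Q1"]

# Dice: select on two dimensions at once (country = India and item = TV).
india_tv = [r for r in facts if r["country"] == "India" and r["item"] == "TV"]

def pivot(view):
    """Pivot: swap the row and column dimensions of a 2-D view."""
    return {(col, row): value for (row, col), value in view.items()}

by_quarter_country = pivot(by_country)
print(by_country)          # {('India', 'Q1'): 15, ('India', 'Q2'): 7, ('USA', 'Q1'): 4}
print(by_quarter_country)  # {('Q1', 'India'): 15, ('Q2', 'India'): 7, ('Q1', 'USA'): 4}
```

Roll-up and drill-down change the level of detail without changing the grand total: every view above sums to the same 26 units.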
Benefits of data warehouse

 Provides a single version of the truth about enterprise information
 Speeds up ad hoc reports and queries
 Improves data consistency
 Supports better business decisions
 Gives end users easier access to enterprise data
 Provides better documentation of data
 Reduces computer costs and increases productivity
