Data Mining and Warehousing
SESSION 1 – INTRODUCTION
Slide 2
Session Outline
The key topics to be covered in the session are as follows:
• What is a database?
• Database Management Systems
• Properties of a database and Data concepts
• Database evolution
• Different types of database
• Relational Database Model
• DBA responsibilities
• Oracle RDBMS and Tools
Slide 3
Database
• Database
– is defined as a collection of related data that is organised and
stored
– and enables us to produce information
• Data can be defined as:
– recorded facts and numbers
• Information can be defined as:
– Knowledge derived from data
– Data presented in a meaningful context
– Data processed by summing, ordering, averaging, grouping,
comparing, or other similar operations
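The difference between data and information can be sketched in a few lines of Python. The module names and marks below are invented for illustration; the raw rows are the data, and the grouped, ordered averages are the information derived from them.

```python
from collections import defaultdict
from statistics import mean

# Data: recorded facts and numbers (hypothetical exam marks).
marks = [
    ("Databases", 62),
    ("Databases", 74),
    ("Networks", 55),
    ("Networks", 65),
]

def average_by_module(rows):
    """Group marks by module and average them, turning data into information."""
    grouped = defaultdict(list)
    for module, mark in rows:
        grouped[module].append(mark)
    # Ordering the modules is itself one of the processing operations.
    return {module: mean(values) for module, values in sorted(grouped.items())}

info = average_by_module(marks)
print(info)
```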
Slide 4
Database
• Database technology is now used in almost
every IT system, for example in data mining,
web search and academic databases
• You may not notice it, but databases are
behind almost everything you do on the
Web.
– Can you give examples?
Slide 5
DBMS
• The DBMS (Database Management System) is
software that is used to create, manage and
control access to databases.
[Figure: Hybrid DBMS. Source: IBM]
Slide 10
NoSQL
• Next-generation databases are mostly non-relational,
distributed, open-source and horizontally scalable.
• They can be schema-free and can store huge amounts of
unstructured data.
• Some are based on XML (for example, the open-source dbXML).
• The NoSQL movement is finding wide acceptance in
applications such as Facebook and Twitter.
• Both Facebook and Twitter have used the Apache Software
Foundation’s Cassandra database.
• "NoSQL" is usually read as "not only SQL".
Slide 11
Database Types
• Single-user or Multi-user database systems
• Client-Server or Multi-tier database systems
• Centralised or Distributed database systems
• Disk-based or In-memory or hybrid databases
• Personal database systems or Enterprise-class
database systems
• Transactional or Data warehousing database
systems
Slide 12
Relational Database Concept
• In his 1970 paper, IBM researcher E. F. Codd
proposed the relational model for database
systems, which has become the basis for RDBMSs
• The relational model consists of the following:
– Collection of tables or relations
– Set of operators to act on the relations
– Data integrity for accuracy and consistency
• Relational databases come with relational
operators that produce new relations from old
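The idea that relational operators take relations in and give relations out can be sketched in plain Python. This is only an illustration, not how an RDBMS implements them: relations are modelled as lists of dictionaries, and the function names `select`, `project` and `join`, as well as the sample rows, are assumptions made for this example.

```python
# Two small relations; each row is a dict of attribute -> value (illustrative data).
students = [
    {"id": 1, "name": "Ama", "dept": "CS"},
    {"id": 2, "name": "Kofi", "dept": "Maths"},
]
depts = [
    {"dept": "CS", "building": "Block A"},
    {"dept": "Maths", "building": "Block B"},
]

def select(relation, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, columns):
    """Projection: keep only the named columns, dropping duplicate rows."""
    result = []
    for row in relation:
        reduced = {column: row[column] for column in columns}
        if reduced not in result:
            result.append(reduced)
    return result

def join(left, right, column):
    """Equi-join: combine rows whose values agree on the shared column."""
    return [{**l, **r} for l in left for r in right if l[column] == r[column]]

cs_students = select(students, lambda row: row["dept"] == "CS")
names = project(students, ["name"])
located = join(students, depts, "dept")
print(cs_students, names, located, sep="\n")
```

Each result is itself a relation, so the operators can be chained, for example projecting the names out of a join.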
Slide 13
Relational Model
• A relational database is a collection of relations or two-
dimensional tables.
• An entity is something (such as a person or object) of
importance to the business or organisation to which the
database belongs
• Characteristics of entities are called attributes
• A relationship describes an association between two or
more entities
• There are three basic entity relationships:
– One-to-One, One-to-Many, Many-to-Many
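As a rough sketch of these terms, the example below uses Python's built-in `sqlite3` module (SQLite rather than Oracle, purely for convenience). The `department` and `student` tables and their columns are hypothetical: each table represents an entity, each column an attribute, and the `dept_id` foreign key expresses a one-to-many relationship (one department, many students).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    -- Many students may point at the same department (one-to-many).
    dept_id    INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Computer Science');
INSERT INTO student VALUES (10, 'Ama', 1), (11, 'Kofi', 1);
""")

# Follow the relationship: count the students belonging to each department.
rows = conn.execute("""
    SELECT d.name, COUNT(*)
    FROM department d
    JOIN student s ON s.dept_id = d.dept_id
    GROUP BY d.name
""").fetchall()
print(rows)
```

A many-to-many relationship would normally need a third, linking table that holds a foreign key to each side.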
Slide 14
Data and metadata
• A database is a self-describing collection of
integrated tables.
– Integrated tables are tables that store both data and
the relationship among the data.
– self-describing because it contains a description of
itself.
– Thus, databases contain not only tables of user data,
but also tables of data that describe that user data.
– Such descriptive data is called metadata because it is
data about data.
– A collection of metadata is often called a data dictionary.
Slide 15
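The self-describing nature of a database can be observed directly. The sketch below uses Python's built-in `sqlite3` module as a stand-in; SQLite keeps its metadata in a catalogue table called `sqlite_master`, whereas Oracle exposes its data dictionary through its own set of views. The `student` table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# Metadata: which tables exist, read from SQLite's own catalogue table.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Metadata: column names and declared types of the student table.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(tables)
print(columns)
```

No user rows were inserted, yet the database can already answer questions about its own structure.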
Database
• Structured Query Language (SQL) is the American
National Standards Institute (ANSI) standard
language for operating relational databases
• SQL provides statements for a variety of tasks,
including:
– Querying data
– Inserting, updating, and deleting rows in a table
– Creating, replacing, altering, and dropping objects
– Controlling access to the database and its objects
– Guaranteeing database consistency and integrity
Slide 16
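Most of the SQL tasks listed for this session can be tried out with Python's built-in `sqlite3` module, as in the sketch below. The `book` table and its rows are invented for illustration. Access control statements such as GRANT are not shown because SQLite does not support them; they are available in server DBMSs such as Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Creating an object.
cur.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, price REAL)")

# Inserting, updating and deleting rows in a table.
cur.executemany(
    "INSERT INTO book (title, price) VALUES (?, ?)",
    [("SQL Basics", 20.0), ("Data Mining", 35.0), ("Old Notes", 5.0)],
)
cur.execute("UPDATE book SET price = price + 5 WHERE title = ?", ("SQL Basics",))
cur.execute("DELETE FROM book WHERE title = ?", ("Old Notes",))

# Guaranteeing consistency: the three changes become permanent together.
conn.commit()

# Querying data.
titles = cur.execute("SELECT title, price FROM book ORDER BY title").fetchall()
print(titles)

# Dropping an object.
cur.execute("DROP TABLE book")
```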
What is SQL
Slide 17
Properties of a database
• The term database usually implies a series of
related properties:
– Data abstraction
– Data sharing
– Data definition
– Data integrity
– Data security
– Data independence
– Data concurrency
– Data consistency
Slide 18
Data abstraction and Data Sharing
• Data abstraction
– A database can be viewed as a model of reality.
– The information stored in a database is usually an attempt to
represent the properties of some objects in the real world.
• For example, an academic database is meant to record relevant details of
university activity
• Data Sharing
– Data stored in a database is not usually held solely for the use of
one person.
– A database is normally expected to be accessible by more than
one person, perhaps at the same time.
• For example, a student database might be accessible to both academic
and administrative staff.
Slide 19
Data Definition and Data Integrity
• Data definition
– involves describing the properties of the data that go into each
database table.
– Each column has:
• Name (must be unique within the table).
• Data type (such as Number, Date/Time, Text).
• Properties (such as size, format, any allowable range, etc.).
• Description (an optional description of the data).
• Data integrity
– means that data in a database adheres to specified business rules,
– refers to maintaining and assuring the accuracy of data over its
entire life-cycle.
Slide 20
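Data definition and data integrity work together: the column definitions state what is allowed, and the DBMS rejects anything else. The sketch below uses Python's built-in `sqlite3` module; the `module_result` table and the business rule that a mark lies between 0 and 100 are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: each column gets a name, a data type and an allowable range.
conn.execute("""
    CREATE TABLE module_result (
        student_name TEXT    NOT NULL,
        mark         INTEGER NOT NULL CHECK (mark BETWEEN 0 AND 100)
    )
""")

conn.execute("INSERT INTO module_result VALUES ('Ama', 72)")  # obeys the rule

# Data integrity: a row that breaks the business rule is refused by the DBMS.
try:
    conn.execute("INSERT INTO module_result VALUES ('Kofi', 140)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM module_result").fetchone()[0]
print(rejected, stored)
```

Because the rule lives in the table definition, every application that writes to the table is held to it.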
Data Security and Data Privacy
• Data security refers to protecting data against
destruction and misuse, both intentional and
accidental
• It involves protecting database access by users
– usernames, passwords, user privileges
• And protecting against data loss
– backup and disaster-recovery
• A company that stores data about individuals is
responsible for protecting the privacy of that data
Slide 21
Data Independence
• One of the main requirements of a database
system is that data is buffered from the
processes that use it.
• Data is kept separate from all programs that make
use of it.
• The data remains
– accessible
– stable
– and cannot be corrupted by accessing applications.
Slide 22
Data Independence
Slide 23
Data Concurrency and Data Consistency
• Data concurrency
– ensures that multiple users can access data at the same time
• Data consistency
– ensures that each user sees a consistent view of the data,
– including visible changes made by the user's own transactions
and committed transactions of other users
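The consistency rule can be observed with two connections to the same database file. This is a minimal sketch using Python's built-in `sqlite3` module with its default transaction handling; the `booking` table, the connection names `writer` and `reader`, and the temporary file location are all assumptions made for the example. Other DBMSs, Oracle included, achieve the same effect with different mechanisms.

```python
import os
import sqlite3
import tempfile

# Two connections to one database file stand in for two concurrent users.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE booking (seat TEXT)")
writer.commit()

def count(conn):
    """Number of bookings visible to this connection right now."""
    return conn.execute("SELECT COUNT(*) FROM booking").fetchall()[0][0]

# The INSERT opens a transaction on the writer that is not yet committed.
writer.execute("INSERT INTO booking VALUES ('12A')")

seen_by_writer_before = count(writer)  # the writer sees its own change: 1
seen_by_reader_before = count(reader)  # the other user does not yet: 0

writer.commit()
seen_by_reader_after = count(reader)   # committed changes are now visible: 1

print(seen_by_writer_before, seen_by_reader_before, seen_by_reader_after)

writer.close()
reader.close()
```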
Slide 24
Database domains
Slide 25
Database Administrator
• A database administrator (DBA) is a person responsible for the design,
implementation, maintenance and repair of an organization's
database.
• The role includes the development and design of database
strategies, monitoring and improving database performance
and capacity, and planning for future expansion requirements.
• They may also plan, co-ordinate and implement security
measures to safeguard the database.
Slide 26
Database Giants
• Oracle
– Oracle 12c Database
– MySQL
• Microsoft
– SQL Server 2012
• IBM
– DB2
Slide 27
Oracle
• Oracle Database is a powerful and robust DBMS that runs on
many different operating systems, including Windows 7,
Windows Server 2008 R2, several variations of UNIX, and Linux.
• It is a very popular DBMS, and it has a long history of
development and use.
• Oracle Database exposes much of its technology to the
developer; consequently, it can be tuned and tailored in many
ways.
Slide 28
Oracle
Slide 29
Oracle Tools
• Query Tools
– SQL*Plus (command line), SQL Developer (GUI), Discoverer (Reporting tool)
• Developer Tools
– SQL Developer, Forms, Reports, JDeveloper
• Administration Tools
– Database Configuration Assistant, Oracle Net Manager, Oracle Enterprise
Manager, Recovery Manager
Slide 30
Questions
Slide 31