
Data Mining and Warehousing

Data mining and warehousing lecture note


DCIT 322

DATABASE MANAGEMENT AND ADMINISTRATION

SESSION 1 – INTRODUCTION

Course Writer: Michael K. Kolugu, Dept. of Computer Sc.


Contact Information: [email protected]

College of Basic and Applied Sciences


School of Physical and Mathematical Sciences
Department of Computer Science
Session Overview
• A quick introduction to databases and their evolution
over the years.

Slide 2
Session Outline
The key topics to be covered in the session are as follows:
• What is a database?
• Database Management Systems
• Properties of a database and Data concepts
• Database evolution
• Different types of databases
• Relational Database Model
• DBA responsibilities
• Oracle RDBMS and Tools
Slide 3
Database
• A database
– is defined as a collection of related data that is organised
and stored
– and enables us to produce information
• Data can be defined as:
– recorded facts and numbers
• Information can be defined as:
– Knowledge derived from data
– Data presented in a meaningful context
– Data processed by summing, ordering, averaging, grouping,
comparing, or other similar operations
Slide 4
Database
• Database technology is now used in almost every
IT system, such as data mining, web search and
academic databases.
• You may not notice it, but databases are
behind almost everything you do on the
Web.
– Can you give examples?

Slide 5
DBMS
• The DBMS (Database Management System) is
software used to create, manage and control
access to databases.
• The DBMS is a large, complicated program that is
usually licensed from a software vendor.
• The DBMS receives requests encoded in SQL and
translates those requests into actions on the
database.
Slide 6
DBMS: Merits and Demerits
Merits:
• Better information
• Faster response time
• Lower operating costs
• Lower storage requirements
• Improved data integrity
• Better data management
Demerits:
• Higher software cost
• Increased vulnerability
Slide 7
Database Evolution
• Data organization has evolved from a
collection of independent flat files to a collection
of related objects and now NoSQL.
– Flat files (1940). E.g. .csv, .txt, .tsv, binary files
– Hierarchical and Network databases. E.g. IMS, XML, Windows Registry
– Relational model (proposed in 1970; commercial products such as Oracle and DB2 followed)
– Object-Oriented (Oracle ODBMS and others)
– Multi-Dimensional
– Open source DBMS (MySQL and others)
– Hybrid (XML/relational, etc.)
– The NoSQL movement (2009)
Slide 8
Object-Oriented Database Model
• Database capabilities are combined with object-oriented
programming language capabilities
• An OODBMS can store and retrieve complex unstructured data, such
as documents, digital photographs, and video and audio clips.
• Data is stored in objects, which contain data along with methods
(actions)
• Objects can be retrieved using Object Query Language (OQL), an
object-oriented version of SQL
• The ability to directly manipulate data stored in an OODB using an
object-oriented programming language is called transparent
persistence
• For more info go to www.odbms.org
Slide 9
Hybrid DBMS: XML and Relational

[Figure: Hybrid DBMS. Source: IBM]
Slide 10
NoSQL
• Next-generation databases are mostly non-relational,
distributed, open-source and horizontally scalable.
• They can be schema-free and can store huge amounts of
unstructured data.
• Some are based on XML (e.g. the open-source dbXML).
• NoSQL databases are finding wide acceptance in
applications such as Facebook and Twitter.
• Both Facebook and Twitter have used the Apache Software
Foundation's Cassandra database.
• "NoSQL" is usually read as "not only SQL".
Slide 11
Database Types
• Single-user or Multi-user database systems
• Client-Server or Multi-tier database systems
• Centralised or Distributed database systems
• Disk-based or In-memory or hybrid databases
• Personal or Enterprise-class database systems
• Transactional or Data warehousing database systems
Slide 12
Relational Database Concept
• E. F. Codd, an IBM researcher, proposed the
relational model for database systems in his 1970
paper; it has become the basis for the RDBMS
• The relational model consists of the following:
– Collection of tables or relations
– Set of operators to act on the relations
– Data integrity for accuracy and consistency
• Relational databases come with relational
operators that produce new relations from old
Slide 13
Relational Model
• A relational database is a collection of relations or
two-dimensional tables.
• An entity is something (such as a person or object) of
importance to the business or organisation to which the
database belongs
• Characteristics of entities are called attributes
• A relationship describes an association between two or
more entities
• There are three basic entity relationships:
– One-to-One, One-to-Many, Many-to-Many
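The three relationship types are commonly implemented with foreign keys. The following is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. All table and column names (`department`, `lecturer`, `office`, `student`, `course`, `enrolment`) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One-to-many: each lecturer row points at exactly one department.
CREATE TABLE lecturer (
    lecturer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    dept_id     INTEGER NOT NULL REFERENCES department(dept_id)
);
-- One-to-one: UNIQUE on the foreign key allows at most one office per lecturer.
CREATE TABLE office (
    office_id   INTEGER PRIMARY KEY,
    lecturer_id INTEGER NOT NULL UNIQUE REFERENCES lecturer(lecturer_id)
);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Many-to-many: a junction table pairs students with courses.
CREATE TABLE enrolment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")

conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.executemany("INSERT INTO lecturer VALUES (?, ?, 1)", [(1, "Lecturer A"), (2, "Lecturer B")])
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Ama"), (2, "Kofi")])
conn.executemany("INSERT INTO course VALUES (?, ?)", [(1, "Databases"), (2, "Networks")])
conn.executemany("INSERT INTO enrolment VALUES (?, ?)", [(1, 1), (2, 1), (2, 2)])

# One department has many lecturers.
lecturers_in_dept = conn.execute(
    "SELECT COUNT(*) FROM lecturer WHERE dept_id = 1"
).fetchone()[0]

# A student takes many courses, and a course has many students.
courses_per_student = conn.execute("""
    SELECT s.name, COUNT(*)
    FROM student s JOIN enrolment e ON e.student_id = s.student_id
    GROUP BY s.student_id
    ORDER BY s.student_id
""").fetchall()
print(lecturers_in_dept, courses_per_student)
```

The junction table is the usual design choice for many-to-many relationships because a single foreign key column can only point at one row.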
Slide 14
Data and metadata
• A database is a self-describing collection of
integrated tables.
– Integrated tables are tables that store both data and
the relationship among the data.
– self-describing because it contains a description of
itself.
– Thus, databases contain not only tables of user data,
but also tables of data that describe that user data.
– Such descriptive data is called metadata because it is
data about data.
– A collection of metadata is often called a data dictionary.
Slide 15
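To see what "self-describing" means in practice, the sketch below asks a database about its own tables. It assumes SQLite (through Python's sqlite3 module) as a stand-in: SQLite keeps its metadata in a catalogue table named `sqlite_master`, whereas other products such as Oracle expose their data dictionary through their own views. The `student` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# sqlite_master is SQLite's catalogue table: it holds data about the tables.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(tables)
print(columns)
```

Note that the metadata is itself queried with ordinary SQL, which is exactly the point of storing it in tables.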
Database
• Structured Query Language (SQL) is the American
National Standards Institute (ANSI) standard
language for operating relational databases
• SQL provides statements for a variety of tasks,
including:
– Querying data
– Inserting, updating, and deleting rows in a table
– Creating, replacing, altering, and dropping objects
– Controlling access to the database and its objects
– Guaranteeing database consistency and integrity
Slide 16
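Most of the tasks listed above can be tried out with a handful of statements. The sketch below runs them through Python's sqlite3 module. SQLite is an assumption chosen because it needs no server, and it has no user accounts, so access control statements such as GRANT are not shown. The `student` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Creating an object.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, level INTEGER)"
)

# Inserting rows.
conn.executemany(
    "INSERT INTO student (name, level) VALUES (?, ?)",
    [("Ama", 300), ("Kofi", 200), ("Esi", 300)],
)

# Updating and deleting rows.
conn.execute("UPDATE student SET level = 300 WHERE name = 'Kofi'")
conn.execute("DELETE FROM student WHERE name = 'Esi'")
conn.commit()

# Guaranteeing consistency: an unwanted change is rolled back as a whole.
conn.execute("UPDATE student SET level = 400")
conn.rollback()

# Querying data.
rows = conn.execute("SELECT name, level FROM student ORDER BY name").fetchall()
print(rows)

# Altering and dropping objects.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
conn.execute("DROP TABLE student")
remaining = conn.execute("SELECT COUNT(*) FROM sqlite_master").fetchone()[0]
```

The rollback leaves both remaining students at level 300, because the uncommitted update to level 400 is discarded.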
What is SQL

Slide 17
Properties of a database
• The term database usually implies a series of
related properties:
– Data abstraction
– Data sharing
– Data definition
– Data integrity
– Data security
– Data independence
– Data concurrency
– Data consistency
Slide 18
Data abstraction and Data Sharing
• Data abstraction
– A database can be viewed as a model of reality.
– The information stored in a database is usually an attempt to
represent the properties of some objects in the real world.
• For example, an academic database is meant to record relevant details of
university activity

• Data Sharing
– Data stored in a database is not usually held solely for the use of
one person.
– A database is normally expected to be accessible by more than
one person, perhaps at the same time.
• For example, a student database might be accessible not only to
academic staff but also to administrative staff.
Slide 19
Data Definition and Data Integrity
• Data definition
– involves describing the properties of the data that go into each
database table.
– Each column has:
• Name (must be unique within the table)
• Data type (such as Number, Date/Time, Text)
• Properties (such as size, format, any allowable range, etc.)
• Description (an optional description of the data)
• Data integrity
– means that data in a database adheres to specified business rules,
– refers to maintaining and assuring the accuracy of data over its
entire life-cycle.
Slide 20
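Data definition and data integrity meet in the table definition: column names, data types and allowable ranges are declared once, and the DBMS then rejects rows that break them. The sketch below assumes SQLite via Python's sqlite3 module. The `student` table, its rule that `level` must be 100, 200, 300 or 400, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,              -- column name and data type
    name       TEXT NOT NULL,                    -- a value is required
    level      INTEGER NOT NULL
               CHECK (level IN (100, 200, 300, 400)),  -- allowable values
    email      TEXT UNIQUE                       -- no two rows share an email
)
""")
conn.execute(
    "INSERT INTO student (name, level, email) VALUES ('Ama', 300, 'ama@example.org')"
)

bad_rows = [
    ("Kofi", 250, "kofi@example.org"),  # level outside the allowed set
    (None, 100, "esi@example.org"),     # missing name
    ("Yaw", 100, "ama@example.org"),    # duplicate email
]
rejected = []
for row in bad_rows:
    try:
        conn.execute(
            "INSERT INTO student (name, level, email) VALUES (?, ?, ?)", row
        )
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))  # the DBMS refused the row

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(count, len(rejected))
```

Declaring the rules in the table definition means every application that writes to the table is held to the same rules.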
Data Security and Data Privacy
• Data security refers to protecting data against
destruction and misuse, both intentional and
accidental
• It involves protecting database access by users
– usernames, passwords, user privileges
• And protecting against data loss
– backup and disaster-recovery
• A company that stores data about individuals is
responsible for protecting the privacy of that data
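Protection against data loss can be illustrated with a very small backup-and-recover sketch. It assumes SQLite and the `Connection.backup` method of Python's sqlite3 module (available from Python 3.7). Production systems use dedicated backup and recovery tools, and the `student` table here is hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO student (name) VALUES ('Ama')")
source.commit()

# Take a backup: copy the whole database into a second connection.
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulate accidental data loss in the live database.
source.execute("DELETE FROM student")
source.commit()

lost = source.execute("SELECT COUNT(*) FROM student").fetchone()[0]
recovered = backup.execute("SELECT name FROM student").fetchall()
print(lost, recovered)
```

The backup still holds the row that was deleted from the live database, so it can be used to restore the lost data.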
Slide 21
Data Independence
• One of the main requirements of a database
system is the idea of buffering data from the
processes that use such data.
• Data is separated from all programs that make use
of the data.
• The data remains
– accessible
– stable
– and cannot be corrupted by accessing applications.
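One common way to buffer programs from the stored data is a view. In the sketch below the "program" (the hypothetical function `application_query`) reads only from a view, so the underlying table can be restructured without changing the program. SQLite via Python's sqlite3 module is assumed, and all table, view and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, full_name TEXT);
INSERT INTO student VALUES (1, 'Ama Mensah');
CREATE VIEW student_names AS SELECT student_id, full_name FROM student;
""")


def application_query(connection):
    """The 'program': it knows only the view, not how the data is stored."""
    return connection.execute(
        "SELECT student_id, full_name FROM student_names"
    ).fetchall()


before = application_query(conn)

# The storage changes: the name is split into two columns in a new table,
# and the view is redefined to present the same shape as before.
conn.executescript("""
DROP VIEW student_names;
CREATE TABLE student_v2 (
    student_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT
);
INSERT INTO student_v2 VALUES (1, 'Ama', 'Mensah');
DROP TABLE student;
CREATE VIEW student_names AS
    SELECT student_id, first_name || ' ' || last_name AS full_name
    FROM student_v2;
""")

after = application_query(conn)
print(before, after)
```

The program's query is identical before and after the change and returns the same rows, which is the practical meaning of data independence.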
Slide 22
Data Independence

Slide 23
Data Concurrency and Consistency
• Data concurrency
– ensures that multiple users can access data at the same time

• Data consistency
– ensures that each user sees a consistent view of the data,
– including visible changes made by the user's own transactions
and committed transactions of other users
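The two definitions above can be observed with two connections to the same database file. This is a minimal sketch assuming SQLite through Python's sqlite3 module with its default transaction handling. The `account` table, the connection names `writer` and `reader`, and the helper `balance` are hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.db")
    writer = sqlite3.connect(path)  # first user
    reader = sqlite3.connect(path)  # second user, same database

    def balance(connection):
        # fetchall() reads the whole result, so no read lock is left open.
        rows = connection.execute(
            "SELECT balance FROM account WHERE id = 1"
        ).fetchall()
        return rows[0][0]

    writer.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    writer.execute("INSERT INTO account VALUES (1, 100)")
    writer.commit()

    # The writer changes the row but has not committed yet.
    writer.execute("UPDATE account SET balance = 150 WHERE id = 1")
    own_view = balance(writer)    # the writer sees its own change
    other_view = balance(reader)  # the reader still sees committed data

    writer.commit()
    after_commit = balance(reader)  # now the change is visible to everyone

    writer.close()
    reader.close()

print(own_view, other_view, after_commit)
```

Both users can read at the same time, yet neither ever sees a half-finished change made by the other.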

Slide 24
Database domains

Slide 25
Database Administrator
• A database administrator (DBA) is a person responsible for the
design, implementation, maintenance and repair of an organization's
database.
• The role includes the development and design of database
strategies, monitoring and improving database performance
and capacity, and planning for future expansion requirements.
• They may also plan, co-ordinate and implement security
measures to safeguard the database.
Slide 26
Database Giants
• Oracle
– Oracle 12c Database
– MySQL

• Microsoft
– SQL Server 2012

• IBM
– DB2

Slide 27
Oracle
• Oracle Database is a powerful and robust DBMS that runs on
many different operating systems, including Windows 7,
Windows Server 2008 R2, several variations of UNIX, and Linux.
• It is a very popular DBMS, and it has a long history of
development and use.
• Oracle Database exposes much of its technology to the
developer; consequently, it can be tuned and tailored in many
ways.
Slide 28
Oracle

Slide 29
Oracle Tools
• Query Tools
– SQL*Plus (command line) , SQL Developer (GUI), Discoverer (Reporting tool)

• Developer Tools
– SQL Developer, Forms, Reports, JDeveloper

• Moving Data Tools
– Data Pump, SQL*Loader

• Administration Tools
– Database Configuration Assistant, Oracle Net Manager, Oracle Enterprise
Manager, Recovery Manager
Slide 30
Questions

Slide 31
