05 Database Management Systems
[Figure: End-user, Application program, DBMS]
Problems
• Data redundancy and inconsistency
• Multiple file formats, duplication of information in different files
• Difficulty in accessing data
• Need to write a new program to carry out each new task
• Data isolation — multiple files and formats
• Time-consuming reporting processes
• Outdated data management technology
Solution
• In an age of nonorganic corporate growth where companies grow by
acquiring other companies, business firms quickly become a
collection of hundreds of databases, e-mail systems, personnel
systems, accounting systems, and manufacturing systems, none of
which can communicate with one another. Even if firms grow
organically without acquisitions, it is common for separate
departments and divisions to have their own systems and databases.
Firms in this case suffer the same result: the firm becomes a
collection of systems that cannot share information.
• Replace disparate systems with an enterprise system and a data
management system
Basics
• Data: Known facts that can be recorded and have an implicit meaning (Information).
• Database: A collection of related data and its metadata, organized in a structured format for optimized
information management
[Figure: database models: File-based, Hierarchical, Network, Relational, Object-oriented, Web-based, Entity-Relationship]
Manual File System
• To keep track of data
• Used tagged file folders in a filing cabinet
• Organized according to expected use
• e.g. file per customer
• Easy to create, but hard to
• locate data
• aggregate/summarize data
Computerized File System
• To accommodate the data growth and information need
• Manual file system structures were duplicated in the
computer
• Data Processing (DP) specialists wrote customized programs
to
• write, delete, update data (i.e. management)
• extract and present data in various formats (i.e. report)
File System
Database System vs. File System
Entity Relationship Model
• The E-R Model can be expressed as a collection of entities, also called real-world
objects, and the relationships between those entities.
• No two entities should be identical.
• Based on Entity, Attributes & Relationships
• Entity is a thing about which data are to be collected and stored
• e.g. EMPLOYEE
• Attributes are characteristics of the entity
• e.g. SSN, last name, first name
• Relationships describe associations between entities
• i.e. one-to-many (1:M), many-to-many (M:N), one-to-one (1:1)
Relationships
• Connect two or more entity sets.
• Represented by diamonds.
• Labeled with an active or passive verb inside the diamond that connects
the related entities.
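The entity, attribute, and relationship ideas above can be sketched in a few lines of Python. This is only an illustration: EMPLOYEE and its attributes (SSN, last name, first name) come from the slide, while the Department entity, the "employs" relationship, and the sample people are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Employee:
    """Entity EMPLOYEE with the attributes named on the slide."""
    ssn: str          # identifying attribute: no two entities share it
    last_name: str
    first_name: str


@dataclass
class Department:
    """Hypothetical second entity, added only to show a relationship."""
    name: str
    # 1:M relationship "Department EMPLOYS Employee"
    employees: List[Employee] = field(default_factory=list)

    def employ(self, employee: Employee) -> None:
        # Enforce "no two entities should be identical" using the SSN.
        if any(e.ssn == employee.ssn for e in self.employees):
            raise ValueError(f"duplicate employee SSN: {employee.ssn}")
        self.employees.append(employee)


# Hypothetical sample data
sales = Department("Sales")
sales.employ(Employee("123-45-6789", "Doe", "Jane"))
sales.employ(Employee("987-65-4321", "Roe", "Richard"))
print(len(sales.employees), "employees in", sales.name)
```

The verb "employs" plays the role of the label inside the diamond; one department relates to many employees, which makes this a 1:M relationship.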
Relational Database
Provides a logical “human-level” view of the data and
associations among groups of data (i.e., tables)
• Disadvantages
• Substantial hardware and system software overhead
• more complex system
• Poor design and implementation are made easy
• ease of use allows careless use of the RDBMS
EXAMPLE OF AN SQL QUERY
SQL statements for a query to select suppliers for parts 137 or 150.
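The original figure is not reproduced here, so the following is a minimal sketch of what such a query could look like, run against an in-memory SQLite database from Python. The PART and SUPPLIER table names and Supplier_Number appear elsewhere in these slides; the other column names and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUPPLIER (
    Supplier_Number INTEGER PRIMARY KEY,
    Supplier_Name   TEXT
);
CREATE TABLE PART (
    Part_Number     INTEGER PRIMARY KEY,
    Part_Name       TEXT,
    Supplier_Number INTEGER REFERENCES SUPPLIER(Supplier_Number)
);
-- Hypothetical sample rows
INSERT INTO SUPPLIER VALUES (1, 'Supplier A'), (2, 'Supplier B');
INSERT INTO PART VALUES (137, 'Part X', 1), (145, 'Part Y', 2), (150, 'Part Z', 2);
""")

# Select the suppliers for parts 137 or 150 by joining PART to SUPPLIER.
QUERY = """
SELECT PART.Part_Number, PART.Part_Name,
       SUPPLIER.Supplier_Number, SUPPLIER.Supplier_Name
FROM PART
JOIN SUPPLIER ON PART.Supplier_Number = SUPPLIER.Supplier_Number
WHERE PART.Part_Number IN (137, 150)
ORDER BY PART.Part_Number;
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Part 145 is filtered out by the WHERE clause; the join supplies the supplier name without storing it in the PART table.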
MICROSOFT ACCESS DATA DICTIONARY FEATURES
Microsoft Access has a rudimentary data dictionary capability that displays information about the size, format, and other characteristics of each field in a database. Displayed here is the information maintained in the SUPPLIER table. The small key icon to the left of Supplier_Number indicates that it is a key field.
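Access is not scriptable from a short example, so as a rough analogy the sketch below asks SQLite (not Access) for the same kind of field-level metadata using its `PRAGMA table_info` statement. The Supplier_Name column is an assumption; only Supplier_Number is named in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE SUPPLIER (
    Supplier_Number INTEGER PRIMARY KEY,
    Supplier_Name   TEXT NOT NULL   -- hypothetical column
)
""")

# PRAGMA table_info returns one row per field:
# (cid, name, type, notnull, dflt_value, pk)
fields = conn.execute("PRAGMA table_info(SUPPLIER)").fetchall()
for cid, name, col_type, notnull, default, pk in fields:
    print(f"{name}: type={col_type}, required={bool(notnull)}, key field={bool(pk)}")
```

The `pk` flag corresponds to the key icon described above: it is set only for Supplier_Number.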
Designing Databases
• Conceptual (logical) design: abstract model from business perspective
• Physical design: how the database is arranged on direct-access storage devices
AN UNNORMALIZED RELATION FOR ORDER
• Normalization
– Streamlining complex groupings of data to
minimize redundant data elements
NORMALIZED TABLES CREATED FROM ORDER
The Order table has been broken down into four smaller, related tables.
The Order table contains only two unique attributes, Order Number and Order Date.
The multiple items ordered are stored using the Line_Item table.
Normalization means that very little data has to be duplicated when creating orders; most of
the information can be retrieved by using keys to the Part and Supplier tables.
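A minimal sketch of the four normalized tables, using SQLite from Python, may make the key-based retrieval concrete. The table names and the Order Number and Order Date attributes follow the text; the remaining columns, the ORDERS spelling (ORDER is a reserved word in SQL), and every sample row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUPPLIER (
    Supplier_Number INTEGER PRIMARY KEY,
    Supplier_Name   TEXT
);
CREATE TABLE PART (
    Part_Number     INTEGER PRIMARY KEY,
    Part_Name       TEXT,
    Supplier_Number INTEGER REFERENCES SUPPLIER(Supplier_Number)
);
-- ORDER is a reserved word in SQL, so the table is named ORDERS here.
CREATE TABLE ORDERS (
    Order_Number INTEGER PRIMARY KEY,
    Order_Date   TEXT
);
CREATE TABLE LINE_ITEM (
    Order_Number INTEGER REFERENCES ORDERS(Order_Number),
    Part_Number  INTEGER REFERENCES PART(Part_Number),
    Quantity     INTEGER,
    PRIMARY KEY (Order_Number, Part_Number)
);
-- Hypothetical sample rows: the supplier is stored exactly once.
INSERT INTO SUPPLIER VALUES (1, 'Supplier A');
INSERT INTO PART VALUES (137, 'Part X', 1), (150, 'Part Z', 1);
INSERT INTO ORDERS VALUES (1001, '2024-01-15');
INSERT INTO LINE_ITEM VALUES (1001, 137, 2), (1001, 150, 5);
""")

# Rebuild the full order by following keys across the four tables.
order_lines = conn.execute("""
SELECT o.Order_Number, o.Order_Date, p.Part_Name, li.Quantity, s.Supplier_Name
FROM ORDERS o
JOIN LINE_ITEM li ON li.Order_Number = o.Order_Number
JOIN PART p       ON p.Part_Number = li.Part_Number
JOIN SUPPLIER s   ON s.Supplier_Number = p.Supplier_Number
ORDER BY p.Part_Number
""").fetchall()
for line in order_lines:
    print(line)
```

Each line item stores only two keys and a quantity; part and supplier details are looked up rather than repeated on every order.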
Big data
• Massive sets of unstructured/semi-structured data from Web traffic,
social media, sensors, and so on
• Petabytes, exabytes of data
• Volumes too great for a typical DBMS
• Volume is increasing exponentially
• Variety (complexity)
• Velocity: data needs to be processed fast
A Single View to the Customer
[Figure labels: Customer, Social Media, Banking, Finance, Gaming, Entertain, Purchase, Our Known History]
Big Data needs speed
• Velocity refers to the frequency of incoming data that must be
processed. Think text messages, Facebook status updates, credit card
swipes, the multitude of sensors in modern cars, and the stock
exchange.
• Late decisions mean missed opportunities
• Examples
• E-Promotions: Based on your current location, your purchase history, and what you like, send
promotions right now for the store next to you
• Healthcare monitoring: Sensors monitor your activities and body; any abnormal
measurements require immediate reaction
Big Data Opportunities
Business intelligence infrastructure
• Contemporary tools:
• Data warehouses
• Data marts
• Hadoop
• In-memory computing
Data warehouses
Problem: Heterogeneous information sources leading to:
• Different interfaces
• Different data representations
• Duplicate and inconsistent information
[Figure: World Wide Web, Scientific Databases, Digital Libraries]
Data warehouses
Solution: Unified access to data that:
• Collects and combines information
• Provides integrated view, uniform user interface
• Supports sharing
[Figure: Integration System over World Wide Web, Personal Databases, Digital Libraries, Scientific Databases]
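The problem and solution above can be sketched as a tiny integration step: two sources with different representations are mapped onto one schema, and duplicates are dropped. Both sources, their field names, and the records are hypothetical; a real warehouse load is far more involved.

```python
from datetime import datetime

# Hypothetical source A: dicts with US-style dates.
source_a = [
    {"cust_id": 17, "full_name": "Jane Doe", "joined": "03/15/2021"},
    {"cust_id": 42, "full_name": "Richard Roe", "joined": "11/02/2020"},
]
# Hypothetical source B: tuples with ISO dates; customer 42 overlaps with A.
source_b = [
    (42, "ROE, RICHARD", "2020-11-02"),
    (58, "POE, ALEX", "2022-07-30"),
]


def from_a(rec):
    """Map a source-A record onto the unified schema."""
    return {"id": rec["cust_id"], "name": rec["full_name"],
            "joined": datetime.strptime(rec["joined"], "%m/%d/%Y").date()}


def from_b(rec):
    """Map a source-B record onto the unified schema."""
    cust_id, name, joined = rec
    last, first = [part.strip().title() for part in name.split(",")]
    return {"id": cust_id, "name": f"{first} {last}",
            "joined": datetime.strptime(joined, "%Y-%m-%d").date()}


def integrate(a_records, b_records):
    """Combine both sources into one deduplicated, uniformly shaped list."""
    unified = {}
    for rec in [from_a(r) for r in a_records] + [from_b(r) for r in b_records]:
        unified.setdefault(rec["id"], rec)  # keep the first copy of each id
    return sorted(unified.values(), key=lambda r: r["id"])


warehouse = integrate(source_a, source_b)
for row in warehouse:
    print(row)
```

After integration every record has the same fields and date type, and the customer present in both sources appears once.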
Data marts
• Subset of data warehouse
• Summarized or focused portion of data for use by specific
population of users
• Typically focuses on single subject or line of business
Hadoop
• Software platform that lets one easily write and run applications that
process vast amounts of data. It includes:
– MapReduce – distributes the application's processing
– HDFS – Hadoop Distributed File System: distributes data
– HBase – online data access
• Open-source framework that was created to make it easier to work
with big data.
• The term Hadoop is also often used interchangeably with big data, but it
shouldn’t be. Hadoop is a framework for working with big data. It is
part of the big data ecosystem.
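To give a feel for the MapReduce programming model mentioned above, here is a single-process word count in plain Python. It does not use Hadoop or any Hadoop API; the function names are hypothetical, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data needs speed", "big data needs storage"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread across machines, which is the point of the model.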
In-memory computing
• Used in big data analysis
• Uses the computer's main memory (RAM) for data storage to avoid delays
in retrieving data from disk storage
• Can reduce hours/days of processing to seconds
• Storage is done on dedicated servers.
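As a toy-scale illustration of keeping data in RAM rather than on disk, SQLite can hold an entire database in memory. This is only an analogy for the idea, not a big data platform; the table and sensor readings are hypothetical.

```python
import sqlite3

# ":memory:" asks SQLite to keep the whole database in RAM;
# no database file is created on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?)",
    [("s1", 20.5), ("s1", 21.5), ("s2", 19.0)],  # hypothetical readings
)

# The aggregate is computed without any disk reads.
averages = dict(conn.execute(
    "SELECT sensor, AVG(value) FROM reading GROUP BY sensor"
))
print(averages)
```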
Data Mining
1. Collect Big Data or obtain access to a repository.
2. Perform data analysis to explore patterns (pattern recognition, predictive
analytics).
3. Identify potential correlations.
4. Infer rules to predict future behavior.
• Types of information obtainable from data mining:
• Associations
• Sequences
• Classification
• Clustering
• Forecasting
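Of the information types listed above, associations are the easiest to sketch. The example below counts item pairs that occur together in hypothetical shopping baskets and keeps rules that pass support and confidence thresholds. The data, the thresholds, and the restriction to item pairs are simplifying assumptions, not a full association-mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]


def association_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                  # share of baskets with both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


rules = association_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "milk -> butter" reads: baskets that contain milk also tend to contain butter, which is the kind of inferred rule step 4 refers to.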
Text mining and Web Mining
• Text Mining:
• Extracts key elements from large unstructured data sets
• Mines e-mails, blogs, social media to detect opinions
• Web Mining:
• Discovery and analysis of useful patterns and information from the Web
• Understand customer behavior
• Evaluate effectiveness of Web site, and so on
• Web content mining
• Mines content of Web pages
• Web structure mining
• Analyzes links to and from Web pages
• Web usage mining
• Mines user interaction data recorded by Web server
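Web usage mining can be illustrated by counting successful page views in a Web server log. The sketch assumes log lines in the Common Log Format; the sample lines, addresses, and paths are made up.

```python
import re
from collections import Counter

# Hypothetical Web server log lines in Common Log Format
LOG_LINES = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /products HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "GET /cart HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2023:13:57:12 +0000] "GET /products HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:13:58:40 +0000] "GET /missing HTTP/1.1" 404 0',
]

LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)$'
)


def page_views(lines):
    """Count successful (status 200) requests per page path."""
    views = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status") == "200":
            views[match.group("path")] += 1
    return views


views = page_views(LOG_LINES)
print(views.most_common())
```

Counts like these are a starting point for understanding customer behavior and evaluating how well a site works, as the bullets above describe.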
Databases and the Web
Many companies use the Web to make some internal databases available to customers
or partners
• Advantages of using Web for database access:
• Ease of use of browser software
• Web interface requires few or no changes to database
• Inexpensive to add Web interface to system
Questions from Business