Lecture 1: Part I: Emerging Database Technology, Research and Applications
Emerging Applications
Each domain imposes different requirements on the DBMS in terms of storage, access methods, knowledge discovery and data analysis.
WWW: less than 10% of the WWW is mapped by current search engines; how should semi-structured data be represented?
Engineering & Scientific Applications: complex interrelationships need to be mapped.
Telecommunication: streaming data needs to be handled.
Medicine: large heterogeneous datasets (genomic, MRI, ultrasound); fast sequence and association matching, and fast access, are required.
Multimedia: different forms of data need to be represented and accessed interactively (real-time response).
Key Challenges
Complex objects and relationships
Multiple data types: numbers, text, images, audio, video
Real-time response for high-volume data
User expectations on the rise:
  better interfaces and visualizations
  going beyond raw data to meaningful information
Knowledge discovery and data mining: extraction, cleaning, mining, result interpretation
Functionality Trends
Metadata, parallel processing
Future
More variety of applications and data
A great need for domain experts who also understand database modeling and concepts
More demands on performance: scaling to larger databases while staying within acceptable response times
TOWARD:
Multilingual, Multiplatform, Multiprocessor, Intelligent, Interoperable, Adaptive
Database and Knowledge Base Systems
Lecture 1: Part II
Relational Data Model
Relational Model
Relational Data Structures
Relational Integrity Rules
Relational Algebra
Relations
Relation schema R, written $R(A_1, A_2, \ldots, A_n)$: a fixed set of attributes $A_1, A_2, \ldots, A_n$. Each attribute $A_j$ corresponds to exactly one of the underlying domains $D_j$ ($j = 1, 2, \ldots, n$), where $D_j = \mathrm{domain}(A_j)$; the domains are not necessarily distinct.
Example: EMP(EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION)
Note: EMPNO and MGR take values from the same domain, the set of valid employee numbers.
The number of attributes $n$ in the relation schema is called the degree of the relation:
degree 1: unary; degree 2: binary; degree 3: ternary; degree n: n-ary.
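A minimal Python sketch of a relation schema, assuming each domain is modelled as a membership predicate. The domain functions (`employee_number`, `department_number`, `text`, `amount`) are hypothetical illustrations, not definitions from the lecture. It shows EMPNO and MGR sharing one domain and the degree as the attribute count.

```python
from typing import Callable, List, Tuple

# Assumption: a domain is modelled as a predicate that accepts exactly its valid values.
Domain = Callable[[object], bool]

def employee_number(v: object) -> bool:
    # Hypothetical domain of valid employee numbers: positive integers.
    return isinstance(v, int) and v > 0

def department_number(v: object) -> bool:
    # Hypothetical domain of department numbers: positive multiples of 10.
    return isinstance(v, int) and v > 0 and v % 10 == 0

def text(v: object) -> bool:
    return isinstance(v, str) and v != ""

def amount(v: object) -> bool:
    return isinstance(v, (int, float)) and v >= 0

# Relation schema EMP(EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION):
# each attribute A_j is paired with its domain D_j = domain(A_j).
EMP_SCHEMA: List[Tuple[str, Domain]] = [
    ("EMPNO", employee_number),
    ("NAME", text),
    ("DNO", department_number),
    ("JOB", text),
    ("MGR", employee_number),  # same domain as EMPNO
    ("SAL", amount),
    ("COMMISSION", amount),
]

def degree(schema: List[Tuple[str, Domain]]) -> int:
    """The degree of a relation is the number of attributes in its schema."""
    return len(schema)

def degree_name(n: int) -> str:
    return {1: "unary", 2: "binary", 3: "ternary"}.get(n, f"{n}-ary")

print(degree(EMP_SCHEMA), degree_name(degree(EMP_SCHEMA)))  # prints: 7 7-ary
```

Because domains are not necessarily distinct, two attributes may point at the same predicate object, as EMPNO and MGR do here.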
Relation (or relation instance) r of relation schema $R(A_1, A_2, \ldots, A_n)$, also written r(R)
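Alternative definition 1a: a finite set of n-tuples $r = \{t_1, t_2, \ldots, t_m\}$, where each tuple is an ordered list of values $t = \langle v_1, v_2, \ldots, v_n \rangle$ and each $v_j$ is an element of $D_j = \mathrm{domain}(A_j)$. Equivalently, $r \subseteq D_1 \times D_2 \times \cdots \times D_n$.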
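Under definition 1a a tuple is an ordered list, so value $v_j$ is checked against domain $D_j$ purely by position. A minimal sketch, assuming a hypothetical schema DEPT(DNO, DNAME, LOC) with simple type-based domains and made-up sample rows:

```python
from typing import Callable, Iterable, Sequence, Tuple

Domain = Callable[[object], bool]

def is_int(v: object) -> bool:
    return isinstance(v, int)

def is_str(v: object) -> bool:
    return isinstance(v, str)

# Hypothetical schema DEPT(DNO, DNAME, LOC): position j holds domain D_j.
DEPT_DOMAINS: Sequence[Domain] = (is_int, is_str, is_str)

def valid_tuple(t: Tuple, domains: Sequence[Domain]) -> bool:
    """Definition 1a: t = <v1, ..., vn> with each vj drawn from Dj (matched by position)."""
    return len(t) == len(domains) and all(d(v) for d, v in zip(domains, t))

def make_relation(rows: Iterable[Tuple], domains: Sequence[Domain]) -> frozenset:
    """Build r as a finite set of n-tuples, rejecting tuples outside D1 x ... x Dn."""
    rows = list(rows)
    bad = [t for t in rows if not valid_tuple(t, domains)]
    if bad:
        raise ValueError(f"tuples outside the domains: {bad}")
    return frozenset(rows)  # a set, so duplicate tuples collapse

dept = make_relation(
    [(10, "ACCOUNTING", "NEW YORK"), (20, "RESEARCH", "DALLAS")],
    DEPT_DOMAINS,
)
# Order matters under this definition: the same values in another order
# put a string where DNO's integer domain is expected.
print(valid_tuple(("ACCOUNTING", 10, "NEW YORK"), DEPT_DOMAINS))  # False
```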
Alternative definition 2a: a set of n-tuples (or simply tuples), where each tuple consists of attribute-value pairs $(A_j, v_j)$, $j = 1, 2, \ldots, n$, one such pair for each attribute $A_j$ in the relation schema. In each pair $(A_j, v_j)$, $v_j$ is a value from the unique domain $D_j$ associated with that attribute.
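Because each value in definition 2a travels with its attribute name, the order in which the pairs are written no longer matters. A minimal sketch, assuming a tuple is represented as a Python `frozenset` of `(attribute, value)` pairs and using hypothetical type-based domains for DEPT(DNO, DNAME, LOC):

```python
from typing import Callable, Dict, Iterable, Tuple

Domain = Callable[[object], bool]

# Hypothetical domains for DEPT(DNO, DNAME, LOC), keyed by attribute name.
DEPT_DOMAINS: Dict[str, Domain] = {
    "DNO": lambda v: isinstance(v, int),
    "DNAME": lambda v: isinstance(v, str),
    "LOC": lambda v: isinstance(v, str),
}

def make_tuple(pairs: Iterable[Tuple[str, object]],
               domains: Dict[str, Domain]) -> frozenset:
    """Definition 2a: a tuple is a set of (A_j, v_j) pairs, one per attribute."""
    t = frozenset(pairs)
    attrs = [a for a, _ in t]
    # Exactly one pair for each attribute of the schema, and no others.
    if sorted(attrs) != sorted(domains):
        raise ValueError(f"expected one pair per attribute, got {sorted(attrs)}")
    for a, v in t:
        if not domains[a](v):
            raise ValueError(f"{v!r} is not in domain({a})")
    return t

t1 = make_tuple([("DNO", 10), ("DNAME", "ACCOUNTING"), ("LOC", "NEW YORK")], DEPT_DOMAINS)
t2 = make_tuple([("LOC", "NEW YORK"), ("DNO", 10), ("DNAME", "ACCOUNTING")], DEPT_DOMAINS)
print(t1 == t2)  # True: the pairs carry their attribute names, so order is irrelevant
```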
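Alternative definition 2b: a finite set of mappings $r = \{t_1, t_2, \ldots, t_m\}$, where each tuple $t$ is a mapping from
$R = \{A_1, A_2, \ldots, A_n\}$ to $D = \mathrm{domain}(A_1) \cup \mathrm{domain}(A_2) \cup \cdots \cup \mathrm{domain}(A_n)$, with $t(A_j) \in \mathrm{domain}(A_j)$ for every $j$.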
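Definition 2b maps naturally onto a Python `dict` from attribute names to values. A minimal sketch, assuming small hypothetical finite domains for DEPT(DNO, DNAME, LOC); it checks that a tuple is defined on every attribute of R, that its values lie in the union D, and that each value lies in its own attribute's domain (membership in D alone is not enough):

```python
from typing import Dict, Set

# Hypothetical finite domains for DEPT(DNO, DNAME, LOC).
DOMAIN: Dict[str, Set[object]] = {
    "DNO": {10, 20, 30, 40},
    "DNAME": {"ACCOUNTING", "RESEARCH", "SALES", "OPERATIONS"},
    "LOC": {"NEW YORK", "DALLAS", "CHICAGO", "BOSTON"},
}
R = set(DOMAIN)                    # R = {A1, ..., An}
D = set().union(*DOMAIN.values())  # D = domain(A1) ∪ ... ∪ domain(An)

def is_valid_tuple(t: Dict[str, object]) -> bool:
    """Definition 2b: t is a mapping R -> D with t(A) in domain(A) for every A."""
    return (
        set(t) == R                            # defined on every attribute of R
        and all(v in D for v in t.values())    # every value lies in D
        and all(t[a] in DOMAIN[a] for a in R)  # and in that attribute's own domain
    )

ok = {"DNO": 10, "DNAME": "ACCOUNTING", "LOC": "NEW YORK"}
swapped = {"DNO": 10, "DNAME": "DALLAS", "LOC": "NEW YORK"}  # "DALLAS" is in D, not in domain(DNAME)
print(is_valid_tuple(ok), is_valid_tuple(swapped))  # True False
```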
Keys
A relation is a set of tuples, and all elements of a set are distinct; hence all tuples of a relation must be distinct. In addition, there may be a subset of the attributes whose combined values must differ between any two tuples. Such a set of attributes is called a superkey.
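A set of attributes SK is a superkey if, for any two distinct tuples $t_1$ and $t_2$ of r, $t_1[SK] \neq t_2[SK]$. This must be guaranteed by the real world (the meaning of the data), not merely observed in the current instance.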
Example
EMP(EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION)
Some superkeys:
{EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION}
{EMPNO, NAME, DNO, JOB}
{EMPNO, NAME}
{EMPNO, NAME, SAL, COMMISSION}
{EMPNO}
{NAME} ???
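The superkey condition $t_1[SK] \neq t_2[SK]$ can be tested against a particular instance, but such a test can only refute a superkey, never prove one. A minimal sketch, assuming rows are Python dicts and using hypothetical EMP rows in which two employees happen to share a name, which is exactly why {NAME} is marked "???":

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]

def consistent_with_superkey(r: List[Row], sk: Iterable[str]) -> bool:
    """True if no two tuples of r agree on every attribute of sk.

    An instance can only refute a superkey; whether SK really is one must be
    guaranteed by the meaning of the data, not by the current rows.
    """
    attrs = sorted(sk)
    seen = set()
    for t in r:
        key = tuple(t[a] for a in attrs)  # t[SK]
        if key in seen:
            return False                  # found t1[SK] = t2[SK]
        seen.add(key)
    return True

# Hypothetical EMP rows: two employees share the name SMITH.
emp = [
    {"EMPNO": 7369, "NAME": "SMITH", "DNO": 20, "JOB": "CLERK", "MGR": 7902, "SAL": 800, "COMMISSION": 0},
    {"EMPNO": 7499, "NAME": "ALLEN", "DNO": 30, "JOB": "SALESMAN", "MGR": 7698, "SAL": 1600, "COMMISSION": 300},
    {"EMPNO": 7935, "NAME": "SMITH", "DNO": 10, "JOB": "ANALYST", "MGR": 7839, "SAL": 3000, "COMMISSION": 0},
]
print(consistent_with_superkey(emp, {"EMPNO"}))          # True
print(consistent_with_superkey(emp, {"EMPNO", "NAME"}))  # True
print(consistent_with_superkey(emp, {"NAME"}))           # False
```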
Example
Relation    Candidate keys
EMP         EMPNO, NAME ??
DEPT        DNO, DNAME, LOC ??
Primary keys: EMPNO for EMP, DNO for DEPT.
Note: the primary key is the candidate key chosen as the designated way to identify a unique tuple.
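A candidate key is a minimal superkey: no proper subset of it is still a superkey, and the primary key is one candidate key chosen from among them. The sketch below, assuming rows are Python dicts and using hypothetical DEPT rows, lists the minimal attribute sets that are unique in one instance. It also shows why LOC carries "??": it looks like a key until a second department in the same city appears.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]

def unique_on(r: List[Row], attrs: Sequence[str]) -> bool:
    """True if no two tuples of r share the same values on attrs."""
    projected = [tuple(t[a] for a in attrs) for t in r]
    return len(projected) == len(set(projected))

def candidate_keys(r: List[Row], attributes: Sequence[str]) -> List[Tuple[str, ...]]:
    """Minimal attribute sets that are unique in the instance r.

    Subsets are tried from smallest to largest, and any set containing an
    already-found key is skipped, so every result is minimal.
    """
    keys: List[Tuple[str, ...]] = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key: not minimal
            if unique_on(r, combo):
                keys.append(combo)
    return keys

# Hypothetical DEPT rows.
dept = [
    {"DNO": 10, "DNAME": "ACCOUNTING", "LOC": "NEW YORK"},
    {"DNO": 20, "DNAME": "RESEARCH", "LOC": "DALLAS"},
    {"DNO": 30, "DNAME": "SALES", "LOC": "CHICAGO"},
]
print(candidate_keys(dept, ["DNO", "DNAME", "LOC"]))
# [('DNO',), ('DNAME',), ('LOC',)]

dept.append({"DNO": 40, "DNAME": "OPERATIONS", "LOC": "DALLAS"})
print(candidate_keys(dept, ["DNO", "DNAME", "LOC"]))
# [('DNO',), ('DNAME',)]
```

As with superkeys, an instance can only suggest candidate keys; the real-world meaning of the data decides which ones actually hold.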
Properties of Relations
Tuples are unordered.
There are no duplicate tuples.
Attributes are unordered (Definition 2).
All attribute values are atomic.
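These four properties can be checked together by building a relation as a set of tuples, each tuple a set of attribute-value pairs (in the style of Definition 2). A minimal sketch: treating only `int`, `float` and `str` as atomic is an assumption made for illustration, and the DEPT rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]
# Assumption: only these scalar types count as atomic values in this sketch.
ATOMIC = (int, float, str)

def as_relation(rows: List[Row]) -> frozenset:
    """Turn a list of rows into a set of tuples, each a set of (attribute, value) pairs."""
    for row in rows:
        for a, v in row.items():
            if not isinstance(v, ATOMIC):
                raise TypeError(f"non-atomic value for {a}: {v!r}")
    return frozenset(frozenset(row.items()) for row in rows)

a = [
    {"DNO": 10, "DNAME": "ACCOUNTING"},
    {"DNO": 20, "DNAME": "RESEARCH"},
]
b = [
    {"DNAME": "RESEARCH", "DNO": 20},    # rows reordered, attributes reordered
    {"DNO": 10, "DNAME": "ACCOUNTING"},
    {"DNO": 10, "DNAME": "ACCOUNTING"},  # duplicate row
]
print(as_relation(a) == as_relation(b))  # True
print(len(as_relation(b)))               # 2
```

The equality holds because tuple order, attribute order and duplicate rows all disappear in the set representation; a list-valued attribute is rejected as non-atomic.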