The document discusses database system concepts and architecture. It covers topics such as data models, schemas and instances, three-schema architecture, data independence, DBMS languages and interfaces. It provides details on different components of a database system and the relationships between them.
The document discusses database system concepts and architecture. It covers topics such as data models, schemas and instances, three-schema architecture, data independence, DBMS languages and interfaces. It provides details on different components of a database system and the relationships between them.
Outline ■ Data Models and Their Categories ■ History of Data Models ■ Schemas, Instances, and States ■ Three-Schema Architecture ■ Data Independence ■ DBMS Languages and Interfaces ■ Database System Utilities and Tools ■ Centralized and Client-Server Architectures ■ Classification of DBMSs
Data Models ■ Data Model: ■ A set of concepts to describe the structure of a database, the operations for manipulating these structures, and certain constraints that the database should obey. ■ Data Model Structure and Constraints: ■ Constructs are used to define the database structure ■ Constructs typically include elements (and their data types) as well as groups of elements (e.g. entity, record, table), and relationships among such groups ■ Constraints specify some restrictions on valid data; these constraints must be enforced at all times
Data Models (continued) ■ Data Model Operations: ■ These operations are used for specifying database retrievals and updates by referring to the constructs of the data model. ■ Operations on the data model may include basic model operations (e.g. generic insert, delete, update) and user-defined operations (e.g. compute_student_gpa, update_inventory)
Categories of Data Models ■ Conceptual (high-level, semantic) data models: ■ Provide concepts that are close to the way many users perceive data. ■ (Also called entity-based or object-based data models.) ■ Physical (low-level, internal) data models: ■ Provide concepts that describe details of how data is stored in the computer. These are usually specified in an ad-hoc manner through DBMS design and administration manuals ■ Implementation (representational) data models: ■ Provide concepts that fall between the above two, used by many commercial DBMS implementations (e.g. relational data models used in many commercial systems).
Schemas versus Instances ■ Database Schema: ■ The description of a database. ■ Includes descriptions of the database structure, data types, and the constraints on the database. ■ Schema Diagram: ■ An illustrative display of (most aspects of) a database schema. ■ Schema Construct: ■ A component of the schema or an object within the schema, e.g., STUDENT, COURSE.
Schemas versus Instances ■ Database State: ■ The actual data stored in a database at a particular moment in time. This includes the collection of all the data in the database. ■ Also called database instance (or occurrence or snapshot). ■ The term instance is also applied to individual database components, e.g. record instance, table instance, entity instance
Database Schema vs. Database State ■ Database State: ■ Refers to the content of a database at a moment in time. ■ Initial Database State: ■ Refers to the database state when it is initially loaded into the system. ■ Valid State: ■ A state that satisfies the structure and constraints of the database.
Database Schema vs. Database State (continued) ■ Distinction ■ The database schema changes very infrequently. ■ The database state changes every time the database is updated.
Three-Schema Architecture ■ Proposed to support DBMS characteristics of: ■ Program-data independence. ■ Support of multiple views of the data. ■ Not explicitly used in commercial DBMS products, but has been useful in explaining database system organization
Three-Schema Architecture ■ Defines DBMS schemas at three levels: ■ Internal schema at the internal level to describe physical storage structures and access paths (e.g indexes). ■ Typically uses a physical data model. ■ Conceptual schema at the conceptual level to describe the structure and constraints for the whole database for a community of users. ■ Uses a conceptual or an implementation data model. ■ External schemas at the external level to describe the various user views. ■ Usually uses the same data model as the conceptual schema.
Three-Schema Architecture ■ Mappings among schema levels are needed to transform requests and data. ■ Programs refer to an external schema, and are mapped by the DBMS to the internal schema for execution. ■ Data extracted from the internal DBMS level is reformatted to match the user’s external view (e.g. formatting the results of an SQL query for display in a Web page)
Data Independence ■ Logical Data Independence: ■ The capacity to change the conceptual schema without having to change the external schemas and their associated application programs. ■ Physical Data Independence: ■ The capacity to change the internal schema without having to change the conceptual schema. ■ For example, the internal schema may be changed when certain file structures are reorganized or new indexes are created to improve database performance
Data Independence (continued) ■ When a schema at a lower level is changed, only the mappings between this schema and higher-level schemas need to be changed in a DBMS that fully supports data independence. ■ The higher-level schemas themselves are unchanged. ■ Hence, the application programs need not be changed since they refer to the external schemas.
DBMS Languages ■ Data Definition Language (DDL) ■ Data Manipulation Language (DML) ■ High-Level or Non-procedural Languages: These include the relational language SQL ■ May be used in a standalone way or may be embedded in a programming language ■ Low Level or Procedural Languages: ■ These must be embedded in a programming language
DBMS Languages ■ Data Definition Language (DDL): ■ Used by the DBA and database designers to specify the conceptual schema of a database. ■ In many DBMSs, the DDL is also used to define internal and external schemas (views). ■ In some DBMSs, separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas. ■ SDL is typically realized via DBMS commands provided to the DBA and database designers
DBMS Languages ■ Data Manipulation Language (DML): ■ Used to specify database retrievals and updates ■ DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language), such as COBOL, C, C++, or Java. ■ A library of functions can also be provided to access the DBMS from a programming language ■ Alternatively, stand-alone DML commands can be applied directly (called a query language).
Types of DML ■ High Level or Non-procedural Language: ■ For example, the SQL relational language ■ Are “set”-oriented and specify what data to retrieve rather than how to retrieve it. ■ Also called declarative languages. ■ Low Level or Procedural Language: ■ Retrieve data one record-at-a-time; ■ Constructs such as looping are needed to retrieve multiple records, along with positioning pointers.
DBMS Interfaces ■ Stand-alone query language interfaces ■ Example: Entering SQL queries at the DBMS interactive SQL interface (e.g. SQL*Plus in ORACLE) ■ Programmer interfaces for embedding DML in programming languages ■ User-friendly interfaces ■ Menu-based, forms-based, graphics-based, etc.
programming languages: ■ Embedded Approach: e.g embedded SQL (for C, C++, etc.), SQLJ (for Java) ■ Procedure Call Approach: e.g. JDBC for Java, ODBC for other programming languages ■ Database Programming Language Approach: e.g. ORACLE has PL/SQL, a programming language based on SQL; language incorporates SQL and its data types as integral components
■ Forms-based, designed for naïve users ■ Graphics-based ■ (Point and Click, Drag and Drop, etc.) ■ Natural language: requests in written English ■ Combinations of the above: ■ For example, both menus and forms used extensively in Web database interfaces
Other DBMS Interfaces ■ Speech as Input and Output ■ Web Browser as an interface ■ Parametric interfaces, e.g., bank tellers using function keys. ■ Interfaces for the DBA: ■ Creating user accounts, granting authorizations ■ Setting system parameters ■ Changing schemas or access paths
Database System Utilities ■ To perform certain functions such as: ■ Loading data stored in files into a database. Includes data conversion tools. ■ Backing up the database periodically on tape. ■ Reorganizing database file structures. ■ Report generation utilities. ■ Performance monitoring utilities. ■ Other functions, such as sorting, user monitoring, data compression, etc.
Other Tools ■ Data dictionary / repository: ■ Used to store schema descriptions and other information such as design decisions, application program descriptions, user information, usage standards, etc. ■ Active data dictionary is accessed by DBMS software and users/DBA. ■ Passive data dictionary is accessed by users/DBA only.
Centralized and Client-Server DBMS Architectures ■ Centralized DBMS: ■ Combines everything into single system including- DBMS software, hardware, application programs, and user interface processing software. ■ User can still connect through a remote terminal – however, all processing is done at centralized site.
Clients ■ Provide appropriate interfaces through a client software module to access and utilize the various server resources. ■ Clients may be diskless machines or PCs or Workstations with disks with only the client software installed. ■ Connected to the servers via some form of a network. ■ (LAN: local area network, wireless network, etc.)
DBMS Server ■ Provides database query and transaction services to the clients ■ Relational DBMS servers are often called SQL servers, query servers, or transaction servers ■ Applications running on clients utilize an Application Program Interface (API) to access server databases via standard interface such as: ■ ODBC: Open Database Connectivity standard ■ JDBC: for Java programming access ■ Client and server must install appropriate client module and server module software for ODBC or JDBC ■ See Chapter 9
Two Tier Client-Server Architecture ■ A client program may connect to several DBMSs, sometimes called the data sources. ■ In general, data sources can be files or other non-DBMS software that manages data. ■ Other variations of clients are possible: e.g., in some object DBMSs, more functionality is transferred to clients including data dictionary functions, optimization and recovery across multiple servers, etc.
Three Tier Client-Server Architecture ■ Common for Web applications ■ Intermediate Layer called Application Server or Web Server: ■ Stores the web connectivity software and the business logic part of the application used to access the corresponding data from the database server ■ Acts like a conduit for sending partially processed data between the database server and the client. ■ Three-tier Architecture Can Enhance Security: ■ Database server only accessible via middle tier ■ Clients cannot directly access database server
Classification of DBMSs ■ Based on the data model used ■ Traditional: Relational, Network, Hierarchical. ■ Emerging: Object-oriented, Object-relational. ■ Other classifications ■ Single-user (typically used with personal computers) vs. multi-user (most DBMSs). ■ Centralized (uses a single computer with one database) vs. distributed (uses multiple computers, multiple databases)
■ Advantages: ■ Since all data is stored at a single location only thus it is easier to access and coordinate data. ■ The centralized database has very minimal data redundancy since all data is stored in a single place. ■ It is cheaper in comparison to all other databases available. ■ Disadvantages: ■ The data traffic in the case of a centralized database is more. ■ If any kind of system failure occurs in the centralized system then the entire data will be destroyed.
Distributed Database ■ A distributed database is basically a type of database which consists of multiple databases that are connected with each other and are spread across different physical locations.
■ Advantages: ■ This database can be easily expanded as data is already spread across different physical locations. ■ The distributed database can easily be accessed from different networks. ■ This database is more secure in comparison to a centralized database. ■ Disadvantages: ■ This database is very costly and is difficult to maintain because of its complexity.
Homogeneous Databases ■ Homogeneous databases are characterized by all participating nodes sharing the same database management system (DBMS) and schema structure. ■ Characteristics of Homogeneous Databases ■ Uniformity: All nodes utilize the same DBMS software and possess identical database schemas. ■ Data Consistency: Changes made to the database on one node are automatically propagated to other nodes, ensuring data consistency. ■ Simplicity: Homogeneous databases are relatively easier to manage as the same DBMS software is employed throughout the system.
Heterogeneous Databases ■ Contrary to homogeneous databases, heterogeneous databases allow nodes in a distributed system to use different DBMS software or possess varying database schemas. ■ Characteristics of Heterogeneous Databases ■ Flexibility: Nodes can employ different DBMS software, such as MySQL, MongoDB Nodes can employ different DBMS software, such as MySQL, MongoDB, or Cassandra, based on their specific needs. ■ Schema Mapping: Heterogeneous databases necessitate mapping between different schemas to ensure interoperability between nodes. ■ Data Transformation: Data might need to be transformed or translated between different formats or encodings to maintain consistency.
Cost considerations for DBMSs ■ Cost Range: from free open-source systems to configurations costing millions of dollars ■ Examples of free relational DBMSs: MySQL, PostgreSQL, others ■ Commercial DBMS offer additional specialized modules, e.g. time-series module, spatial data module, document module, XML module ■ These offer additional specialized functionality when purchased separately ■ Sometimes called cartridges (e.g., in Oracle) or blades ■ Different licensing options: site license, maximum number of concurrent users (seat license), single user, etc.
History of Data Models ■ Network Model: ■ The first network DBMS was implemented by Honeywell in 1964-65 (IDS System). ■ Adopted heavily due to the support by CODASYL (Conference on Data Systems Languages) (CODASYL - DBTG report of 1971). ■ Later implemented in a large variety of systems - IDMS (Cullinet - now Computer Associates), DMS 1100 (Unisys), IMAGE (H.P. (Hewlett-Packard)), VAX -DBMS (Digital Equipment Corp., next COMPAQ, now H.P.).
Network Model ■ Advantages: ■ Network Model is able to model complex relationships and represents semantics of add/delete on the relationships. ■ Can handle most situations for modeling using record types and relationship types. ■ Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET, etc. ■ Programmers can do optimal navigation through the database.
Network Model ■ Disadvantages: ■ Navigational and procedural nature of processing ■ Database contains a complex array of pointers that thread through a set of records. ■ Little scope for automated “query optimization”
History of Data Models ■ Hierarchical Data Model: ■ Initially implemented in a joint effort by IBM and North American Rockwell around 1965. Resulted in the IMS family of systems. ■ IBM’s IMS product had (and still has) a very large customer base worldwide ■ Hierarchical model was formalized based on the IMS system ■ Other systems based on this model: System 2k (SAS inc.)
Hierarchical Model ■ Advantages: ■ Simple to construct and operate ■ Corresponds to a number of natural hierarchically organized domains, e.g., organization (“org”) chart ■ Language is simple: ■ Uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, etc. ■ Disadvantages: ■ Navigational and procedural nature of processing ■ Database is visualized as a linear arrangement of records ■ Little scope for "query optimization"
History of Data Models ■ Relational Model: ■ Proposed in 1970 by E.F. Codd (IBM), first commercial system in 1981-82. ■ Now in several commercial products (e.g. DB2, ORACLE, MS SQL Server, SYBASE, INFORMIX). ■ Several free open source implementations, e.g. MySQL, PostgreSQL ■ Currently most dominant for developing database applications. ■ SQL relational standards: SQL-89 (SQL1), SQL-92 (SQL2), SQL-99, SQL3, … ■ Chapters 5 through 11 describe this model in detail
History of Data Models ■ Object-oriented Data Models: ■ Several models have been proposed for implementing in a database system. ■ One set comprises models of persistent O-O Programming Languages such as C++ (e.g., in OBJECTSTORE or VERSANT), and Smalltalk (e.g., in GEMSTONE). ■ Additionally, systems like O2, ORION (at MCC - then ITASCA), IRIS (at H.P.- used in Open OODB). ■ Object Database Standard: ODMG-93, ODMG-version 2.0, ODMG-version 3.0. ■ Chapters 20 and 21 describe this model.
History of Data Models ■ Object-Relational Models: ■ Most Recent Trend. Started with Informix Universal Server. ■ Relational systems incorporate concepts from object databases leading to object-relational. ■ Exemplified in the latest versions of Oracle-10i, DB2, and SQL Server and other DBMSs. ■ Standards included in SQL-99 and expected to be enhanced in future SQL standards. ■ Chapter 22 describes this model.
Summary ■ Data Models and Their Categories ■ History of Data Models ■ Schemas, Instances, and States ■ Three-Schema Architecture ■ Data Independence ■ DBMS Languages and Interfaces ■ Database System Utilities and Tools ■ Centralized and Client-Server Architectures ■ Classification of DBMSs