Notes Chapter 1.1 Lecture 1.4(Referential Data Structure, Schema, Instances and Keys)
CHAPTER 1.1
The relational data model is the primary data model, used widely around the world for
data storage and processing. The model is simple, and it has all the properties and capabilities
required to process data with storage efficiency.
Concepts
1. Tables − In the relational data model, relations are stored in the form of tables. A
table stores the relation among entities. It has rows and columns, where rows
represent records and columns represent attributes.
2. Tuple − A single row of a table, which contains a single record for that relation, is
called a tuple.
3. Relation instance − A finite set of tuples in the relational database system represents
a relation instance. Relation instances do not have duplicate tuples.
4. Relation schema − A relation schema describes the relation name (table name) and
its attributes along with their names.
5. Relation key − Each row has one or more attributes, known as the relation key, which
can identify the row in the relation (table) uniquely.
6. Attribute domain − Every attribute has a predefined scope of values, known as its
attribute domain.
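The concepts above can be tried out in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module; the table name student, its column names, and the age range in the CHECK clause are illustrative assumptions, not part of the definitions. SQLite enforces column types only loosely, so the CHECK constraint stands in for the attribute domain here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation schema: the relation name, its attribute names, and their domains.
conn.execute(
    """
    CREATE TABLE student (
        stu_id   INTEGER PRIMARY KEY,                       -- relation key
        stu_name TEXT    NOT NULL,
        stu_age  INTEGER CHECK (stu_age BETWEEN 0 AND 150)  -- assumed domain
    )
    """
)

# Each inserted row is one tuple of the relation.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Steve", 23), (102, "John", 24)],
)

# The relation instance is the set of tuples stored right now.
instance = conn.execute("SELECT * FROM student ORDER BY stu_id").fetchall()

# The attribute names can be read back from the schema, independent of the data.
attributes = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

# A value outside the attribute domain is rejected.
try:
    conn.execute("INSERT INTO student VALUES (103, 'Robert', -5)")
    domain_violation_rejected = False
except sqlite3.IntegrityError:
    domain_violation_rejected = True
```

Here instance holds the tuples, attributes holds the column names taken from the schema, and domain_violation_rejected records whether the out-of-range age was refused.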
Database Schema
A database schema is the skeleton structure that represents the logical view of the entire
database. It defines how the data is organized and how the relations among the data are
associated. It formulates all the constraints that are to be applied to the data.
A database schema defines its entities and the relationships among them. It contains a
descriptive detail of the database, which can be depicted by means of schema diagrams. The
database designers design the schema to help programmers understand the database
and make it useful.
Physical Database Schema − This schema pertains to the actual storage of data and
its form of storage, such as files and indices. It defines how the data will be stored in
secondary storage.
Logical Database Schema − This schema defines all the logical constraints that need
to be applied to the stored data. It defines tables, views, and integrity constraints.
Definition of schema: The design of a database is called its schema. A schema is of three types:
physical schema, logical schema and view schema.
For example: In the following diagram, we have a schema that shows the relationship
between three tables: Course, Student and Section. The diagram shows only the design of the
database; it does not show the data present in those tables. A schema is only a structural
view (design) of a database, as shown in the diagram below.
The design of a database at the physical level is called the physical schema. How the data is
stored in blocks of storage is described at this level.
The design of a database at the logical level is called the logical schema. Programmers and
database administrators work at this level. Here, data is described as certain types of data
records stored in data structures; however, internal details such as the implementation of
those data structures are hidden at this level (they are available at the physical level).
The design of a database at the view level is called the view schema. It generally describes
end-user interaction with the database system.
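The difference between the logical level and the view level can be illustrated with a small sketch, again using Python's sqlite3 module. The table student, the view student_directory, and the choice to hide the age column are assumptions made for illustration. The physical level (how SQLite lays out pages on disk) stays hidden from this code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: a table with its integrity constraints.
conn.execute(
    "CREATE TABLE student ("
    "stu_id INTEGER PRIMARY KEY, stu_name TEXT NOT NULL, stu_age INTEGER)"
)

# View level: what one group of end users is allowed to see.
# This hypothetical view hides the age column.
conn.execute(
    "CREATE VIEW student_directory AS SELECT stu_id, stu_name FROM student"
)

conn.execute("INSERT INTO student VALUES (101, 'Steve', 23)")

# Users of the view see only the columns it exposes.
view_rows = conn.execute("SELECT * FROM student_directory").fetchall()

# SQLite keeps the schema objects in its catalog table sqlite_master.
schema_objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
```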
Database Instance
It is important to distinguish the terms database schema and database instance. A database
schema is the skeleton of the database. It is designed when the database does not exist at all.
Once the database is operational, it is very difficult to make any changes to the schema. A
database schema does not contain any data or information.
A database instance is the state of an operational database, with its data, at a given time. It
contains a snapshot of the database. Database instances tend to change with time. A DBMS
ensures that every instance (state) is valid by diligently enforcing all the validations,
constraints, and conditions that the database designers have imposed.
Definition of instance: The data stored in a database at a particular moment of time is called
an instance of the database. The database schema defines the variable declarations in the tables
that belong to a particular database; the value of these variables at a moment of time is called
the instance of that database.
For example, let's say we have a single table student in the database. Today the table has 100
records, so today the instance of the database has 100 records. Suppose we add another 100
records to this table by tomorrow; the instance of the database tomorrow will then have
200 records in the table. In short, the data stored in the database at a particular moment is
called the instance, and it changes over time as we add or delete data.
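The 100-then-200 records example can be reproduced with a short sketch in Python's sqlite3 module. The column names and the generated student names are placeholders; the point is that the schema text stays the same while the instance grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")


def schema_of(connection):
    """Return the stored CREATE TABLE text for the student table."""
    return connection.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'student'"
    ).fetchone()[0]


def row_count(connection):
    """Return the number of tuples in the current instance."""
    return connection.execute("SELECT COUNT(*) FROM student").fetchone()[0]


schema_before = schema_of(conn)

# "Today": the instance holds 100 records.
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(i, f"student_{i}") for i in range(1, 101)],
)
count_today = row_count(conn)

# "Tomorrow": another 100 records are added, so the instance changes.
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(i, f"student_{i}") for i in range(101, 201)],
)
count_tomorrow = row_count(conn)

# The schema is unaffected by the inserts.
schema_after = schema_of(conn)
```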
Keys
Keys play an important role in a relational database; they are used to identify unique rows
in a table. They also establish relationships among tables.
Definition: A primary key is a minimal set of attributes (columns) in a table that uniquely
identifies tuples (rows) in that table.
Let's take an example to understand the concept of a primary key. In the following table, there
are three attributes: Stu_Id, Stu_Name & Stu_Age. Out of these three attributes, one attribute
or a set of more than one attribute can be a primary key.
Attribute Stu_Name alone cannot be a primary key, as more than one student can
have the same name.
Attribute Stu_Age alone cannot be a primary key, as more than one student can have
the same age.
Attribute Stu_Id alone is a primary key, as each student has a unique id that can
identify the student record in the table.
Note: In some cases a single attribute cannot uniquely identify a record in a table. In that
case we try to find a set of attributes that can uniquely identify a row in the table. We will see
an example of this after this example.
Stu_Id Stu_Name Stu_Age
101 Steve 23
102 John 24
103 Robert 28
104 Steve 29
105 Carl 29
The attribute(s) marked as the primary key are not allowed to have null values.
A primary key need not be a single attribute (column). It can be a set of
more than one attribute (column). For example, {Stu_Id, Stu_Name} collectively
can identify a tuple in the above table, but we do not choose it as the primary key
because Stu_Id alone is enough to uniquely identify rows in the table, and we always
go for the minimal set. That said, we should choose more than one column as the
primary key only when there is no single column that can uniquely identify the tuple
in the table.
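The reasoning above can be checked mechanically against the five rows of the student table. The sketch below is plain Python; the helper name identifies_rows_uniquely is made up for illustration. Note the limitation: uniqueness in one set of rows only shows that an attribute set could be a key for this data, whereas a key is a property the designer declares for every possible instance.

```python
# The five rows of the student table from the example.
students = [
    {"Stu_Id": 101, "Stu_Name": "Steve", "Stu_Age": 23},
    {"Stu_Id": 102, "Stu_Name": "John", "Stu_Age": 24},
    {"Stu_Id": 103, "Stu_Name": "Robert", "Stu_Age": 28},
    {"Stu_Id": 104, "Stu_Name": "Steve", "Stu_Age": 29},
    {"Stu_Id": 105, "Stu_Name": "Carl", "Stu_Age": 29},
]


def identifies_rows_uniquely(rows, attributes):
    """Return True if no two rows share the same values for `attributes`."""
    projected = [tuple(row[name] for name in attributes) for row in rows]
    return len(set(projected)) == len(projected)


name_is_unique = identifies_rows_uniquely(students, ["Stu_Name"])  # two Steves
age_is_unique = identifies_rows_uniquely(students, ["Stu_Age"])    # two students aged 29
id_is_unique = identifies_rows_uniquely(students, ["Stu_Id"])

# {Stu_Id, Stu_Name} is also unique, but it is not minimal because
# Stu_Id alone already identifies every row.
id_and_name_is_unique = identifies_rows_uniquely(students, ["Stu_Id", "Stu_Name"])
```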
In the above example, we already had a table with data and were trying to understand the
purpose and meaning of a primary key. However, you should know that we generally define the
primary key during table creation. We can define the primary key later as well, but that rarely
happens in real-world scenarios.
Let's say we want to create a table in which the customer id and product id together work as
the primary key. In SQL, this is done by declaring a PRIMARY KEY constraint over both
columns when the table is created.
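The SQL for this example is not preserved in these notes, so the following is only a sketch of what such a declaration could look like, run through Python's sqlite3 module. The table name customer_order and the columns customer_id, product_id and quantity are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the primary key is the pair (customer_id, product_id).
conn.execute(
    """
    CREATE TABLE customer_order (
        customer_id INTEGER NOT NULL,
        product_id  INTEGER NOT NULL,
        quantity    INTEGER,
        PRIMARY KEY (customer_id, product_id)
    )
    """
)

conn.execute("INSERT INTO customer_order VALUES (1, 10, 2)")
# Same customer, different product: allowed, because the pair is still unique.
conn.execute("INSERT INTO customer_order VALUES (1, 11, 1)")

# Repeating an existing (customer_id, product_id) pair violates the primary key.
try:
    conn.execute("INSERT INTO customer_order VALUES (1, 10, 5)")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

stored_rows = conn.execute(
    "SELECT * FROM customer_order ORDER BY product_id"
).fetchall()
```

The NOT NULL on both columns is written out explicitly because SQLite, unlike most SQL databases, does not imply it for every primary key column.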
Definition of Super Key in DBMS: A super key is a set of one or more attributes (columns)
that can uniquely identify a row in a table. Students are often confused between super
key and candidate key, so we will also discuss the candidate key and its relation to the super
key in this section.
How is a candidate key different from a super key?
The answer is simple: candidate keys are selected from the set of super keys. The only thing we
take care of while selecting a candidate key is that it should not have any redundant attribute.
That is the reason candidate keys are also termed minimal super keys.
Table: Employee
Super keys: The above table has the following super keys. Each of the following sets
is able to uniquely identify a row of the Employee table.
{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
{Emp_SSN, Emp_Name}
{Emp_SSN, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
Candidate Keys: As I mentioned in the beginning, a candidate key is a minimal super key
with no redundant attributes. The following two sets are chosen from the above
super keys, as they contain no redundant attributes.
{Emp_SSN}
{Emp_Number}
Only these two sets are candidate keys, as all the other sets have redundant attributes that
are not necessary for unique identification.
I have been getting a lot of comments regarding the confusion between super key and candidate
key. Let me give you a clear explanation.
1. First you have to understand that all candidate keys are super keys. This is because the
candidate keys are chosen out of the super keys.
2. How do we choose candidate keys from the set of super keys? We look for those keys from
which we cannot remove any fields. In the above example, we have not chosen
{Emp_SSN, Emp_Name} as a candidate key because {Emp_SSN} alone can identify a
unique row in the table and Emp_Name is redundant.
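The selection rule in the two points above (keep only the super keys from which no attribute can be removed) can be sketched in plain Python. The employee rows below are hypothetical sample data, since the notes give only the attribute names, and the helper is_super_key is an illustrative name. As before, the check runs against one sample instance, so it shows which sets are unique for these rows rather than proving a schema-level key.

```python
from itertools import combinations

ATTRIBUTES = ("Emp_SSN", "Emp_Number", "Emp_Name")

# Hypothetical sample rows: SSN and number are unique, names may repeat.
employees = [
    {"Emp_SSN": "123-00-0001", "Emp_Number": 226, "Emp_Name": "Steve"},
    {"Emp_SSN": "123-00-0002", "Emp_Number": 227, "Emp_Name": "Ajeet"},
    {"Emp_SSN": "123-00-0003", "Emp_Number": 228, "Emp_Name": "Steve"},
]


def is_super_key(rows, attributes):
    """Return True if the attribute set distinguishes every row."""
    projected = [tuple(row[name] for name in attributes) for row in rows]
    return len(set(projected)) == len(projected)


# Every non-empty attribute subset that identifies rows uniquely is a super key.
super_keys = [
    set(combo)
    for size in range(1, len(ATTRIBUTES) + 1)
    for combo in combinations(ATTRIBUTES, size)
    if is_super_key(employees, combo)
]

# A candidate key is a super key with no smaller super key inside it.
candidate_keys = [
    key for key in super_keys
    if not any(other < key for other in super_keys)
]
```

For these rows, candidate_keys comes out as the two single-attribute sets, which matches the choice made in the text.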
Primary Key:
A primary key is selected from the set of candidate keys. This is done by the database
administrator or database designer. Either {Emp_SSN} or {Emp_Number} can be chosen as the
primary key for the table Employee.
Definition of Candidate Key in DBMS: A super key with no redundant attribute is known
as a candidate key. Candidate keys are selected from the set of super keys. The only thing we
take care of while selecting a candidate key is that it should not have any
redundant attributes. That is the reason candidate keys are also termed minimal super keys.
Let's take an example of a table “Employee”. This table has three attributes: Emp_Id,
Emp_Number & Emp_Name. Here Emp_Id & Emp_Number have unique values,
and Emp_Name can have duplicate values, as more than one employee can have the same name.
The super keys of this table are:
1. {Emp_Id}
2. {Emp_Number}
3. {Emp_Id, Emp_Number}
4. {Emp_Id, Emp_Name}
5. {Emp_Id, Emp_Number, Emp_Name}
6. {Emp_Number, Emp_Name}
Let's select the candidate keys from the above set of super keys.
{Emp_Id}
{Emp_Number}
Note: A primary key is selected from the set of candidate keys. That means we can have either
Emp_Id or Emp_Number as the primary key. The decision is made by the DBA (database
administrator).
Foreign key in DBMS
Definition: Foreign keys are the columns of a table that point to the primary key of another
table. They act as a cross-reference between tables.
For example:
In the example below, the Stu_Id column in the Course_enrollment table is a foreign key, as it
points to the primary key of the Student table.
Course_enrollment table:
Course_Id Stu_Id
C01 101
C02 102
C03 101
C05 102
C06 103
C07 102
Student table:
Stu_Id Stu_Name Stu_Age
101 Chaitanya 22
102 Arya 26
103 Bran 25
104 Jon 21
Note: In practice, a foreign key does not have to point to the primary key of another table.
If it points to a unique column (not necessarily the primary key) of another table, it is
still a foreign key. So, a more accurate definition would be: foreign keys are the
columns of a table that point to a candidate key of another table.
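The cross-reference between Course_enrollment and Student can be sketched with Python's sqlite3 module, using a few of the rows from the tables above. Lower-case table and column names are used here, and the enrollment row with Stu_Id 999 is a made-up value chosen because no such student exists. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign key constraints unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE student ("
    "stu_id INTEGER PRIMARY KEY, stu_name TEXT, stu_age INTEGER)"
)
conn.execute(
    """
    CREATE TABLE course_enrollment (
        course_id TEXT    NOT NULL,
        stu_id    INTEGER NOT NULL REFERENCES student (stu_id)  -- foreign key
    )
    """
)

conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Chaitanya", 22), (102, "Arya", 26), (103, "Bran", 25), (104, "Jon", 21)],
)
conn.executemany(
    "INSERT INTO course_enrollment VALUES (?, ?)",
    [("C01", 101), ("C02", 102), ("C03", 101)],
)

# An enrollment that points at a non-existent student is rejected.
try:
    conn.execute("INSERT INTO course_enrollment VALUES ('C09', 999)")
    dangling_reference_rejected = False
except sqlite3.IntegrityError:
    dangling_reference_rejected = True

# The foreign key lets the two tables be joined.
enrolled = conn.execute(
    """
    SELECT e.course_id, s.stu_name
    FROM course_enrollment AS e
    JOIN student AS s ON s.stu_id = e.stu_id
    ORDER BY e.course_id
    """
).fetchall()
```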
Composite key in DBMS
Definition of Composite key: A key that has more than one attribute is known as a composite
key. It is also known as a compound key.
Note: Any key, such as a super key, primary key or candidate key, can be called a composite
key if it has more than one attribute.
Let's consider a table Sales. This table has four columns (attributes): cust_Id, order_Id,
product_code & product_count.
Table – Sales
None of these columns alone can play the role of a key in this table.
Column cust_Id alone cannot be a key, as the same customer can place multiple orders;
thus the same customer can have multiple entries.
Column order_Id alone cannot be a primary key, as the same order can contain
multiple products; thus the same order_Id can be present multiple times.
Column product_code alone cannot be a primary key, as more than one customer can place an
order for the same product.
Column product_count alone cannot be a primary key, because two orders can be placed for
the same product count.
Based on this, it is safe to conclude that the key must have more than one attribute, that is,
it must be a composite key.
Alternate key in DBMS
As we have seen in the candidate key section, a table can have multiple candidate keys.
Among these candidate keys, only one key gets selected as the primary key; the remaining keys
are known as alternate or secondary keys.
Let's take an example to understand the alternate key concept. Here we have a table
Employee with three attributes: Emp_Id, Emp_Number & Emp_Name.
Table: Employee
This table has two candidate keys:
{Emp_Id}
{Emp_Number}
The DBA (database administrator) can choose either of the above keys as the primary key.
Let's say Emp_Id is chosen as the primary key.
Since we have selected Emp_Id as the primary key, the remaining key, Emp_Number, is
called the alternate or secondary key.
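One common way to express this choice in SQL is to declare the chosen key as PRIMARY KEY and the remaining candidate key as NOT NULL UNIQUE. The sketch below uses Python's sqlite3 module; the employee numbers and names are hypothetical sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,      -- chosen primary key
        emp_number INTEGER NOT NULL UNIQUE,  -- alternate (secondary) key
        emp_name   TEXT
    )
    """
)

conn.execute("INSERT INTO employee VALUES (1, 226, 'Steve')")
# Duplicate names are fine, because Emp_Name is not a key.
conn.execute("INSERT INTO employee VALUES (2, 227, 'Steve')")

# Reusing an existing emp_number violates the alternate key.
try:
    conn.execute("INSERT INTO employee VALUES (3, 226, 'Ajeet')")
    duplicate_number_rejected = False
except sqlite3.IntegrityError:
    duplicate_number_rejected = True

# A row can be looked up through either key.
by_alternate_key = conn.execute(
    "SELECT emp_id, emp_name FROM employee WHERE emp_number = 227"
).fetchone()
```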
OTHER REFERENCES