DISTRIBUTED DATA SYSTEMS - SSZG554

Student Notes

Module – 4: Data and Access Control

Module Structure
Data and Access Control
• Database Security
• Discretionary Access Control
• Multilevel Access Control
• Distributed Access Control
• View Management
• Views in Centralized DBMSs
• Views in Distributed DBMSs
• Maintenance of Materialized Views

Data Security
• Data security is an important function of a database system that protects data against
unauthorized access.
• Data security includes two aspects:
– Data protection and
– Access control.

Data Protection
• Data protection is required to prevent unauthorized users from understanding the physical
content of data.
• This function is typically provided by file systems in the context of centralized and
distributed operating systems.
• The main data protection approach is data encryption which is useful both for information
stored on disk and for information exchanged on a network.
• Encrypted (encoded) data can be decrypted (decoded) only by authorized users who “know”
the code.

Access Control
• Access control must guarantee that only authorized users perform operations they are
allowed to perform on the database.
• Many different users may have access to a large collection of data under the control of a
single centralized or distributed system.
• The centralized or distributed DBMS must thus be able to restrict access to a subset of
the database to a subset of the users.
• Access control in database systems differs in several aspects from that in traditional file
systems.
• Authorizations must be refined so that different users have different rights on the same
database objects.

• This requirement implies the ability to specify subsets of objects more precisely than by
name and to distinguish between groups of users.
• In addition, the decentralized control of authorizations is of particular importance in a
distributed context.
• In relational systems, authorizations can be uniformly controlled by database administrators
using high-level constructs.
• There are two main approaches to database access control. The first approach is called
discretionary and has long been provided by DBMSs.
• Discretionary access control (or authorization control) defines access rights based on the
users, the type of access (e.g., SELECT, UPDATE) and the objects to be accessed.
• The second approach, called mandatory or multilevel, further increases security by restricting
access to classified data to cleared users.
• Support of multilevel access control by major DBMSs is more recent and stems from
increased security threats coming from the Internet.

Discretionary Access Control


• Three main actors are involved in discretionary access control:
– The subjects (e.g., users, groups of users), who trigger the execution of application
programs;
– The operations, which are embedded in application programs; and
– The database objects, on which the operations are performed.
• An authorization can be viewed as a triple (subject, operation type, object definition), which
specifies that the subject has the right to perform an operation of the given operation type on
the object.
• The introduction of a subject in the system is typically done by a pair (user name, password).
• The user name uniquely identifies the users of that name in the system, while the password,
known only to the users of that name, authenticates the users.
• The objects to protect are subsets of the database.
• In a relational system, objects can be defined by their type (view, relation, tuple, attribute)
as well as by their content using selection predicates.
• The view mechanism permits the protection of objects simply by hiding subsets of relations
(attributes or tuples) from unauthorized users.
• A right expresses a relationship between a subject and an object for a particular set of
operations.
• In an SQL-based relational DBMS, an operation is a high-level statement such as SELECT,
INSERT, UPDATE, or DELETE, and rights are defined (granted or revoked) using the following
statements:
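The statements themselves did not survive in these notes. In standard SQL they take the general form GRANT <operation types> ON <object> TO <subjects> and REVOKE <operation types> ON <object> FROM <subjects>. As a rough illustration of how such statements act on (subject, operation type, object) triples, here is a minimal Python sketch. The class AuthorizationCatalog and its methods are hypothetical names chosen for this example, not part of any DBMS.

```python
# Illustrative sketch only: authorizations kept as (subject, operation, object) triples.
# AuthorizationCatalog and its methods are hypothetical, not a real DBMS API.

class AuthorizationCatalog:
    def __init__(self):
        self._rules = set()  # set of (subject, operation, object) triples

    def grant(self, operations, obj, subjects):
        """Mimics GRANT <operations> ON <obj> TO <subjects>."""
        for subject in subjects:
            for operation in operations:
                self._rules.add((subject, operation, obj))

    def revoke(self, operations, obj, subjects):
        """Mimics REVOKE <operations> ON <obj> FROM <subjects>."""
        for subject in subjects:
            for operation in operations:
                self._rules.discard((subject, operation, obj))

    def is_authorized(self, subject, operation, obj):
        """True if the triple is currently recorded in the catalog."""
        return (subject, operation, obj) in self._rules


catalog = AuthorizationCatalog()
catalog.grant(["SELECT", "UPDATE"], "EMP", ["alice"])
catalog.revoke(["UPDATE"], "EMP", ["alice"])
print(catalog.is_authorized("alice", "SELECT", "EMP"))
print(catalog.is_authorized("alice", "UPDATE", "EMP"))
```

After the revoke, only the SELECT triple for alice on EMP remains in the catalog.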

• Authorization control can be characterized based on who (the grantors) can grant the rights.

• In its simplest form, the control is centralized: a single user or user class, the database
administrators, has all privileges on the database objects and is the only one allowed to use
the GRANT and REVOKE statements.
• A more flexible but complex form of control is decentralized: the creator of an object
becomes its owner and is granted all privileges on it.
• In particular, there is the additional operation type GRANT, which transfers all the rights of
the grantor performing the statement to the specified subjects.
• Therefore, the person receiving the right (the grantee) may subsequently grant privileges on
that object.
• The main difficulty with this approach is that the revoking process must be recursive.
• For example, if A granted B the GRANT privilege on object O, B in turn granted it to C, and A
now wants to revoke all the privileges of B on O, then all the privileges of C on O must also
be revoked.
• To perform revocation, the system must maintain a hierarchy of grants per object where the
creator of the object is the root.
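The recursive revocation just described can be sketched as a walk over the grant hierarchy of one object. This is a simplified illustration under stated assumptions: GrantTree is a hypothetical helper, each subject is assumed to receive its privileges from a single grantor, and privilege types are ignored.

```python
# Sketch of recursive revocation over a per-object grant hierarchy.
# GrantTree is a hypothetical helper. It assumes one grantor per grantee and
# ignores privilege types, both of which a real DBMS would have to track.

class GrantTree:
    def __init__(self, owner):
        self.owner = owner                # creator of the object, root of the hierarchy
        self.children = {owner: []}       # grantor -> list of direct grantees

    def grant(self, grantor, grantee):
        if grantor not in self.children:
            raise PermissionError(f"{grantor} holds no privileges on this object")
        self.children[grantor].append(grantee)
        self.children.setdefault(grantee, [])

    def revoke(self, grantor, grantee):
        """Revoke grantee's privileges and, recursively, everything grantee granted."""
        self.children[grantor].remove(grantee)
        self._drop_subtree(grantee)

    def _drop_subtree(self, subject):
        # Remove the subject first, then everyone who got privileges through it.
        for child in self.children.pop(subject, []):
            self._drop_subtree(child)

    def holders(self):
        return set(self.children)


tree = GrantTree("A")     # A created object O
tree.grant("A", "B")      # A grants to B
tree.grant("B", "C")      # B grants to C
tree.revoke("A", "B")     # revoking B cascades to C
print(tree.holders())
```

Only the owner A still holds privileges once the revocation has cascaded.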
• The privileges of the subjects over objects are recorded in the catalog (directory) as
authorization rules.
• The most convenient approach is to consider all the privileges as an authorization matrix, in
which a row defines a subject, a column an object, and a matrix entry for a pair (subject,
object), the authorized operations.

• The authorization matrix can be stored in three ways:


– by row,
– by column, or
– by element.
• When the matrix is stored by row, each subject is associated with the list of objects that may
be accessed together with the related access rights.
• When the matrix is stored by column, each object is associated with the list of subjects who
may access it with the corresponding access rights.
• Finally, the matrix can be stored by element, that is, by relation (subject, object, right).
• This relation can have indices on both subject and object, thereby providing fast access-right
manipulation per subject and per object.
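To make the three storage options concrete, the following Python sketch holds the same small authorization matrix by element, by row and by column. The subjects, objects and rights are made-up sample data, and the dictionaries only stand in for whatever catalog structures a DBMS would actually use.

```python
# Sketch: one small authorization matrix stored three ways (sample data is made up).
from collections import defaultdict

# By element: a relation of (subject, object, right) tuples.
by_element = [
    ("alice", "EMP", "SELECT"),
    ("alice", "EMP", "UPDATE"),
    ("alice", "ASG", "SELECT"),
    ("bob", "ASG", "SELECT"),
]

# By row: each subject maps to the objects it may access, with the rights.
by_row = defaultdict(lambda: defaultdict(set))
# By column: each object maps to the subjects that may access it, with the rights.
by_column = defaultdict(lambda: defaultdict(set))

for subject, obj, right in by_element:
    by_row[subject][obj].add(right)
    by_column[obj][subject].add(right)

# "What may alice do?" is a single lookup in the row form.
print(dict(by_row["alice"]))
# "Who may access ASG?" is a single lookup in the column form.
print(dict(by_column["ASG"]))
```

The row and column dictionaries play the role of the two indices mentioned above: each answers one kind of question without scanning the whole relation.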

Multilevel Access Control


• Discretionary access control has some limitations.
• One problem is that a malicious user can access unauthorized data through an authorized
user.

• Consider user A who has authorized access to relations R and S and user B who has
authorized access to relation S only.
• If B somehow manages to modify an application program used by A so that it writes R data
into S, then B can read unauthorized data without violating authorization rules.
• Multilevel access control further improves security by defining different security levels for
both subjects and data objects.
• The security levels are Top Secret (TS), Secret (S), Confidential (C) and Unclassified (U),
ordered as TS > S > C > U, where “>” means “more secure”.
• Access in read and write modes by subjects is restricted by two simple rules:
1. A subject S is allowed to read an object of security level l only if level(S) >= l.
2. A subject S is allowed to write an object of security level l only if level(S) <= l.
• Rule 1 (called “no read up”) protects data from unauthorized disclosure, i.e., a subject at a
given security level can only read objects at the same or lower security levels.
• For instance, a subject with secret clearance cannot read top-secret data.
• Rule 2 (called “no write down”) protects data from unauthorized change, i.e., a subject at a
given security level can only write objects at the same or higher security levels.
• For instance, a subject with top-secret clearance can only write top-secret data but cannot
write secret data.
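The two rules reduce to simple comparisons on an ordered set of levels: reading requires the subject's level to be at least the object's level, and writing requires it to be at most the object's level. A minimal Python sketch follows. The names Level, can_read and can_write are chosen for this example only.

```python
# Sketch of the "no read up" / "no write down" checks.
# Level, can_read and can_write are illustrative names, not a DBMS API.
from enum import IntEnum


class Level(IntEnum):
    U = 0    # Unclassified
    C = 1    # Confidential
    S = 2    # Secret
    TS = 3   # Top Secret


def can_read(subject_level, object_level):
    """No read up: the subject must be at the same level as the object or higher."""
    return subject_level >= object_level


def can_write(subject_level, object_level):
    """No write down: the subject must be at the same level as the object or lower."""
    return subject_level <= object_level


print(can_read(Level.S, Level.TS))    # secret subject reading top-secret data
print(can_write(Level.TS, Level.S))   # top-secret subject writing secret data
```

Both calls are refused, matching the two "for instance" cases above.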
• In the relational model, data objects can be relations, tuples or attributes.
• Thus, a relation can be classified at different levels:
– Relation - all tuples in the relation have the same security level
– Tuple - every tuple has a security level
– Attribute - every distinct attribute value has a security level
• A classified relation is thus called a multilevel relation to reflect that it will appear differently
(with different data) to subjects with different clearances.
• A multilevel relation classified at the tuple level can be represented by adding a security
level attribute to each tuple.
• Similarly, a multilevel relation classified at the attribute level can be represented by adding a
corresponding security level to each attribute.
• Example: a multilevel relation PROJ*, based on relation PROJ, classified at the attribute level.
• The entire relation also has a security level, which is the lowest security level of any data it
contains.

• For instance, relation PROJ* has security level C.


• A relation can then be accessed by any subject having a security level which is the same or
higher.
• However, a subject can only access data for which it has clearance.
• Thus, attributes for which a subject has no clearance will appear to the subject as null values
with an associated security level that is the same as the subject's.
• Example: an instance of relation PROJ* as accessed by a subject at the confidential security level.
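The PROJ* figures are not reproduced in these notes, so the sketch below uses made-up tuples to show the idea: each attribute value carries its own level, the relation's level is the lowest level it contains, and a subject sees nulls (at its own level) in place of values it is not cleared for. Dropping tuples in which nothing is readable is an assumption of this sketch, and the function names are hypothetical.

```python
# Sketch of an attribute-level multilevel relation (sample data is made up).
from enum import IntEnum


class Level(IntEnum):
    U = 0
    C = 1
    S = 2
    TS = 3


# Each attribute maps to a (value, security level) pair.
PROJ_STAR = [
    {"PNO": ("P1", Level.C), "PNAME": ("Instrumentation", Level.C), "BUDGET": (150000, Level.C)},
    {"PNO": ("P2", Level.C), "PNAME": ("Database Develop.", Level.C), "BUDGET": (135000, Level.S)},
    {"PNO": ("P3", Level.S), "PNAME": ("CAD/CAM", Level.S), "BUDGET": (250000, Level.S)},
]


def relation_level(relation):
    """The relation's level is the lowest level of any data it contains."""
    return min(level for row in relation for _, level in row.values())


def view_for(relation, clearance):
    """Return the relation as it appears to a subject with the given clearance."""
    visible = []
    for row in relation:
        # Assumption of this sketch: skip tuples in which nothing is readable.
        if all(level > clearance for _, level in row.values()):
            continue
        visible.append({
            attr: (value, level) if level <= clearance else (None, clearance)
            for attr, (value, level) in row.items()
        })
    return visible


confidential_view = view_for(PROJ_STAR, Level.C)
print(relation_level(PROJ_STAR).name)
for row in confidential_view:
    print(row)
```

For a confidential subject, the budget of P2 shows up as a null at level C and the secret tuple P3 does not appear at all.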

Distributed Access Control


• The additional problems of access control in a distributed environment stem from the fact
that objects and subjects are distributed and that messages with sensitive data can be read
by unauthorized users.
• These problems are:
– Remote user authentication
– Management of discretionary access rules, handling of views and of user groups
– Enforcing multilevel access control
• Remote user authentication is necessary since any site of a distributed DBMS may accept
programs initiated, and authorized, at remote sites.
• To prevent remote access by unauthorized users or applications, users must also be
identified and authenticated at the accessed site.
• Furthermore, instead of using passwords that could be obtained from sniffing messages,
encrypted certificates could be used.
• Three solutions are possible for managing authentication:
1. Authentication information is maintained at a central site for global users, who
can then be authenticated only once and then access data from multiple sites.
2. The information for authenticating users (user name and password) is replicated
at all sites in the catalog. Local programs, initiated at a remote site, must also
indicate the user name and password.
3. All sites of the distributed DBMS identify and authenticate themselves similar to
the way users do. Intersite communication is thus protected by the use of the site
password. Once the initiating site has been authenticated, there is no need to
authenticate its remote users.
• Handling user groups for the purpose of authorization simplifies distributed database
administration.
• In a centralized DBMS, “all users” can be referred to as public.
• In a distributed DBMS, the same notion is useful, the public denoting all the users of the
system.
• However, an intermediate level is often introduced to specify the public at a particular site s,
denoted by public@site_s.
• The management of groups in a distributed environment poses some problems since the
subjects of a group can be located at various sites and access to an object may be granted to
several groups, which are themselves distributed.

• If group information as well as access rules are fully replicated at all sites, the enforcement
of access rights is similar to that of a centralized system.

View Management
• One of the main advantages of the relational model is that it provides full logical data
independence.
• External schemas enable user groups to have their particular view of the database.
• In a relational system, a view is a virtual relation, defined as the result of a query on base
relations (or real relations), but not materialized like a base relation, which is stored in the
database.
• A view is a dynamic window in the sense that it reflects all updates to the database.
• An external schema can be defined as a set of views and/or base relations.
• Besides their use in external schemas, views are useful for ensuring data security in a simple
way.
• By selecting a subset of the database, views hide some data.
• If users may only access the database through views, they cannot see or manipulate the
hidden data, which are therefore secure.
• In a distributed DBMS, a view can be derived from distributed relations, and the access to a
view requires the execution of the distributed query corresponding to the view definition.
• An important issue in a distributed DBMS is to make view materialization efficient.

Views in Centralized DBMSs


• Most relational DBMSs use a view mechanism where a view is a relation derived from base
relations as the result of a relational query.
• It is defined by associating the name of the view with the retrieval query that specifies it.
• The view of system analysts (SYSAN), derived from relation EMP(ENO, ENAME, TITLE), can be
defined by the following SQL query:
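The SQL statement itself is missing from these notes. A plausible form, run here through Python's sqlite3 module so that it can be tried out, is shown below. The sample rows and the exact title string 'Syst. Anal.' are assumptions made for illustration. The sketch also checks that the catalog holds only the view definition and that the view reflects later updates to EMP.

```python
# Sketch: defining SYSAN over EMP in SQLite (sample rows are made up).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
    INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.'),
                           ('E2', 'M. Smith', 'Syst. Anal.'),
                           ('E5', 'B. Casey', 'Syst. Anal.');
    CREATE VIEW SYSAN (ENO, ENAME) AS
        SELECT ENO, ENAME FROM EMP WHERE TITLE = 'Syst. Anal.';
""")

# The catalog stores only the text of the view definition.
definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'SYSAN'"
).fetchone()[0]

before = conn.execute("SELECT ENO, ENAME FROM SYSAN ORDER BY ENO").fetchall()

# The view is a dynamic window: a new analyst in EMP appears in SYSAN at once.
conn.execute("INSERT INTO EMP VALUES ('E8', 'J. Jones', 'Syst. Anal.')")
after = conn.execute("SELECT ENO, ENAME FROM SYSAN ORDER BY ENO").fetchall()

print(definition)
print(before)
print(after)
```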

• The single effect of this statement is the storage of the view definition in the catalog.
• No other information needs to be recorded.
• Therefore, the result of the query defining the view is not produced.
• However, the view SYSAN can be manipulated as a base relation.
• The query “Find the names of all the system analysts with their project number and
responsibility”, which involves the view SYSAN and relation ASG(ENO, PNO, RESP, DUR), can be
expressed as:

• Mapping a query expressed on views into a query expressed on base relations can be done
by query modification.
• With this technique, the variables are changed to range on base relations and the query
qualification is merged with the view qualification.
• The preceding query can be modified to:
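Neither the query on the view nor its modified form survived in these notes. The sketch below gives one plausible pair, again using Python's sqlite3 with made-up sample rows, and checks that the query on SYSAN and the query on the base relations EMP and ASG return the same result. The ORDER BY clauses are added only to make the comparison deterministic.

```python
# Sketch: a query on the view SYSAN and its equivalent after query modification.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
    CREATE TABLE ASG (ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER);
    INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.'),
                           ('E2', 'M. Smith', 'Syst. Anal.'),
                           ('E5', 'B. Casey', 'Syst. Anal.');
    INSERT INTO ASG VALUES ('E1', 'P1', 'Manager', 12),
                           ('E2', 'P1', 'Analyst', 24),
                           ('E2', 'P2', 'Analyst', 6),
                           ('E5', 'P2', 'Manager', 24);
    CREATE VIEW SYSAN (ENO, ENAME) AS
        SELECT ENO, ENAME FROM EMP WHERE TITLE = 'Syst. Anal.';
""")

# Query expressed on the view.
on_view = """
    SELECT ENAME, PNO, RESP
    FROM SYSAN, ASG
    WHERE SYSAN.ENO = ASG.ENO
    ORDER BY ENAME, PNO
"""

# Same query after modification: the variable ranges over EMP and the
# view qualification (TITLE = 'Syst. Anal.') is merged into the WHERE clause.
on_base = """
    SELECT ENAME, PNO, RESP
    FROM EMP, ASG
    WHERE EMP.ENO = ASG.ENO
      AND TITLE = 'Syst. Anal.'
    ORDER BY ENAME, PNO
"""

view_rows = conn.execute(on_view).fetchall()
base_rows = conn.execute(on_base).fetchall()
print(view_rows == base_rows)
```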

• The modified query is expressed on base relations and can therefore be processed by the
query processor.
• It is important to note that view processing can be done at compile time.
• The view mechanism can also be used for refining the access controls to include subsets of
objects.
• To specify any user from whom one wants to hide data, the keyword USER generally refers
to the logged-on user identifier.
• The view ESAME restricts the access by any user to those employees having the same title:
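The definition of ESAME is missing from these notes. The sketch below shows one way such a view could look. SQLite has no USER keyword, so a one-row table named SESSION (a hypothetical stand-in) holds the identifier of the logged-on user. The sample rows are made up, and in a DBMS that supports USER the subquery would simply be replaced by that keyword.

```python
# Sketch: a view restricted to employees with the same title as the current user.
# SESSION is a hypothetical stand-in for the USER keyword, which SQLite lacks.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
    INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.'),
                           ('E2', 'M. Smith', 'Syst. Anal.'),
                           ('E6', 'L. Chu', 'Elect. Eng.');
    CREATE TABLE SESSION (ENO TEXT);
    INSERT INTO SESSION VALUES ('E1');
    CREATE VIEW ESAME AS
        SELECT E2.ENO, E2.ENAME, E2.TITLE
        FROM EMP E1, EMP E2
        WHERE E1.TITLE = E2.TITLE
          AND E1.ENO = (SELECT ENO FROM SESSION);
""")


def esame():
    """Rows of ESAME as seen by whoever is recorded in SESSION."""
    return conn.execute("SELECT ENO, ENAME FROM ESAME ORDER BY ENO").fetchall()


as_e1 = esame()                                   # logged on as an electrical engineer
conn.execute("UPDATE SESSION SET ENO = 'E2'")
as_e2 = esame()                                   # logged on as a system analyst
print(as_e1)
print(as_e2)
```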

• If the user who creates ESAME is an electrical engineer, as in this case, the view represents
the set of all electrical engineers.
• Views can be defined using arbitrarily complex relational queries involving selection,
projection, join, aggregate functions, and so on.
• All views can be interrogated as base relations, but not all views can be manipulated as such.

• Updates through views can be handled automatically only if they can be propagated
correctly to the base relations.
• We can classify views as being updatable and not updatable.
• A view is updatable only if the updates to the view can be propagated to the base relations
without ambiguity.
• The view SYSAN above is updatable; the insertion, for example, of a new system analyst
<201, Smith> will be mapped into the insertion of a new employee (201, Smith, Syst. Anal.).
• If attributes other than TITLE were hidden by the view, they would be assigned null values.
• The following view, however, is not updatable:

• The deletion, for example, of the tuple <Smith, Analyst> cannot be propagated, since it is
ambiguous.
• Deletions of Smith in relation EMP or analyst in relation ASG are both meaningful, but the
system does not know which is correct.
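The view definition referred to above is missing from these notes. The sketch below assumes a join view EG(ENAME, RESP) over EMP and ASG, with made-up rows, and illustrates the ambiguity: two different deletions on the base relations both remove <Smith, Analyst> from the view, yet they leave the database in different states.

```python
# Sketch: why deleting through a join view is ambiguous (view and data are assumed).
import sqlite3


def build():
    """Create a fresh database with EMP, ASG and the join view EG."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE EMP (ENO INTEGER PRIMARY KEY, ENAME TEXT, TITLE TEXT);
        CREATE TABLE ASG (ENO INTEGER, PNO TEXT, RESP TEXT, DUR INTEGER);
        INSERT INTO EMP VALUES (201, 'Smith', 'Syst. Anal.'),
                               (202, 'Jones', 'Elect. Eng.');
        INSERT INTO ASG VALUES (201, 'P1', 'Analyst', 12),
                               (202, 'P1', 'Manager', 24);
        CREATE VIEW EG (ENAME, RESP) AS
            SELECT DISTINCT ENAME, RESP
            FROM EMP, ASG
            WHERE EMP.ENO = ASG.ENO;
    """)
    return conn


def eg_rows(conn):
    return conn.execute("SELECT ENAME, RESP FROM EG ORDER BY ENAME, RESP").fetchall()


def emp_count(conn):
    return conn.execute("SELECT COUNT(*) FROM EMP").fetchone()[0]


# Translation 1: remove the employee Smith from EMP.
c1 = build()
c1.execute("DELETE FROM EMP WHERE ENAME = 'Smith'")

# Translation 2: remove Smith's analyst assignment from ASG.
c2 = build()
c2.execute(
    "DELETE FROM ASG WHERE RESP = 'Analyst' "
    "AND ENO IN (SELECT ENO FROM EMP WHERE ENAME = 'Smith')"
)

print(eg_rows(c1), emp_count(c1))
print(eg_rows(c2), emp_count(c2))
```

The view looks identical after either deletion, so the view update alone does not tell the system which base relation to change.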
• Current systems are very restrictive about supporting updates through views.
• Views can be updated only if they are derived from a single relation by selection and
projection.
• This precludes views defined by joins, aggregates, and so on.
• It is interesting to note, however, that views derived by join are updatable if they include
the keys of the base relations.

Views in Distributed DBMSs


• A view in a distributed system may be derived from fragmented relations stored at different
sites.
• When a view is defined, its name and its retrieval query are stored in the catalog.
• Since views may be used as base relations by application programs, their definition should
be stored in the directory in the same way as the base relation descriptions.
• Depending on the degree of site autonomy offered by the system, view definitions can be
centralized at one site, partially duplicated, or fully duplicated.
• In any case, the information associating a view name to its definition site should be
duplicated.
• If the view definition is not present at the site where the query is issued, remote access to
the view definition site is necessary.
• The mapping of a query expressed on views into a query expressed on base relations (which
can potentially be fragmented) can also be done in the same way as in centralized systems,
that is, through query modification.
• With this technique, the qualification defining the view is found in the distributed database
catalog and then merged with the query to provide a query on base relations.
• Such a modified query is a distributed query, which can be processed by the distributed
query processor.

• The query processor maps the distributed query into a query on physical fragments.
• Evaluating views derived from distributed relations may be costly.
• In a given organization, it is likely that many users access the same view, which must be
recomputed for each user.
• An alternative solution is to avoid view derivation by maintaining actual versions of the
views, called materialized views.
• A materialized view stores the tuples of a view in a database relation, like the other
database tuples, possibly with indices.
• Thus, access to a materialized view is much faster than deriving the view, in particular, in a
distributed DBMS where base relations can be remote.

Maintenance of Materialized Views


• A materialized view is a copy of some base data and thus must be kept consistent with that
base data which may be updated.
• View maintenance is the process of updating (or refreshing) a materialized view to reflect
the changes made to the base data.
• View maintenance resembles database replication, but there are major differences. One is
that materialized view expressions, in particular for data warehousing, are typically more
complex than replica definitions and may include join, group by and aggregate operators.
• Another major difference is that database replication is concerned with more general
replication configurations, e.g., with multiple copies of the same base data at multiple sites.
• A view maintenance policy allows a DBA to specify when and how a view should be refreshed.
• The first question (when to refresh) is related to consistency (between the view and the base
data) and efficiency.
• A view can be refreshed in two modes: immediate or deferred.
• With the immediate mode, a view is refreshed immediately as part of the transaction that
updates base data used by the view.
• If the view and the base data are managed by different DBMSs, possibly at different sites,
this requires the use of a distributed transaction, for instance, using the two-phase commit
(2PC) protocol.
• The main advantages of immediate refreshment are that the view is always consistent with
the base data and that read-only queries can be fast.
• However, this is at the expense of increased transaction time to update both the base data
and the views within the same transaction.
• Furthermore, using distributed transactions may be difficult.
• In practice, the deferred mode is preferred because the view is refreshed in separate
(refresh) transactions, thus without performance penalty on the transactions that update
the base data.
• The refresh transactions can be triggered at different times: lazily, just before a query is
evaluated on the view; periodically, at predefined times, e.g., every day; or forcedly, after a
predefined number of updates to the base data.
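The three ways of triggering a deferred refresh can be sketched with a small in-memory model. This is an illustration only: DeferredView and its methods are hypothetical names, the "view" is just the sum of a list, and timers and transactions are reduced to plain method calls.

```python
# Sketch of deferred refresh policies: lazy, periodic and forced.
# DeferredView is a hypothetical class; the base data is a plain Python list.

class DeferredView:
    def __init__(self, base, define, policy, threshold=3):
        self.base = base            # base data, updated by other transactions
        self.define = define        # function computing the view from the base data
        self.policy = policy        # "lazy", "periodic" or "forced"
        self.threshold = threshold  # forced: refresh after this many updates
        self.pending = 0            # base updates not yet reflected in the view
        self.contents = define(base)

    def refresh(self):
        """Run a separate refresh transaction that recomputes the view."""
        self.contents = self.define(self.base)
        self.pending = 0

    def on_base_update(self):
        """Called after each update to the base data."""
        self.pending += 1
        if self.policy == "forced" and self.pending >= self.threshold:
            self.refresh()

    def on_timer(self):
        """Called at predefined times, e.g. once a day."""
        if self.policy == "periodic":
            self.refresh()

    def query(self):
        """Lazy views are refreshed just before the query is evaluated."""
        if self.policy == "lazy" and self.pending:
            self.refresh()
        return self.contents


base = [10, 20]
lazy = DeferredView(base, sum, "lazy")
forced = DeferredView(base, sum, "forced", threshold=2)

base.append(5)                      # first base update
lazy.on_base_update()
forced.on_base_update()
lazy_result = lazy.query()          # refreshed on demand, sees the new state
stale_result = forced.query()       # still the old state, threshold not reached

base.append(5)                      # second base update triggers the forced refresh
forced.on_base_update()
fresh_result = forced.query()
print(lazy_result, stale_result, fresh_result)
```

The lazy view pays for its refresh inside the query, while the forced view answers immediately but may return a state that lags behind the base data.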
• Lazy refreshment enables queries to see the latest consistent state of the base data but at
the expense of increased query time to include the refreshment of the view.
• Periodic and forced refreshment allow queries to see views whose state is not consistent
with the latest state of the base data.

• The views managed with these strategies are also called snapshots.
• The second question (how to refresh a view) is an important efficiency issue.
• The simplest way to refresh a view is to recompute it from scratch using the base data.
• In some cases, this may be the most efficient strategy, e.g., if a large subset of the base data
has been changed.
• However, there are many cases where only a small subset of the view needs to be changed.
• In these cases, a better strategy is to compute the view incrementally, by computing only the
changes to the view.
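As a rough illustration of the difference between the two strategies, the sketch below maintains a simple aggregate view (number of employees per title) both by recomputation and incrementally from the inserted and deleted base tuples. The data and the function names recompute and apply_delta are made up for this example. Real incremental maintenance for joins or other aggregates needs more bookkeeping.

```python
# Sketch: full recomputation versus incremental maintenance of a count-per-title view.
from collections import Counter


def recompute(base):
    """Full refresh: rebuild the view from scratch using all the base data."""
    return Counter(title for _, title in base)


def apply_delta(view, inserted=(), deleted=()):
    """Incremental refresh: apply only the changes to the existing view."""
    for _, title in inserted:
        view[title] += 1
    for _, title in deleted:
        view[title] -= 1
        if view[title] == 0:
            del view[title]      # drop groups that became empty
    return view


base = [("E1", "Elect. Eng."), ("E2", "Syst. Anal."), ("E5", "Syst. Anal.")]
view = recompute(base)

# Changes made to the base data since the last refresh.
inserted = [("E9", "Programmer")]
deleted = [("E1", "Elect. Eng.")]
base = [t for t in base if t not in deleted] + inserted

apply_delta(view, inserted, deleted)
print(view == recompute(base))
```

Both strategies end in the same view. The incremental one touches two groups instead of rescanning every base tuple.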
