comp st3 notes (1)
As its name suggests, a database is a base for storing data. Databases have many advantages over the old paper system of storing data: they save space (physical storage space, compared to paper records), allow multiple people to access the same data at the same time, and support queries (similar to filters) that show only the data required. A database is a system that allows us to store data in a structured way using tables and fields, and gives us various means of access to the data.

What is the difference between data and information? Data is a collection of facts that are meaningless on their own, whereas information puts data into a clear, understandable context.

What is the difference between a database and a spreadsheet? When you look at a database it might look very similar to a spreadsheet. However, spreadsheets are primarily used to manipulate data, using functions and formulas to perform calculations and statistics, whereas databases are primarily used to store data, often have relationships between tables, and should allow the user to easily generate queries to view specific data. Databases often contain much more data than a spreadsheet.

What is the difference between a database and an information system? A database may form part of the backend of an information system. As described on Wikipedia: 'An information system (IS) is a formal, sociotechnical, organizational system designed to collect, process, store, and distribute information. In a sociotechnical perspective, information systems are composed by four components: task, people, structure (or roles), and technology' (https://en.wikipedia.org/wiki/Information_system). A database contains data that is used by an information system, whereas the information system comprises the complete system and may present the data from the database in a way in which it becomes information.

SECTION 2 | DATABASE MANAGEMENT SYSTEMS (DBMS)

Database Management Systems refer to the software used to manage a database. For a single database this could include various software and applications that together form a database management system, or for a very basic database it could simply be one piece of software such as SQLite. The DBMS should provide an interface for data manipulation, it should provide some form of data security, and it should have some built-in data validation methods.
Databases are essential for managing large amounts of data efficiently and effectively. Here are some
reasons why databases are needed:
· Data organisation: Databases provide a way to organise data in a structured manner, making it
easier to store, retrieve, and manipulate data. Without a database, data would be stored in individual
files, which would make it difficult to manage and access.
· Data integrity: Databases ensure data integrity by providing mechanisms to ensure that data is
accurate and consistent. This is important when multiple users or applications need to access the
same data. Without a database, it would be difficult to maintain data consistency and accuracy.
· Data security: Databases provide a secure way to store data by allowing administrators to control
access to data. This helps protect sensitive data from unauthorised access, ensuring that only
authorised users can access the data.
· Scalability: Databases are designed to handle large amounts of data, making them a scalable
solution for organisations that need to store and manage large volumes of data.
· Performance: Databases are optimised for performance, allowing users to access and manipulate
data quickly and efficiently. This is especially important for applications that need to process large
amounts of data quickly.
· Data sharing: Databases enable data sharing among different applications and users, making it
easier for teams to collaborate and share data across different systems and applications.
Databases are essential for managing data effectively, ensuring data integrity, security, and scalability,
and optimizing performance.
Transactions, states, and updates are important concepts in database management that are used to
maintain data consistency and integrity.
· Transactions: A transaction is a logical unit of work that consists of one or more database
operations that must be executed together as a single, atomic unit. Transactions ensure that either
all of the operations are completed successfully or none of them are completed at all. This helps
maintain data consistency by ensuring that the database remains in a consistent state, even in the
event of errors or system failures.
· States: States refer to the condition of the database at any given time. Database management systems use states to keep track of changes to the database, including additions, updates, and deletions: each committed transaction moves the database from one consistent state to the next. It is important to maintain consistency across these states to ensure data integrity.
· Updates: Updates refer to changes made to the database, including additions, updates, and
deletions. Database management systems use update operations to make changes to the database
while maintaining data consistency and integrity. This is accomplished through the use of locking
mechanisms, which ensure that only one user can make changes to a particular record at a time. This
helps prevent conflicts and inconsistencies in the data.
By using transactions, states, and updates, database management systems can ensure data
consistency and integrity by ensuring that changes to the database are made in a controlled and
consistent manner. This helps prevent errors, conflicts, and inconsistencies in the data, which can
lead to problems with data quality and reliability.
A database transaction is a logical unit of work that involves one or more database operations, such
as insert, update, or delete. A transaction ensures that all of the operations in the unit are executed
together as a single, atomic operation, which means that either all of the operations are completed
successfully, or none of them are completed at all. In other words, a transaction is a sequence of
database operations that are executed as a single unit of work. Transactions are used to ensure data
consistency and integrity by ensuring that the database remains in a consistent state, even in the
event of errors,
system failures, or other problems. A typical transaction has four properties, commonly referred to
as ACID properties:
1. Atomicity: A transaction is atomic, which means that either all of its operations are executed
successfully or none of them are executed at all. This ensures that the database remains in a
consistent state.
2. Consistency: A transaction ensures that the database remains in a consistent state before and after
it is executed.
3. Isolation: Transactions are executed in isolation from one another, which means that the changes
made by one transaction are not visible to other transactions until they are completed.
4. Durability: Once a transaction is completed, its changes are permanently stored in the database,
even in the event of a system failure or other problem.
Database transactions are an essential concept in database management and are used to ensure
data consistency, integrity, and reliability.
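To illustrate, a rough sketch of a transaction in SQL might look like this (SQLite syntax; the accounts table and its fields are illustrative, not part of any particular system):

-- Transfer 100 from one account to another as a single atomic unit of work.
-- Assumes a hypothetical 'accounts' table with 'id' and 'balance' fields.
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;   -- both changes are made permanent together (durability)
-- If anything goes wrong before COMMIT, issuing ROLLBACK; undoes every change (atomicity).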
Concurrency in a data sharing situation refers to the ability of multiple users or applications to access
and manipulate the same data simultaneously. In a shared data environment, concurrency can lead
to conflicts, inconsistencies, and other issues if not managed properly. In database management,
concurrency control is the process of managing concurrent access to data in order to maintain data
consistency and integrity. This involves implementing mechanisms to prevent conflicts and
inconsistencies that can arise when multiple users or applications attempt to access and manipulate
the same data simultaneously. There are several techniques for managing concurrency in a data
sharing situation, including:
· Locking: Locking involves the use of locks to control access to data. When a user or application
accesses a particular record, a lock is placed on that record, preventing other users or applications
from accessing or modifying it until the lock is released.
· Multi-version Concurrency Control (MVCC): MVCC involves creating multiple versions of a data
record to allow multiple users or applications to access and modify the same data simultaneously.
Each user or application sees a version of the data that reflects the state of the database at the time
the user or application began the transaction.
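As a rough illustration of locking, the sketch below uses SELECT ... FOR UPDATE, which is supported by systems such as MySQL and PostgreSQL (not SQLite); the table and field names are purely illustrative:

-- Session A locks the row for customer 123 before changing it.
BEGIN;
SELECT balance FROM accounts WHERE customer_id = 123 FOR UPDATE;  -- the row is now locked
UPDATE accounts SET balance = balance - 50 WHERE customer_id = 123;
COMMIT;  -- the lock is released; other sessions waiting on this row can now proceed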
Databases require two fundamental functions to be performed on them: query functions and update
functions.
1. Query Functions: Query functions are used to retrieve data from the database. These functions
allow users or applications to search for specific data or to retrieve a subset of data that meets
certain criteria. Common query functions include SELECT statements in SQL and find() functions in
NoSQL databases. Query functions allow users to perform various types of data analysis and
reporting, such as sorting, grouping, filtering, and aggregating data. They are essential for retrieving
data from the database and for generating reports and insights.
2. Update Functions: Update functions are used to modify the data in the database. These functions
allow users or applications to add, update, or delete data in the database. Common update functions
include INSERT, UPDATE, and DELETE statements in SQL and save() and remove() functions in NoSQL databases. Update functions are essential for maintaining the accuracy and integrity of the data in the
database. They allow users to make changes to the data, such as correcting errors, updating records,
or deleting obsolete data. Update functions must be used carefully to ensure that data consistency
and integrity are maintained.
Databases require both query functions and update functions to be performed on them. Query
functions are used to retrieve data from the database and allow for data analysis and reporting.
Update functions are used to modify data in the database and ensure data accuracy and integrity.
These two functions are essential for managing data effectively in a database system.
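As a rough sketch, the SQL below shows one example of each kind of function against a hypothetical Customers table:

-- Query function: retrieve a subset of data that meets certain criteria.
SELECT CustomerName, City
FROM Customers
WHERE City = 'London'
ORDER BY CustomerName;

-- Update functions: add, change, and remove data.
INSERT INTO Customers (CustomerName, City) VALUES ('John Smith', 'London');
UPDATE Customers SET City = 'Leeds' WHERE CustomerName = 'John Smith';
DELETE FROM Customers WHERE CustomerName = 'John Smith';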
Data validation and data verification are two important processes used to ensure the accuracy,
completeness, and consistency of data in a database system. Although the terms are sometimes used
interchangeably, they refer to different processes. Data Validation: Data validation is the process of
checking whether the data entered into a system is accurate, complete, and consistent with
predefined rules and constraints. The purpose of data validation is to ensure that the data entered
into the system is correct and can be used reliably. Data validation is typically performed when data
is first entered into the system, and it involves checking for errors, such as missing or invalid data,
incorrect data types, or data that does not conform to predefined rules and constraints. Data
validation may be performed using automated validation tools, such as regular expressions, or it may
involve manual review and correction of data. Data Verification: Data verification is the process of
checking whether the data in the database is accurate, complete, and consistent with the original
source. The purpose of data verification is to ensure that the data stored in the database is a true
representation of the original data source. Data verification is typically performed on a periodic basis,
such as during data migrations or when integrating data from multiple sources. It involves comparing
the data in the database with the original source to ensure that it is accurate and complete. Data
verification may involve manual checks, automated tools, or a combination of both. Data validation
and data verification are both essential processes for ensuring the
accuracy, completeness, and consistency of data in a database system. Data validation checks the
accuracy and completeness of data when it is first entered into the system, while data verification
checks the accuracy and completeness of data stored in the database relative to the original source.
By performing both data validation and data verification, organizations can ensure that their data is
reliable, accurate, and useful.
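One possible way to automate validation rules is with constraints defined on the table itself, as in the illustrative sketch below (SQLite syntax; the table and rules are invented for the example):

CREATE TABLE Students (
    StudentID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,                              -- presence check: a value must be entered
    Email     TEXT UNIQUE,                                -- no two students may share an email address
    YearGroup INTEGER CHECK (YearGroup BETWEEN 7 AND 13)  -- range check on the year group
);
-- This insert would be rejected because YearGroup is outside the allowed range.
INSERT INTO Students (StudentID, Name, Email, YearGroup) VALUES (1, 'Ana', '[email protected]', 20);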
A database management system (DBMS) is software designed to store, manage, and retrieve data in
a structured and organised manner. The purpose of a DBMS is to provide a centralised, controlled,
and efficient environment for managing data, enabling organisations to store, access, and analyse
large amounts of data in a consistent and organised way. The key benefits of using a DBMS include:
· Data organisation and management: A DBMS helps organisations to store and manage large
amounts of data in a structured and organised manner, making it easier to find and retrieve the data
as needed.
· Data security and privacy: A DBMS provides a controlled environment for managing data, enabling
organisations to enforce data security and privacy policies and ensure that sensitive data is
protected.
· Data consistency and integrity: A DBMS helps to ensure that the data stored in the database is
accurate, consistent, and up-to-date, improving the quality of the data and supporting better
decision making.
· Data sharing and collaboration: A DBMS enables multiple users and applications to access and use
the same data, improving collaboration and data sharing across the organisation.
· Data analysis and reporting: A DBMS provides tools and functions for data analysis and reporting,
enabling organisations to gain insights into their data and make informed decisions based on that
data.
The purpose of a DBMS is to provide a centralised, controlled, and efficient environment for
managing data, enabling organisations to store, access, and analyse large amounts of data in a
consistent and organised way.
SECTION 2 | SECURITY
A database management system (DBMS) can be used to promote data security in several ways. Here are some examples:
· Authentication and Access Control: A DBMS can provide authentication mechanisms to verify the identity of users who access the system. It can also provide access control mechanisms to restrict access to data and functions based on the user's role, privilege level, or other criteria. This helps to prevent unauthorised access to sensitive data and functions.
· Encryption: A DBMS can support encryption mechanisms to protect data in transit and at rest.
Encryption can be used to ensure that data is transmitted securely over networks and stored securely
on disk or in memory. This helps to prevent data theft and unauthorised access to data.
· Audit Trail: A DBMS can maintain an audit trail of all activities that occur in the system. The audit
trail can record all changes to data, all login attempts, and other security-related events. This can
help to detect and investigate security breaches or other incidents.
· Backup and Recovery: A DBMS can support backup and recovery mechanisms to protect against
data loss or corruption. Backup mechanisms can be used to create copies of the database at regular
intervals, while recovery mechanisms can be used to restore the database to a previous state in the
event of a system failure, data loss, or other problems.
· Data Masking: A DBMS can support data masking techniques to protect sensitive data by replacing
it with fictitious data. This can be useful in situations where sensitive data is being used for testing,
training, or other purposes where the original data is not required.
A DBMS can be used to promote data security by providing authentication and access control
mechanisms, encryption, audit trails, backup and recovery, data masking, and other security
features. By using these features, organizations can help to protect sensitive data, prevent
unauthorised access, and ensure the integrity and availability of their data.
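As an illustration of access control, the sketch below uses the standard GRANT and REVOKE statements found in systems such as PostgreSQL and MySQL (SQLite has no user accounts); the user and table names are invented for the example:

-- Allow the 'reporting' user to read the Orders table but not change it.
GRANT SELECT ON Orders TO reporting;
-- Allow the 'sales' user to read and add orders, but not delete them.
GRANT SELECT, INSERT ON Orders TO sales;
-- Withdraw a permission that is no longer needed.
REVOKE INSERT ON Orders FROM sales;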
In database management, a schema refers to the logical structure of a database, which defines the
organization and relationships among the data elements or objects within the database. A schema
can be thought of as a blueprint or plan for the database, which specifies the types of data that can
be stored in the database, the relationships between different types of data, and the constraints or
rules that govern the data. A database schema typically consists of a set of tables, which represent
the different entities or objects within the database, along with their attributes or fields. The schema
defines the structure of each table, including the data types and constraints for each field,
as well as any relationships between tables. For example, a database schema for a customer
database might include tables for customers, orders, and products, along with fields for each table
such as customer name, order date, and product price. The schema would define the relationships
between these tables, such as the fact that each order is associated with a particular customer and
product. A schema is an important concept in database management, as it provides a logical
framework for organising and managing data within a database. By defining the schema of a
database, organisations can ensure that the data is structured and organised in a way that supports
their business needs and objectives.
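A minimal sketch of such a schema in SQL might look like the following (SQLite syntax; the tables and fields are illustrative only):

CREATE TABLE Customers (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL
);

CREATE TABLE Products (
    ProductID    INTEGER PRIMARY KEY,
    ProductName  TEXT NOT NULL,
    ProductPrice REAL
);

CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    OrderDate  TEXT,
    CustomerID INTEGER REFERENCES Customers(CustomerID),  -- each order belongs to one customer
    ProductID  INTEGER REFERENCES Products(ProductID)     -- and relates to one product
);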
The conceptual schema is a high-level representation of the database that defines the structure and
organization of the data stored in the database. It provides a consolidated view of the data across the
organization, abstracting the details of the physical storage and processing of the data. The
conceptual schema defines the entities, attributes, and relationships between the entities, providing
a semantic model of the data. It is typically used as a bridge between the business requirements and
the physical implementation of the database, providing a common understanding of the data for
both the business and technical stakeholders. The conceptual schema serves as the foundation for
the logical schema, which defines the detailed relationships and constraints between the data
entities, and the physical schema, which defines the physical storage and processing of the data. By
defining the data at the conceptual level, the conceptual schema enables organizations to maintain a
consistent and well-organised view of their data, even as the physical implementation of the
database evolves over time.
The physical schema is the lowest level of schema in a database, and it defines the physical storage
and organization of the data in the database. It represents the actual implementation of the
database, including the hardware and software components used to store and process the data. The
physical schema includes details such as the storage structures used to store the
data, the access methods used to retrieve the data, the indexes used to support data retrieval, and
the backup and recovery strategies used to protect the data. The physical schema is concerned with
the technical details of the database, such as disk storage, memory allocation, and input/output
performance. It is optimized for efficient data access and processing, taking into account factors such
as disk I/O, memory utilization, and network bandwidth. The physical schema is designed to support
the logical and conceptual schemas, which provide higher-level abstractions of the database. By
defining the physical implementation of the database, the physical schema enables organizations to
effectively manage the technical details of their databases, improving performance and ensuring
data integrity and consistency.
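For example, a decision made at the physical level might be to add an index so that data can be retrieved more quickly, as in this illustrative sketch (hypothetical table and field):

-- An index is a physical storage structure that supports faster lookups on OrderDate.
CREATE INDEX idx_orders_date ON Orders (OrderDate);
-- Queries that filter or sort on OrderDate can now use the index instead of scanning the whole table.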
The logical schema is a higher-level representation of the database that defines the relationships
between the data entities and the constraints that govern the data. It provides a conceptual view of
the data, abstracting the details of the physical implementation. The logical schema defines the
entities, attributes, and relationships between the entities, providing a semantic model of the data. It
defines the relationships between the tables in the database and the constraints that ensure the
data is accurate and up-to-date. The logical schema provides a bridge between the business
requirements and the physical implementation of the database, enabling organizations to maintain a
consistent and well-organized view of their data, even as the physical implementation evolves over
time. The logical schema is used to support the design of the database and to provide a common
understanding of the data for both the business and technical stakeholders. It is optimized for data
access and processing, taking into account factors such as data integrity, data consistency, and query
performance. By defining the logical structure of the database, the logical schema helps
organizations to ensure the data stored in their databases is accurate and up-to-date, supporting
better decision making and improving the overall quality of the data.
In database management, a data dictionary (also known as a metadata repository or data catalog) is
a collection of metadata that provides information about the data in a database. The data dictionary
serves as a reference source for database administrators, developers, and users, and it provides a
standardised way to document the structure and contents of a database. The nature of the data
dictionary can vary depending on the specific database management system being used, but it
typically includes the following types of information:
· Data Element Descriptions: A data dictionary typically includes a description of each data element
or attribute used in the database, along with information such as the data type, length, and format of
the element.
· Table and Relationship Descriptions: A data dictionary may include descriptions of the tables in the
database, as well as the relationships between the tables. This information can help users
understand the structure of the database and the way data is organised within it.
· Business Rules and Constraints: A data dictionary may also include information about the business
rules and constraints that apply to the data in the database. This can include information such as
data validation rules, default values, and other constraints.
· Data Access Permissions: A data dictionary may also include information about the access
permissions that are required to view or modify data in the database. This can help to ensure that
data is accessed and used appropriately by authorized users.
· Database Management Information: A data dictionary may also include information about the
database management system itself, such as the version of the software being used, the server
configuration, and other technical details.
A data dictionary is a collection of metadata that provides a standardised way to document the
structure and contents of a database. It typically includes information about data elements, tables
and relationships, business rules and constraints, data access permissions, and other technical details
related to the database management system. By providing a centralised source of information about
the database, the data dictionary helps to ensure that data is managed effectively and used
appropriately by authorised users.
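In SQLite, for example, some of this metadata can be inspected directly; the statements below are illustrative only:

-- List every table defined in the database, together with the SQL that created it.
SELECT name, sql FROM sqlite_master WHERE type = 'table';
-- Show the field names, data types, and constraints of a hypothetical 'Customers' table.
PRAGMA table_info(Customers);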
A data definition language (DDL) is a set of commands or statements used to define and manipulate
the structure of a database. A DDL is used to create and modify tables, indexes, constraints, and
other database objects, and to specify the relationships between these objects. The importance of a
DDL in implementing a data model is as follows:
· Creating Tables and Relationships: The primary function of a DDL is to create the tables and
relationships that make up a database. The DDL specifies the structure and attributes of each table,
including the data types of each field, the constraints that apply to the fields, and the relationships
between tables. By using a DDL to define these elements, developers can ensure that the data model
is accurate and consistent.
· Enforcing Data Integrity: A DDL can also be used to specify constraints that ensure the integrity of
the data in the database. For example, a DDL can specify that a certain field must be unique or that a
field cannot contain null values. These constraints help to ensure that the data in the database is
accurate and consistent.
· Facilitating Database Management: A DDL can also be used to modify the structure of a database as
needed. For example, a DDL can be used to add new tables or fields to a database, or to modify
existing fields or relationships. This allows database administrators to manage the database
effectively and make changes as needed to accommodate changing business needs.
· Supporting Data Security: A DDL can also be used to specify access permissions for different users
or groups of users. By using a DDL to define these permissions, developers can ensure that the data
in the database is accessed and used appropriately by authorised users, and that sensitive data is
protected from unauthorised access.
DDL is an essential tool in implementing a data model, as it allows developers to define and
manipulate the structure of the database, enforce data integrity, facilitate database management,
and support data security. By using a DDL effectively, organisations can ensure that their databases
are accurate, consistent, and secure, and that they meet the needs of the business.
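A brief illustrative sketch of DDL statements covering these points might be (SQLite syntax, invented names):

-- Create a table whose constraints enforce data integrity.
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,      -- uniquely identifies each record
    Email      TEXT NOT NULL UNIQUE,     -- must be present and must not repeat
    Salary     REAL CHECK (Salary >= 0)  -- rejects impossible values
);
-- Modify the structure later as business needs change.
ALTER TABLE Employees ADD COLUMN Department TEXT;
-- Remove a database object that is no longer required.
DROP TABLE Employees;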
Data modeling is a critical step in the design of a database because it allows developers to create a
blueprint of the database structure and relationships between the data elements. The importance of
data modeling in the design of a database can be explained as follows:
· Data Consistency and Accuracy: A well-designed data model ensures data consistency and accuracy.
A data model defines the rules, constraints, and relationships that govern how data is organised and
stored in the database. By ensuring that data is organised consistently and accurately, a data model
reduces the risk of data inconsistencies and errors.
· Efficiency: A data model helps to improve the efficiency of a database by reducing data redundancy
and improving data retrieval speed. A data model helps to identify and eliminate data redundancy,
ensuring that data is stored only once in the database. This reduces storage requirements and
improves data retrieval speed.
· Flexibility: A well-designed data model is flexible and can adapt to changing business needs. A data
model can be updated and modified easily to accommodate new requirements or changing business
needs.
The importance of data modeling in the design of a database cannot be overstated. A well-designed
data model ensures data consistency and accuracy, improves efficiency, flexibility, collaboration, and
maintainability. By creating a clear blueprint of the database structure and relationships, developers
can create a database that is well-organised, efficient, and flexible enough to meet changing business
needs.
· Table: A table is a collection of related data organised in rows and columns. Tables are used to store
data in a database and are often named based on the type of data they contain.
· Record: A record is a collection of data that represents a single entity in a table. A record is also
known as a row, and it typically contains information about a specific item or object, such as a
customer, order, or product.
· Field: A field is a single piece of data stored in a record. A field is also known as a column, and it
represents a specific attribute or characteristic of the entity represented by the record.
· Primary Key: A primary key is a field or combination of fields in a table that uniquely identifies each
record in the table. A primary key is used to enforce data integrity and ensure that no two records in
the table are identical.
· Secondary Key: A secondary key is a field or combination of fields in a table that is not the primary
key but can be used to access and query data in the table.
· Foreign Key: A foreign key is a field in a table that refers to the primary key of another table. A
foreign key is used to create a relationship between two tables and ensure data integrity across the
tables.
· Candidate Key: A candidate key is a field or combination of fields in a table that uniquely identifies each record and could therefore serve as the primary key. One candidate key is chosen as the primary key; the remaining candidate keys can still be used to ensure that no two records in the table are identical.
· Composite Primary Key: A composite primary key is a primary key that consists of two or more
fields in a table. A composite primary key is used when a single field is not sufficient to uniquely
identify each record in the table.
· Join: A join is a database operation that combines data from two or more tables based on a related
field. A join is used to combine data from multiple tables into a single result set that can be used for
data analysis or reporting.
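To make some of these terms concrete, the illustrative sketch below defines a hypothetical OrderItems table whose composite primary key is made up of two foreign keys:

CREATE TABLE OrderItems (
    OrderID   INTEGER REFERENCES Orders(OrderID),     -- foreign key to the Orders table
    ProductID INTEGER REFERENCES Products(ProductID), -- foreign key to the Products table
    Quantity  INTEGER,
    PRIMARY KEY (OrderID, ProductID)  -- composite primary key: neither field alone is unique
);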
An inner join is a type of join operation in a database that combines data from two or more tables
based on a common field. An inner join returns only the rows from each table that have matching
values in the specified field, excluding any rows that do not have matching values. Here is an example
to illustrate how an inner join works:
Suppose you have two tables, a "Customers" table and an "Orders" table. The "Customers" table
contains information about each customer, such as their name and address, while the "Orders" table
contains information about each order, such as the order number and the customer who placed the
order. Both tables have a common field, such as a customer ID. To perform an inner join between
these two tables, you would specify the customer ID field as the common field. The inner join would
then return only the rows from each table where there is a matching customer ID, and exclude any
rows where there is no matching customer ID. For example, suppose the "Customers" table has a
row with a customer ID of 123 and a name of "John Smith", and the "Orders" table has a row with an
order number of 456 and a customer ID of 123. When you perform an inner join between these tables, the result set would contain a combined row for customer ID 123 (John Smith matched with order 456), and exclude any rows from either table where there is no matching customer ID. In summary, an inner join is a type of join operation that
combines data from two or more tables based on a common field, returning only the rows that have
matching values in the specified field. Inner joins are commonly used in database management to
combine data from multiple tables into a single result set for data analysis or reporting.
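Expressed in SQL, the example above might look something like the following sketch (table and field names are illustrative):

SELECT Customers.CustomerName, Orders.OrderNumber
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
-- Returns 'John Smith' alongside order 456, because both rows share CustomerID 123.
-- Customers with no orders, and orders with no matching customer, are excluded.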
Redundant data refers to data that is unnecessarily duplicated or repeated in a database. Redundant
data can cause a number of issues, including:
· Data Inconsistency: When data is stored redundantly, it is possible for different copies of the same
data to become inconsistent.
· Data Integrity: Redundant data can also compromise data integrity by making it more difficult to
maintain the accuracy and completeness of the data. When data is stored redundantly, it is more
difficult to ensure that all copies of the data are updated consistently and accurately.
· Storage Costs: Redundant data can also be costly in terms of storage space. When data is duplicated
unnecessarily, it takes up more space in the database, which can increase storage costs and reduce
system performance.
· Maintenance Costs: Redundant data can also increase the cost of maintaining and updating the
database. When data is stored redundantly, it requires additional effort to keep all copies of the data
up to date and accurate.
· Security Risks: Redundant data can also pose security risks by increasing the number of potential
attack points for malicious actors. If redundant data is not properly secured, it can be more easily
accessed and manipulated by unauthorised users.
Redundant data can cause a number of issues for a database, including data inconsistency,
compromised data integrity, increased storage and maintenance costs, and security risks. By
eliminating or minimising redundant data in a database, organisations can improve the accuracy and
consistency of their data, reduce storage and maintenance costs, and enhance data security.
SECTION 5 | NORMALISATION
The normalisation process is used to organise data in a database into tables and establish
relationships between them. The process involves several steps, each of which is designed to remove
data redundancies and dependencies. The three most commonly used normal forms are 1st Normal
Form (1NF), 2nd Normal Form (2NF), and 3rd Normal Form (3NF). Here are the differences between
each of these normal forms:
· 1st Normal Form (1NF): In 1NF, each table in a database contains only atomic values, meaning that
each column contains only a single value. This means that data is not stored in a repeating group or
array format, and each table has a primary key that uniquely identifies each row.
· 2nd Normal Form (2NF): In 2NF, the table must be in 1NF and each non-key column must be
functionally dependent on the entire primary key. This means that each non-key column must be
uniquely determined by the primary key, and cannot be determined by a subset of the primary key.
· 3rd Normal Form (3NF): In 3NF, the table must be in 2NF and all non-key columns must be
independent of each other. This means that each non-key column should contain only data that is
related to the primary key, and not contain data that is related to other non-key columns.
To summarise, 1NF requires that each table contain only atomic values, 2NF requires that each non-
key column be functionally dependent on the entire primary key, and 3NF requires that all non-key
columns be independent of each other. These normal forms are used to ensure that the data in a
database is organised efficiently, and is free from data redundancies and dependencies.
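As a rough illustration, a single unnormalised table that repeats customer details on every order could be split into the two tables sketched below (hypothetical names):

-- Unnormalised: the customer's name and address are repeated on every order they place.
-- Orders(OrderID, OrderDate, CustomerName, CustomerAddress)

-- Normalised: customer details are stored once and referenced by a foreign key.
CREATE TABLE Customers (
    CustomerID      INTEGER PRIMARY KEY,
    CustomerName    TEXT,
    CustomerAddress TEXT
);
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    OrderDate  TEXT,
    CustomerID INTEGER REFERENCES Customers(CustomerID)
);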
A normalised database is a database that is structured in accordance with the principles of database
normalisation. Database normalisation is a process used to organise data in a database into tables
and establish relationships between them, with the aim of reducing data redundancy and ensuring
data integrity. A normalised database has the following characteristics:
· Minimal Data Redundancy: A normalised database minimises data redundancy by organising data
into tables and removing data that is repeated or duplicated
unnecessarily. This helps to reduce the size of the database and improve database performance.
· Consistent Data: A normalised database ensures that data is consistent across tables by removing
data redundancies and dependencies. This helps to improve data integrity and reduce the risk of
data inconsistencies.
· Reduced Update Anomalies: A normalised database reduces the risk of update anomalies by
ensuring that data is stored in the appropriate table and that each table contains only a single,
logically related category of data. This helps to ensure that updates to the data are made only once
and that the data remains consistent across the database.
· Increased Scalability: A normalised database is highly scalable, meaning that it can be easily
expanded or modified to accommodate new data or changing business needs. This is because the
database is organised into tables, which can be modified or added as needed without affecting the
rest of the database.
· Improved Query Performance: A normalised database often has better query performance because
data is organised into smaller, more manageable tables. This allows queries to be processed more
quickly and efficiently, resulting in faster data retrieval times.
· Simplified Maintenance: A normalised database is easier to maintain because data is organised into
tables, making it easier to identify and fix errors or inconsistencies in the data. This helps to reduce
the cost and effort required to maintain the database over time.
The term 'data type' refers to the type of data used; for example, it could be text, numbers, date or time, boolean (Yes/No), currency, or an object such as an image or link. Each data type has its own data format; for example, a date might be written DD/MM/YY or MM/DD/YY. Before setting up a database you will need to decide on the data types and data formats for each field within your database. Once the data type and format for each field is set when you first create the database it should not be changed, and it will restrict the data that is allowed to be entered; this helps to ensure data integrity. Data integrity is the completeness, correctness or accuracy of data. Imagine the problems that could occur if you did not define the date data format when you set up the database. Taking the date 11th Jan 2021 as an example, an English person might enter this data as 11/01/21 whereas an American person might enter it as 01/11/21. The list below shows some data types with example data formats.
· Text: Two options within the text data type are short text and long text. Short text is used when fewer than 256 characters are to be entered; as standard most databases are set to short text, and you would need to specify long text at the set-up stage if you want to change this. Note: numbers such as phone numbers are often set as text data types, because they start with a 0 (zero), which can cause problems for number fields. Exam Note: The data type of a phone number has been a popular examination question.
· Numbers: Numbers can normally be formatted as integer, decimal, or scientific.
· Boolean: Boolean fields are used when you want to enforce one of two options, for example YES or NO, ON or OFF, M or F, TRUE or FALSE, 1 or 0. Some database software may only allow 1 or 0 as options in the boolean data type selection.
· Date/Time: Date and time are often combined, and after selecting this as a data type you select the format. Time options may be 12 or 24 hour formats and then a format such as hh:mm:ss. Date options normally include the structure of the date, such as DD/MM/YYYY.
· Currency: The currency data type will allow you to choose the currency used, for example $ or £, and it will also allow you to specify the number of decimal places.
· Object: An object would normally be something that you cannot enter via the keyboard, such as music or a picture, but you could also have items such as hyperlinks as objects.
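A rough sketch of how these choices might appear when a table is defined (SQLite syntax; exact type names vary between database products):

CREATE TABLE Members (
    MemberID    INTEGER PRIMARY KEY,
    FullName    TEXT,                               -- short text
    Notes       TEXT,                               -- long text, in products that distinguish the two
    PhoneNumber TEXT,                               -- stored as text so a leading 0 is not lost
    IsActive    INTEGER CHECK (IsActive IN (0, 1)), -- boolean stored as 1 or 0
    JoinDate    TEXT,                               -- e.g. 'DD/MM/YYYY', agreed before data entry begins
    AnnualFee   REAL,                               -- currency, with decimal places decided at set-up
    Photo       BLOB                                -- object data such as an image
);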
SECTION 9 | QUERIES
A query is a request for data from a database. By executing a query, a user can retrieve and
manipulate data stored in the database. A query can also be used to provide a view of a database,
allowing users to see the data in a specific format that is customised to their needs. Queries can
provide a view of a database in some of the following ways:
· Selecting Fields: When creating a query, the user can select the fields they want to view from the
database. This allows them to focus on specific information that is relevant to their needs. For
example, a user might create a query that selects only the customer name, order number, and date
for all orders placed in the last month.
· Filtering Data: Queries can also be used to filter data based on specific criteria. This allows the user
to view only the data that meets their specific needs. For example, a user might create a query that
only shows orders from a specific region or orders that contain a certain product.
· Sorting Data: Queries can also be used to sort data in a specific way. This allows the user to view the
data in a way that makes sense for their needs. For example, a user might create a query that sorts
orders by order date, so that the most recent orders appear at the top.
· Grouping Data: Queries can also be used to group data together based on specific criteria. This
allows the user to view data in a summarised format. For example, a user might create a query that
groups orders by region or by product category.
· Calculating Data: Queries can also be used to calculate data based on specific criteria. This allows
the user to view the data in a way that is useful for their needs. For example, a user might create a
query that calculates the average order size or total sales for a specific period.
By combining these techniques, a query can provide a view of a database that is customised to the
user's needs. This allows the user to access the data they need in a way that is easy to understand
and use.
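Combining several of these techniques, an illustrative SQL sketch against a hypothetical Orders table might be:

-- Select, filter, group, calculate, and sort in a single query.
SELECT Region,
       COUNT(*)        AS NumberOfOrders,
       AVG(OrderTotal) AS AverageOrderSize
FROM Orders
WHERE OrderDate >= '2021-01-01'   -- filtering
GROUP BY Region                   -- grouping
ORDER BY AverageOrderSize DESC;   -- sorting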
A simple query is a basic request for data from a database, typically involving only a single table and
a small number of fields. A simple query is usually straightforward and easy to understand, and can
be created using simple query languages or graphical user interfaces. In contrast, a complex query is
a more sophisticated request for data from a database, often involving multiple tables and complex
operations. A complex query can be used to retrieve data that meets specific criteria or to perform
advanced calculations or data manipulations. Some key differences between simple and complex
queries:
· Complexity: Simple queries are less complex than complex queries, typically involving only a single
table and a small number of fields. Complex queries, on the other hand, involve multiple tables,
complex operations, and advanced functions.
· Purpose: Simple queries are often used to retrieve specific data from a database, while complex
queries are used to perform advanced data manipulation and analysis.
· Performance: Simple queries are usually faster and more efficient than complex queries, as they
involve less data and processing. Complex queries, on the other hand, can be slower and more
resource-intensive, especially if they involve large amounts of data or complex calculations.
· Ease of Use: Simple queries are generally easier to create and understand than complex queries.
Simple queries can often be created using simple query languages or graphical user interfaces, while
complex queries may require more advanced technical skills and knowledge.
Simple queries are basic requests for data from a database, involving a single table and a small
number of fields. They are typically straightforward and easy to understand. Complex queries, on the
other hand, involve multiple tables, complex operations, and advanced functions, and are used to
perform advanced data manipulation and analysis. They can be more resource-intensive and require
more advanced technical skills and knowledge.
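For instance, an illustrative sketch of each kind of query (hypothetical tables):

-- Simple query: one table, a few fields.
SELECT CustomerName, City FROM Customers WHERE City = 'London';

-- Complex query: multiple tables, a join, grouping, an aggregate, and a condition on the result.
SELECT Customers.CustomerName, SUM(Orders.OrderTotal) AS TotalSpent
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
GROUP BY Customers.CustomerName
HAVING SUM(Orders.OrderTotal) > 1000;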
Constructing a query involves creating a request for data from a database. The methods used to
construct a query can vary depending on the type of database, the query language being used, and
the specific requirements of the user. Although at IB level it is quite likely that a query will be done
using SQL, here are some different methods that can be used:
· Graphical User Interfaces (GUIs): Many database management systems (DBMS) provide graphical
user interfaces (GUIs) that allow users to create queries using a visual interface. Users can select
tables and fields, add filters and sorting criteria, and build complex queries using drag-and-drop
functionality.
· Query Languages: Query languages such as SQL (Structured Query Language) and LINQ (Language-
Integrated Query) can be used to construct queries. These languages provide a syntax for creating
queries that can be executed on a database. SQL is a standard language used for creating and
managing relational databases, while LINQ is a .NET Framework component used to query collections
and databases.
· Stored Procedures: A stored procedure is a set of precompiled SQL statements that can be executed
on a database. Stored procedures can be created to perform specific tasks or to retrieve data that
meets specific criteria. They can be called from applications or other stored procedures to retrieve
data from the database.
· Data Access Layers: Data access layers provide a way to abstract the database from the application
code. They provide a set of methods and functions that can be used to retrieve data from the
database. The data access layer can be used to create queries, and the results can be returned to the
application code for further processing.
· Object-Relational Mapping (ORM): ORM tools provide a way to map database tables to object-
oriented code. The ORM tool can be used to create queries and retrieve data from the database. The
results can be returned as objects that can be used by the application code.
· Web-Based Interfaces: Web-based interfaces can be used to create and execute queries from a web
browser. These interfaces can provide a simple way to access the database from anywhere with an
internet connection.
There are several methods that can be used to construct a query, including graphical user interfaces,
query languages, stored procedures, data access layers, object-relational mapping tools, and web-
based interfaces. The choice of method depends on the specific requirements of the user and the
database management system being used.
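As one example of these methods, the sketch below shows a simple stored procedure written in MySQL syntax (stored procedures are not supported by SQLite; all names are invented for the example):

DELIMITER //
CREATE PROCEDURE GetCustomerOrders(IN p_customer_id INT)
BEGIN
    -- Retrieve every order placed by the given customer.
    SELECT OrderID, OrderDate FROM Orders WHERE CustomerID = p_customer_id;
END //
DELIMITER ;

-- Called from an application or another stored procedure:
CALL GetCustomerOrders(123);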