ADBMS
Chapter 1:
What is Indexing?
A single-level index is an auxiliary file that makes it more efficient to search for a record in the data file.
The index file usually occupies considerably fewer disk blocks than the data file because its entries are much smaller.
- A dense index has an index entry for every search key value (and hence every record) in the data file.
- A sparse (or non-dense) index, on the other hand, has index entries for only some of the search key values.
YT Link - https://youtu.be/E--yzX05_k8?feature=shared
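The difference between a dense and a sparse index can be sketched in Python. This is a minimal illustration, assuming a data file held in memory as a list of blocks whose records are sorted on the search key (a sparse index only works on such an ordered file). The block size, the sample keys, and the names `build_dense_index`, `build_sparse_index` and `sparse_lookup` are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical data file: records sorted on their key, grouped into blocks of 3.
BLOCK_SIZE = 3
records = [(k, f"rec{k}") for k in [5, 8, 12, 15, 21, 27, 30, 33, 40]]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

def build_dense_index(blocks):
    """One entry per search key value: key -> (block number, slot in block)."""
    return {key: (b, s)
            for b, blk in enumerate(blocks)
            for s, (key, _) in enumerate(blk)}

def build_sparse_index(blocks):
    """One entry per block: the first (anchor) key of each block."""
    return [(blk[0][0], b) for b, blk in enumerate(blocks)]

def sparse_lookup(sparse, blocks, key):
    """Find the last anchor <= key, then scan only that one block."""
    anchors = [k for k, _ in sparse]
    i = bisect_right(anchors, key) - 1
    if i < 0:
        return None  # key is smaller than every anchor
    for k, rec in blocks[sparse[i][1]]:
        if k == key:
            return rec
    return None

dense = build_dense_index(blocks)
sparse = build_sparse_index(blocks)
print(len(dense), len(sparse))            # 9 3
b, s = dense[21]
print(blocks[b][s][1])                    # rec21
print(sparse_lookup(sparse, blocks, 21))  # rec21
print(sparse_lookup(sparse, blocks, 22))  # None
```

The dense index answers a lookup directly, while the sparse index keeps only one anchor key per block and must scan the block it points to.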
Types of Indexing:
YT Link - https://youtu.be/vjrHiaIfOl8?feature=shared
Primary Index
YT Link - https://youtu.be/4E-MGnjMhRw?feature=shared
Clustering Index
YT Link - https://youtu.be/UpJ9ICmzaAM?feature=shared
Secondary Index –
1) A secondary index provides a secondary means of accessing a file for which some primary access already
exists.
2) The secondary index may be on a field which is a candidate key and has a unique value in every record, or a
non-key with duplicate values.
3) The index is an ordered file with two fields.
4) The first field is of the same data type as some non-ordering field of the data file that is an indexing field.
5) The second field is either a block pointer or a record pointer.
6) There can be many secondary indexes (and hence, indexing fields) for the same file.
7) When the indexing field is a key (a unique value in every record), the index includes one entry for each record in
the data file; hence, it is a dense index.
YT Link - https://youtu.be/Ua08uVgsk4k?feature=shared
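Points 3 to 7 can be illustrated with a minimal Python sketch, assuming a small in-memory data file ordered on `emp_id` (its primary access path) and modelling a record pointer as the record's position in the list. The names `data_file`, `build_secondary_index` and `lookup` are hypothetical. Two secondary indexes are built on the same file: one on a candidate key (email) and one on a non-key field with duplicate values (department).

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file, physically ordered on emp_id (the primary access path).
data_file = [
    (101, "asha@example.com", "Sales"),
    (102, "ben@example.com", "HR"),
    (103, "chen@example.com", "Sales"),
    (104, "dia@example.com", "IT"),
]

def build_secondary_index(records, field):
    """Ordered file of (indexing-field value, record pointer) pairs.

    The record pointer is the record's position in `records`.
    There is one entry per record, so the index is dense.
    """
    return sorted((rec[field], rid) for rid, rec in enumerate(records))

def lookup(records, index, value):
    """Binary-search the ordered index and return every matching record."""
    keys = [v for v, _ in index]
    lo, hi = bisect_left(keys, value), bisect_right(keys, value)
    return [records[rid] for _, rid in index[lo:hi]]

email_idx = build_secondary_index(data_file, 1)  # candidate key: unique values
dept_idx = build_secondary_index(data_file, 2)   # non-key: duplicates allowed

print(lookup(data_file, email_idx, "chen@example.com"))
# [(103, 'chen@example.com', 'Sales')]
print(lookup(data_file, dept_idx, "Sales"))
# [(101, 'asha@example.com', 'Sales'), (103, 'chen@example.com', 'Sales')]
```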
Multi-Level Indexing –
1) Because a single-level index is an ordered file, we can create a primary index to the index itself.
2) In this case, the original index file is called the first-level index and the index to the index is called the
second-level index.
3) We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk
block.
4) A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as
the first-level index consists of more than one disk block.
YT Link - https://youtu.be/KcApkM5WYGw?feature=shared
YT Link - https://youtu.be/YUtUNlLNB5c?feature=shared
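The number of levels follows from the index blocking factor, also called the fan-out (fo): each higher level needs one entry per block of the level below, so with r1 first-level entries the index has ⌈log_fo(r1)⌉ levels. A minimal Python sketch, assuming a fixed fan-out at every level; the function name `index_levels` and the numbers in the example are hypothetical.

```python
import math

def index_levels(first_level_entries: int, fan_out: int) -> list[int]:
    """Return the number of blocks at each index level, bottom to top.

    fan_out is the index blocking factor (entries per index block).
    Each higher level is a primary index on the level below, so it
    needs one entry per block of that level.
    """
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    blocks = math.ceil(first_level_entries / fan_out)
    levels = [blocks]
    while blocks > 1:  # stop once the top level fits in one disk block
        blocks = math.ceil(blocks / fan_out)
        levels.append(blocks)
    return levels

# Hypothetical numbers: 30,000 first-level entries, 68 entries per index block.
print(index_levels(30_000, 68))  # [442, 7, 1]
```

Here the first level needs 442 blocks, the second level 7, and the third (top) level fits in a single block, so the index has 3 levels.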
Types of Databases –
- Multimedia Database
A multimedia database is a collection of interrelated multimedia data, such as text, graphics
(sketches, drawings), images, animations, video, and audio, and it holds vast amounts of multisource
multimedia data. The framework that manages the different types of multimedia data, which can be stored,
delivered, and utilized in different ways, is known as a multimedia database management system.
Multimedia databases handle three classes of media: static media, dynamic media, and dimensional
media.
Contents of a Multimedia Database Management System:
1) Media data – The actual data representing an object.
2) Media format data – Information such as sampling rate, resolution, encoding scheme etc. about
the format of the media data after it goes through the acquisition, processing and encoding
phase.
3) Media keyword data – Keyword descriptions relating to the generation of the data. It is also known
as content-descriptive data. Example: date, time and place of recording.
4) Media feature data – Content dependent data such as the distribution of colours, kinds of
texture and different shapes present in data.
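The four kinds of content can be pictured as fields of one stored record. The Python sketch below is illustrative only: the `MediaObject` class, the helper functions, and the three-value colour histogram used as feature data are hypothetical, and Euclidean distance is just one simple choice of similarity measure for content-based retrieval.

```python
import math
from dataclasses import dataclass, field

@dataclass
class MediaObject:
    # Hypothetical record layout mirroring the four kinds of content above.
    media_data: bytes     # the actual data (e.g. encoded image bytes)
    format_data: dict     # e.g. resolution, encoding scheme
    keyword_data: dict    # e.g. date, time, place of recording
    feature_data: list = field(default_factory=list)  # e.g. colour histogram

def by_keyword(objects, key, value):
    """Retrieval on media keyword data (exact match on one keyword)."""
    return [o for o in objects if o.keyword_data.get(key) == value]

def most_similar(objects, query_features):
    """Content-based retrieval on media feature data.

    Picks the object whose feature vector has the smallest Euclidean
    distance to the query (an illustrative choice of measure).
    """
    return min(objects, key=lambda o: math.dist(o.feature_data, query_features))

sunset = MediaObject(b"...", {"resolution": "1920x1080", "encoding": "JPEG"},
                     {"date": "2023-05-01", "place": "Goa"}, [0.6, 0.3, 0.1])
forest = MediaObject(b"...", {"resolution": "1280x720", "encoding": "PNG"},
                     {"date": "2023-06-12", "place": "Munnar"}, [0.1, 0.7, 0.2])
library = [sunset, forest]

print([o.keyword_data["place"] for o in by_keyword(library, "date", "2023-06-12")])  # ['Munnar']
print(most_similar(library, [0.7, 0.2, 0.1]) is sunset)  # True
```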
There are still many challenges to multimedia databases, some of which are:
1) Modelling
2) Design
3) Storage
4) Performance
5) Queries and Retrieval
Applications of multimedia databases include knowledge dissemination.
- Mobility Database
1. Data Types: Mobility databases may include data on various transportation modes,
such as road networks, public transit systems, walking and biking routes, traffic flow,
and more. They can also encompass data on travel times, congestion levels, vehicle
counts, and even data from GPS and mobile devices.
2. Applications:
- Urban Planning: Helps cities plan infrastructure based on real mobility data.
- Traffic Management: Enables real-time traffic monitoring and congestion mitigation.
- Transportation Research: Supports studies on travel behaviour and mode choices.
3. Drawbacks:
- NoSQL
Advantages of NoSQL: There are many advantages of working with NoSQL databases
such as MongoDB and Cassandra. The main advantages are high scalability and high
availability.
1. High availability: Automatic replication in NoSQL databases keeps copies of the data on
multiple nodes, so if one node fails, another replica can continue to serve requests. This
makes them highly available.
2. Scalability: NoSQL databases are highly scalable, which means that they can
handle large amounts of data and traffic with ease.
3. Performance: NoSQL databases are designed to handle large amounts of data
and traffic, so for workloads that suit their data model they can offer better
performance than traditional relational databases.
4. Cost-effectiveness: NoSQL databases are often more cost-effective than
traditional relational databases, as they are typically less complex and can run on
commodity hardware rather than requiring expensive specialized hardware or software.
5. Agility: Flexible, often schema-less data models let the structure of the data change
as requirements evolve, which makes NoSQL databases well suited to agile development.
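Point 1 can be illustrated with a toy replicated key-value store in Python. It is a hypothetical sketch (the `ReplicatedStore` class is not an API of MongoDB, Cassandra, or any real product): every write goes to all live replicas, and a read is served by any replica that is still up, so the failure of one node does not make the data unavailable.

```python
class ReplicatedStore:
    """Toy key-value store that copies every write to all live replicas.

    Hypothetical illustration of replication for availability only.
    """

    def __init__(self, n_replicas: int = 3):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.alive = [True] * n_replicas

    def put(self, key, value):
        written = 0
        for i, replica in enumerate(self.replicas):
            if self.alive[i]:  # failed nodes miss the write
                replica[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no live replicas")

    def get(self, key):
        # Serve the read from the first live replica; failed nodes are skipped.
        for i, replica in enumerate(self.replicas):
            if self.alive[i]:
                return replica.get(key)
        raise RuntimeError("no live replicas")

    def fail(self, i):
        """Simulate a crash of replica i."""
        self.alive[i] = False

store = ReplicatedStore()
store.put("user:1", {"name": "Asha"})
store.fail(0)               # simulate a node failure
print(store.get("user:1"))  # {'name': 'Asha'}, now served by replica 1
```

The sketch ignores what real systems must handle, such as bringing a recovered replica back up to date and deciding how many replicas must acknowledge a write before it counts as successful.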
Disadvantages of NoSQL: NoSQL has the following disadvantages.
1. Lack of ACID compliance: Many NoSQL databases are not fully ACID-compliant (for example,
they may offer only eventual consistency), so they may not guarantee the consistency and
integrity of data in the way relational databases do.
2. Limited GUI tools: GUI tools for accessing NoSQL databases are not as widely or flexibly
available in the market as those for relational databases.
3. Backup: Backup is a weak point for some NoSQL databases such as MongoDB, where taking
a consistent backup of the data can be difficult.
- XML Database
1. Native XML Storage: XML databases store XML documents in their native format,
preserving the hierarchical structure and metadata associated with the data. This allows
for efficient querying and retrieval of XML content.
2. Querying and Indexing: XML databases provide query languages (such as XQuery or
XPath) and indexing mechanisms tailored for XML data. Users can search, filter, and
extract specific elements or attributes from XML documents.
3. Web Services Integration: XML databases are commonly used in conjunction with
web services and web applications, as XML is a fundamental data format for
representing data exchanged over the internet.
4. Semi-Structured Data: XML databases can handle both structured and semi-
structured data, making them suitable for applications with flexible data schemas.
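The kind of querying described in point 2 can be tried with Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath (and no XQuery). The sample document is hypothetical; a native XML database would provide a fuller XPath/XQuery engine together with indexes on the stored documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document with a small hierarchy of elements and attributes.
doc = """
<library>
  <book id="b1" lang="en"><title>Database Systems</title><year>2016</year></book>
  <book id="b2" lang="fr"><title>Bases de données</title><year>2019</year></book>
  <book id="b3" lang="en"><title>XML in Practice</title><year>2019</year></book>
</library>
"""

root = ET.fromstring(doc.strip())

# Filter by attribute value: titles of English-language books.
english = [t.text for t in root.findall(".//book[@lang='en']/title")]
print(english)  # ['Database Systems', 'XML in Practice']

# Filter by child element text: ids of books published in 2019.
ids_2019 = [b.get("id") for b in root.findall(".//book[year='2019']")]
print(ids_2019)  # ['b2', 'b3']
```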
- Graph Database
A graph database (GDB) is a database that uses graph structures for storing data. It
uses nodes, edges, and properties instead of tables or documents to represent and store
data. The edges represent relationships between the nodes. This makes related data easier
to retrieve, in many cases with a single operation. Graph databases are commonly classified
as a type of NoSQL database.
Graph databases suit data such as "friends of friends" in a social network, which involves
many-to-many relationships. They are used when the equivalent query in a relational database
would be very complex.
For example, a profile holds some specific information, but the major selling point is the
relationship between different profiles, that is, how you get connected within a network.
Similarly, a graph database may store many user data elements, but it is the relationships
between them that give value to all the data elements stored in the graph database.
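A friends-of-friends query can be sketched as a two-hop breadth-first traversal over an adjacency list. This is a hypothetical in-memory Python illustration (the `friends` graph and the `friends_of_friends` function are made up), not the query language of any particular graph database; in a relational schema the same question would typically need a friendship table joined with itself.

```python
from collections import deque

# Hypothetical social graph: nodes are users, edges are "friend" relationships.
friends = {
    "Asha": {"Ben", "Chen"},
    "Ben": {"Asha", "Dia"},
    "Chen": {"Asha", "Dia", "Eli"},
    "Dia": {"Ben", "Chen"},
    "Eli": {"Chen"},
}

def friends_of_friends(graph, start):
    """Users exactly two hops from start (not start, not direct friends)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if dist[user] == 2:
            continue  # no need to follow edges beyond two hops
        for other in graph[user]:
            if other not in dist:
                dist[other] = dist[user] + 1
                queue.append(other)
    return sorted(u for u, d in dist.items() if d == 2)

print(friends_of_friends(friends, "Asha"))  # ['Dia', 'Eli']
```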
Advantages: The graph model handles frequent schema changes, large volumes of data,
real-time query response requirements, and more intelligent data activation
requirements.
Disadvantages: Note that graph databases aren’t always the best solution for an
application. We need to assess the needs of the application before deciding on the
architecture.
Limitations of Graph Databases:
• Graph databases may not offer a better choice than other NoSQL variants.
• If the application needs to scale horizontally, performance may suffer, because a highly
connected graph is difficult to partition across servers.
• They are not very efficient when all nodes with a given parameter need to be updated.
- Federated Database:
Data integration: A federated database management system integrates data from various
databases and platforms, enabling organizations to analyze and gain insights from data
distributed across multiple systems.
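The data-integration idea can be sketched with two independent SQLite databases standing in for autonomous component databases that use different schemas. The table names, the `SOURCES` mapping and `federated_customers` are hypothetical; a real federated DBMS would also handle remote access, query optimization and distributed transactions.

```python
import sqlite3

# Two autonomous component databases with different local schemas (hypothetical).
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE customers (cust_id INTEGER, full_name TEXT)")
sales_db.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Asha"), (2, "Ben")])

support_db = sqlite3.connect(":memory:")
support_db.execute("CREATE TABLE clients (id INTEGER, name TEXT, city TEXT)")
support_db.executemany("INSERT INTO clients VALUES (?, ?, ?)",
                       [(2, "Ben", "Pune"), (3, "Chen", "Delhi")])

# Map each source's local schema onto one global schema: (id, name).
SOURCES = [
    (sales_db, "SELECT cust_id, full_name FROM customers"),
    (support_db, "SELECT id, name FROM clients"),
]

def federated_customers():
    """Query every source in its own schema and merge the results on id."""
    merged = {}
    for conn, query in SOURCES:
        for cid, name in conn.execute(query):
            merged.setdefault(cid, name)  # first source wins on duplicates
    return sorted(merged.items())

print(federated_customers())  # [(1, 'Asha'), (2, 'Ben'), (3, 'Chen')]
```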
Complexity: Federated databases are more complex than traditional centralized ones because
they must handle multiple data sources, schemas, and distributed transactions, which makes
system design, implementation, and maintenance challenging.
Performance: Federated databases may face performance issues due to the overhead of
managing distributed transactions and retrieving data from multiple sources, resulting in
slower response times and increased network traffic.
Security: Federated databases may be more vulnerable to security breaches since they
are spread across multiple locations and may be accessed by different users and
applications. Ensuring data privacy, integrity, and security across all the distributed
databases can be a significant challenge.
Cost: Federated databases can be costly to implement and maintain due to specialized
hardware, software, network infrastructure, licensing, and support costs for individual
databases.