Unit01-Advanced Data Management Techniques
Names of Sub-Units
Introduction to Data Management, Types of Data, Overview of Data Models, Introduction to OLTP,
Dimensional Modeling Life Cycle, New Trends in Computing, Motivation and Need for Advanced Data
Management Techniques.
Overview
The unit begins by explaining the types of data. It then provides an overview of data models.
The unit also familiarises you with the importance of OLTP and its limitations. Towards the end, you
will be acquainted with the dimensional modelling life cycle and the motivation and need for advanced data
management techniques.
Learning Objectives
Learning Outcomes
1.1 INTRODUCTION
Data management is the process of ingesting, storing, organising and maintaining an organisation’s
data. Effective data management is a critical part of deploying the IT systems that run business
applications and provide analytical information to help corporate executives, business managers and
other end-users drive operational decision-making and strategic planning.
The data management process consists of a number of tasks that work together to ensure that the data
in business systems is correct, available and accessible. The majority of the needed work is done by IT
and data management teams, but business users often engage in various portions of the process to
ensure that the data fulfils their needs.
Data management has become increasingly important as firms face a growing number of regulatory
compliance obligations. Furthermore, businesses are gathering ever-increasing amounts of data and
a broader range of data types, both of which are hallmarks of the big data platforms that many have
adopted. Without proper data management, such environments can become unwieldy and difficult to
navigate.
Since data is so crucial in our lives, it is critical that it is stored and processed correctly, without
errors. When dealing with datasets, the type of data plays an essential role in determining which
pre-processing approach works best for a given dataset. A data type is a trait associated with a
piece of data that tells a computer system how to
interpret its value. By understanding data types, one can ensure that data is collected in the desired
format and that the value of each property is as expected.
UNIT 01: Introduction to Data Management Techniques JGI JAIN
DEEMED-TO-BE UNIVERSITY
Figure: Types of Data (Nominal, Ordinal, Discrete, Continuous)
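As a rough illustration of how the type of data steers pre-processing, the sketch below treats one value of each of the four types named above (nominal, ordinal, discrete, continuous) differently. The category lists, sample values and helper function names are all hypothetical and chosen only for this example; other encodings are equally possible.

```python
# Minimal sketch: the pre-processing step depends on the type of data.
# All category lists, values and helper names are hypothetical.

def one_hot(value, categories):
    """Nominal data: categories have no order, so use one flag per category."""
    return [1 if value == c else 0 for c in categories]

def ordinal_rank(value, ordered_levels):
    """Ordinal data: categories are ordered, so map each one to its rank."""
    return ordered_levels.index(value)

def min_max_scale(value, low, high):
    """Continuous data: rescale a measurement into the range 0..1."""
    return (value - low) / (high - low)

colour = one_hot("green", ["red", "green", "blue"])         # nominal
rating = ordinal_rank("medium", ["low", "medium", "high"])  # ordinal
children = int("3")                                         # discrete: whole-number count
height = min_max_scale(170.0, 150.0, 200.0)                 # continuous

print(colour, rating, children, height)
```

Knowing the type in advance is what makes it possible to pick the right helper for each column.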
Another way of categorising data is the way it is arranged. Under this category, data is classified as
structured, semi-structured and unstructured. Let us understand them in detail.
1.2.1 Structured
Structured data is organised into a formatted repository, usually a database, so that its elements can be
addressed for more efficient processing and analysis. In a database, for example, each field is distinct,
and its data can be accessed alone or in conjunction with data from other fields, in a number of
combinations. The database’s strength is its capacity to make data comprehensive enough to offer
relevant information. SQL (Structured Query Language) is a database query language that allows
users such as database administrators to interact with the database.
Structured data is distinguished from unstructured and semi-structured data. The three types of data
can be thought of as lying on a scale, with unstructured data the least formatted and structured
data the most formatted. As data becomes more organised, it becomes more amenable to
processing.
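The point that each field of structured data can be accessed alone or in combination with other fields can be sketched with Python's built-in sqlite3 module. The customers table, its columns and its rows are hypothetical and serve only as an illustration.

```python
import sqlite3

# In-memory database with a hypothetical "customers" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, spend REAL)"
)
conn.executemany(
    "INSERT INTO customers (name, city, spend) VALUES (?, ?, ?)",
    [("Asha", "Pune", 120.0), ("Ravi", "Delhi", 80.0), ("Meera", "Pune", 200.0)],
)

# Access a single field on its own.
names = [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY id")]

# Combine fields: total spend per city.
spend_by_city = dict(
    conn.execute("SELECT city, SUM(spend) FROM customers GROUP BY city")
)

print(names)
print(spend_by_city)
conn.close()
```

Because every value sits in a named, typed column, both queries can be written without inspecting the content of the rows.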
1.2.2 Semi-structured
Semi-structured data is not arranged into a specialised repository, such as a database, but includes associated
information, such as metadata, that makes it easier to process than raw data. In other words,
semi-structured data falls somewhere in between: it is neither fully structured nor fully unstructured.
It is not structured in a way that allows for advanced access and analysis; nonetheless, it may carry
information, such as metadata tags, that allows the elements it contains to be
addressed. A Word document, for example, is commonly regarded as unstructured data. However, metadata tags
can be added in the form of keywords that reflect the document’s content; as a result, the document can
be easily found when those keywords are searched for. The data is now semi-structured. Nonetheless, the
document lacks the deep organisation of a database and hence falls short of being fully structured
data.
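The keyword-tagging idea can be sketched as follows. The documents, their titles and their keywords are hypothetical, and JSON is used here only as one common way of holding semi-structured records: the body stays free text, while the tags make each document addressable.

```python
import json

# Hypothetical documents: the body is free text, the keywords are metadata tags.
raw = """
[
  {"title": "Q1 review", "keywords": ["sales", "forecast"], "body": "Free-form text ..."},
  {"title": "Hiring plan", "keywords": ["hr"], "body": "Free-form text ..."},
  {"title": "Pricing memo", "keywords": ["sales", "pricing"], "body": "Free-form text ..."}
]
"""
documents = json.loads(raw)

def find_by_keyword(docs, keyword):
    """Return the titles of the documents tagged with the given keyword."""
    return [d["title"] for d in docs if keyword in d.get("keywords", [])]

sales_docs = find_by_keyword(documents, "sales")
print(sales_docs)
```

The search never reads the body text, which is why the tags alone are enough to make the documents easier to process than raw data.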
1.2.3 Unstructured
Unstructured data comes in many formats, including Word documents, email messages, PowerPoint presentations, survey replies, contact
centre transcripts, and blog and social media postings. Images, audio, and video files are also examples of
unstructured data. Machine data is another type of unstructured data that is growing rapidly in many
organisations. For example, log files from websites, servers, networks, and apps, particularly mobile
applications, provide information related to activity and performance. Furthermore, businesses are increasingly
capturing and analysing data from sensors on industrial equipment and other IoT-connected devices.
Data models include vital information for organisations because they define the relationships between
database tables, foreign keys, and the events involved. The three fundamental data model styles are as
follows:
Conceptual data model: A conceptual data model is a high-level representation of database concepts
and their relationships. The goal of developing a conceptual data model is to define entities, their
attributes, and their relationships. There is very little detail about the actual database structure
at this level of data modelling. A conceptual data model is often created by business stakeholders
and data architects.
Logical data model: The logical data model specifies the structure of data elements as well as
their relationships. It adds detail to the elements of the conceptual data
model. The benefit of adopting a logical data model is that it serves as a basis for the physical model.
The modelling structure, however, remains generic.
No primary or secondary keys are declared at this data modelling level. You must verify and
adjust the connector details that were defined earlier for relationships at this level.
Physical data model: A physical data model specifies how the data model is implemented in a specific
database. It offers database abstraction and helps to generate the schema, thanks to
the richness of metadata that a physical data model provides. By replicating database column
keys, constraints, indexes, triggers, and other RDBMS features, the physical data model helps
in visualising the database structure.
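To make the physical level concrete, the sketch below implements a small model in SQLite through Python's sqlite3 module. The customer and purchase tables, their columns and the index name are hypothetical; a conceptual model would only say that a customer makes purchases, whereas the physical model spells out keys, constraints and indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Physical model: concrete tables, keys, constraints and an index (all names hypothetical).
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL CHECK (amount > 0)
);
CREATE INDEX idx_purchase_customer ON purchase(customer_id);
""")

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Asha')")
conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (1, 250.0)")

# The foreign key rejects a purchase that points at a non-existent customer.
try:
    conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The schema is stored as metadata that can itself be queried.
objects = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type IN ('table', 'index') ORDER BY name"
)]
print(rejected, objects)
conn.close()
```

The final query reads the schema back from the database catalogue, which is the kind of metadata the physical model relies on.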
Online Analytical Processing (OLAP) software performs multidimensional analysis at high speed on large volumes of data from a data warehouse,
data mart, or other unified, centralised data repository. Most business data has
several dimensions, that is, multiple categories into which the data is broken down for presentation, tracking
or analysis. Sales data, for example, may have several dimensions related to location (region, country,
state/province, store), time (year, month, week, day), product (clothing, men/women/children, brand, type),
and more. However, data sets in a data warehouse are stored in tables, each of which can organise
data into just two of these dimensions at a time. OLAP extracts data from multiple relational data sets and
reorganises it into a multidimensional format for fast processing and in-depth analysis.
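The reorganisation of flat rows into a multidimensional structure can be sketched in plain Python. This is only a toy illustration under stated assumptions: the sales rows, the three dimensions and the roll_up helper are hypothetical, and real OLAP engines use far more elaborate storage and indexing.

```python
from collections import defaultdict

# Hypothetical flat sales records, as rows of a relational table:
# (region, year, product, amount)
rows = [
    ("North", 2022, "Clothing", 100),
    ("North", 2023, "Clothing", 150),
    ("South", 2022, "Clothing", 80),
    ("South", 2023, "Footwear", 60),
    ("North", 2023, "Footwear", 40),
]

# Build a small "cube": one cell per (region, year, product) combination.
cube = defaultdict(int)
for region, year, product, amount in rows:
    cube[(region, year, product)] += amount

def roll_up(cube, keep):
    """Aggregate the cube over every dimension whose index is not in `keep`."""
    totals = defaultdict(int)
    for key, amount in cube.items():
        totals[tuple(key[i] for i in keep)] += amount
    return dict(totals)

by_region = roll_up(cube, keep=[0])           # total per region
by_year_product = roll_up(cube, keep=[1, 2])  # total per (year, product)
print(by_region)
print(by_year_product)
```

Once the cells are keyed by all three dimensions, any combination of dimensions can be summarised with the same roll-up step.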
Graph databases: Graph databases describe patterns and relationships using mathematical
structures that represent pairwise relationships between items. They are made up of nodes, or
data items, that are linked together by edges, and these connections are stored persistently
alongside the nodes. As a result, graph databases can outperform typical relational databases on
large, highly connected datasets, where a relational design would need many joins.
This can be useful for merchants in identifying consumer behaviour and purchasing trends. Graph
databases therefore behave differently from relational databases.
Data protection: Data security is consistently a top trend in data management, as organisations
prioritise data integrity and work to reduce the risk of data breaches and loss. Secure data solutions, whether
on-premises or in hybrid multi-cloud environments, give you improved visibility and insights to
investigate and remediate threats. They also enforce real-time controls and support compliance with a range of
mandates such as GDPR, PCI, HIPAA, and SOX. Security infrastructure based on SIEM,
SOAR, and SASE provides highly automated detection and response systems that give security
professionals the most recent advances in enterprise defence.
Artificial Intelligence (AI): The AI in use today is often referred to as narrow or weak AI, since
it is designed to perform a narrow or specific task. Google search, image recognition software,
personal assistants like Siri and Alexa, and self-driving cars are all examples of narrow AI. AI performs
frequent, high-volume computerised tasks reliably and without fatigue. AI looks for structure
and regularities in data to acquire a skill, turning the algorithm into a classifier or predictor. As
a result, just as an algorithm can teach itself how to play chess, it can also teach itself which
product to recommend next online. The models adapt over time when presented with new
data.
Persistent Memory (PMEM): Businesses expect higher performance when apps based on machine
learning or artificial intelligence (AI) operate in real time. Keeping more data close to the processor
and in a persistent state brings key benefits such as higher throughput and lower latency.
PMEM, which was developed jointly by Intel and Micron, allows Intel-based servers to
extend their memory footprint up to 4 TB in capacity. This delivers a significant speed improvement
for in-memory databases or locally stored data. As a storage medium, PMEM is faster than NVMe
devices.
Natural Language Processing: The interaction between human language and technology is
known as natural language processing (NLP). Voice-activated technologies such as Alexa, Siri, and
Google Home are well-known examples of natural language processing. NLP acts as a front end
to an artificial intelligence back end, converting spoken queries into actionable results. As NLP
advances, the volume of voice search data grows with it. Businesses are therefore encouraged to support voice searches,
which were projected to account for 50% of all searches by 2020. NLP assists in the processing and
collection of voice-based data, and it can also be used internally by individuals who require data
access via voice requests.
Augmented analytics: Augmented analytics enhances how individuals explore and analyse data in
analytics and BI systems by using enabling technologies such as machine learning and AI to assist with
data preparation, insight generation and insight explanation. Because the analysis is automated
and can be configured to run continually, the heavy lifting of manually sorting through enormous
volumes of complex data (often impractical owing to a lack of skills or time) is considerably reduced.
Augmented data preparation also combines data from numerous sources considerably more quickly.
Embedded analytics: Those who work in analytics know how time-consuming
and inefficient data translation can be. Embedded analytics is a digital workplace
feature that allows data analysis to take place within a user’s natural workflow rather than
requiring a switch to another application. Furthermore, embedded analytics is typically used to optimise
specific marketing campaigns, sales lead conversions, inventory demand planning and financial
budgeting. Embedded analytics may help an organisation to decrease the amount of time analytics
professionals spend transforming data into readable insights for everyone. That means they can
do what they do best: evaluate data to generate answers and plans. It even enables employees to
engage with data and create simple visuals for better decision-making.
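For the graph databases item in the list above, the idea of nodes linked by stored connections can be sketched with an in-memory adjacency map. The customers, products and the within_hops helper are hypothetical, and a real graph database adds persistence, indexing and a query language on top of this basic structure.

```python
from collections import deque

# Hypothetical graph: nodes are customers and products, edges are "bought" links.
edges = [
    ("alice", "laptop"), ("alice", "mouse"),
    ("bob", "mouse"), ("bob", "keyboard"),
    ("carol", "keyboard"),
]

# Store the graph as an adjacency map so that neighbours are looked up directly.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def within_hops(graph, start, max_hops):
    """Breadth-first search: every node reachable in at most `max_hops` edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return {n for n in seen if n != start}

# Customers who share a purchased product with alice are two hops away.
related = within_hops(graph, "alice", 2)
print(sorted(related))
```

Following stored links from node to node is the operation that would otherwise be expressed as repeated joins in a relational schema.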
These capabilities are considered advanced in relation to DBMSs. Many advanced database management
systems are available as cloud services, while others remain available as software products,
one of the most prominent being MongoDB. Navicat for MongoDB is a useful tool if you want to
learn more about it. It supports database objects including Collections, Views, Functions, Indexes,
GridFS, and MapReduce. An object designer is also available for creating, modifying and designing
database objects.
Advanced data management has long been at the heart of effective database and information systems.
It covers a wide range of data models and the fundamentals of organising, processing, storing and
querying data using these models.
Data management is the process of ingesting, storing, organising and maintaining an organisation’s
data.
The data management process consists of a number of tasks that work together to ensure that the
data in business systems is correct, available and accessible.
When dealing with datasets, the type of data plays an essential role in determining which
preprocessing approach would work best for a certain set to obtain the best results.
Structured data is arranged into a prepared repository, usually a database, so that its pieces may be
addressed for more efficient processing and analysis.
Semi-structured data is not arranged into a specialised repository, such as a database, but includes
associated information, such as metadata, that makes it easier to process than raw data.
Unstructured data is not arranged into a predefined format, which makes it harder to access and process.
A data model is a representation of the logical interrelationships and data flow between various
data items in the information domain.
Data models include vital information for organisations because they define the relationships
between database tables, foreign keys, and the events involved.
Online Analytical Processing (OLAP) refers to a type of software that allows users to examine data
from various database systems at the same time.
Dimensional Modelling (DM) is a data structure approach designed specifically for data storage in
a data warehouse.
1.9 GLOSSARY
Database: A set of data that is organised to allow users to find and retrieve it quickly and easily.
DBMS: Database management system; software used to create, store, retrieve and manage data in a database.
OLAP: Online Analytical Processing; a powerful analysis tool used for forecasting, statistical computations and aggregations.
Data management: The process of ingesting, storing, organising and maintaining an
organisation’s data.
Structured data: Data that is arranged into a prepared repository, usually a database, so that its pieces may
be addressed for more efficient processing and analysis.
Data model: A representation of the logical interrelationships and data flow between various data
items in the information domain.
Dimensional Modelling (DM): A data structure approach designed specifically for data storage
in a data warehouse.
https://pdfs.semanticscholar.org/d127/c1bf1fb31fe33f054f3bd2dc4f44c0615987.pdf
Using various sources, find real-world applications of OLAP and discuss them with your
classmates.