
UNIT 01: Introduction to Data Management Techniques

Names of Sub-Units

Introduction to Data Management, Types of Data, Overview of Data Models, Introduction to OLTP,
Dimensional Modeling Life Cycle, New Trends in Computing, Motivation and Need for Advanced Data
Management Techniques.

Overview

The unit begins by explaining the types of data. It then provides an overview of data models. The unit
also familiarises you with the importance of OLTP and its limitations. Towards the end, you will be
acquainted with the dimensional modelling life cycle and the motivation and need for advanced data
management techniques.

Learning Objectives

In this unit, you will learn to:


 Explain the types of data
 Discuss the concept of data models
 Describe the significance of OLTP in data management
 State the importance of dimensional modelling life cycle
 Explain the motivation and need for advanced data management

Learning Outcomes

At the end of this unit, you will be able to:


 Examine the types of data
 Evaluate the different types of data models
 Examine how data modelling principles are used
 Assess the need for advanced data management

1.1 INTRODUCTION
Data management is the process of absorbing, storing, organising and preserving an organisation’s
data. Effective data management is a critical component of establishing IT systems that operate business
applications and offer analytical information to enable corporate executives, business managers and
other end-users to drive operational decision-making and strategic planning.

The data management process consists of a number of tasks that work together to ensure that the data
in business systems is correct, available and accessible. The majority of the needed work is done by IT
and data management teams, but business users often engage in various portions of the process to
ensure that the data fulfils their needs.

Data management has become increasingly important as firms face a growing number of regulatory
compliance obligations. Furthermore, businesses are gathering ever-increasing amounts of data and
a broader range of data types, both of which are hallmarks of the big data platforms that many have
adopted. Without proper data management, such environments can become cumbersome and difficult
to navigate.

1.2 TYPES OF DATA


Everyone, whether individuals or organisations, is surrounded by data. Organisations have data related
to customers, shareholders, suppliers, employees, products and so on. On the other hand, individuals
have a lot of personal data in their computer systems. Data provides organisations with actionable
insights that can assist in refining existing campaigns, managing new goods, or performing new
experiments. We are in a digital era in which a large amount of data is generated on a daily basis.
For example, a firm like Flipkart generates more than 2 TB of data every day.

Since data is so crucial in our lives, it is critical that it is correctly stored and processed without any
errors. When dealing with datasets, the type of data plays an essential role in determining which pre-
processing approach would work best for a certain set to obtain the best results. In other words, data
type is basically a trait associated with a piece of data that instructs a computer system on how to
interpret its value. By understanding data types, one can ensure that data is collected in the desired
format and the value of each property is as expected.


There are broadly two types of data, as shown in Figure 1:

Figure 1: Data Types (qualitative data, subdivided into nominal and ordinal; quantitative data, subdivided into discrete and continuous)


Let us discuss these data types as follows:
 Qualitative data: Qualitative or categorical data describes the object under consideration using a
finite set of discrete classes. This type of data cannot easily be counted or measured using numbers
and is therefore divided into categories. The gender of a person (male, female, or others) is a good
example of this data type.
Such data is usually extracted from audio, image, or text media. Another example is a smartphone
brand that provides information about the current rating, the colour of the phone, the category of
the phone, and so on. There are two subcategories under qualitative data:
 Nominal: These are values that have no natural ordering. For example, it is impossible to say
that ‘Red’ is greater than ‘Blue’. A person’s gender is another example: male, female and others
cannot be ranked against one another. Mobile phone categories, whether midrange, budget
segment, or luxury smartphone, are likewise nominal.
 Ordinal: These values have a natural ordering while remaining within their class of values.
When it comes to clothing brands, we can easily classify them according to their size tag in the
order of small, medium and big. The grading method used to mark candidates in an exam may
also be thought of as an ordinal data type, with A+ clearly superior to a B grade.
These categories help determine which encoding approach is appropriate for which type of data.
Because machine learning models are mathematical in nature, qualitative values cannot be handled
directly by the models and must be translated into numerical form. For nominal data types, where
there is no ordering among the categories, one-hot encoding may be used, which creates one binary
column per category; for ordinal data types, label encoding, a form of integer encoding that preserves
the order, can be used, as sketched below.
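As an illustration, the following Python sketch (using the pandas library; the dataset and column names are hypothetical) applies one-hot encoding to a nominal colour attribute and label encoding to an ordinal size attribute:

    import pandas as pd

    # Hypothetical dataset with one nominal and one ordinal attribute
    df = pd.DataFrame({
        "colour": ["Red", "Blue", "Red"],      # nominal: no natural order
        "size": ["small", "big", "medium"],    # ordinal: small < medium < big
    })

    # One-hot encoding: one binary column per nominal category
    one_hot = pd.get_dummies(df["colour"], prefix="colour")

    # Label encoding: map ordinal categories to ordered integers
    size_order = {"small": 0, "medium": 1, "big": 2}
    df["size_encoded"] = df["size"].map(size_order)

    print(pd.concat([df, one_hot], axis=1))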
 Quantitative data: This data type attempts to quantify things by taking into account numerical
values that make it countable in nature. The price of a smartphone, the discount offered, the number
of ratings on a product, the frequency of a smartphone’s processor, or the RAM of that particular
phone are all examples of quantitative data types.
The important thing to remember is that a feature can have an endless number of values. For
example, the price of a smartphone can range from ‘x’ amount to any value, and it can be further
subdivided into fractional amounts.


The two classifications that best define them are:


 Discrete: This category includes numerical values that are either integers or whole numbers.
The number of speakers in a phone, cameras, processing cores, and the number of SIM cards
supported are all instances of discrete data.
 Continuous: Fractional numbers are treated as continuous values. These might include the
processors’ working frequency, the phone’s Android version, Wi-Fi frequency, temperature, and
so on.
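In tabular tools, this distinction typically surfaces as the column's data type, as in this brief Python sketch (the column names are hypothetical):

    import pandas as pd

    phones = pd.DataFrame({
        "sim_slots": [1, 2, 2],        # discrete: whole numbers
        "cpu_ghz": [2.4, 3.1, 2.85],   # continuous: fractional values
    })

    # pandas infers an integer dtype for discrete values and a float
    # dtype for continuous ones
    print(phones.dtypes)   # sim_slots -> int64, cpu_ghz -> float64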

Another way of categorising data is the way it is arranged. Under this category, data is classified as
structured, semi-structured and unstructured. Let us understand them in detail.

1.2.1 Structured
Structured data is arranged into a prepared repository, usually a database, so that its pieces may be
addressed for more efficient processing and analysis. In a database, for example, each field is distinct,
and its data may be accessed alone or in conjunction with data from other fields, in a number of
combinations. The database’s strength is its capacity to make data comprehensive enough to offer
relevant information. SQL (Structured Query Language) is a database query language that allows a
database administrator to interface with the database.
Structured data is distinguished from unstructured and semi-structured data. The three types of data
may be thought of as being on a scale with unstructured data being the least formatted and structured
data being the most formatted. As data becomes more organised, it becomes more receptive to
processing.

1.2.2 Semi-structured
Semi-structured data is arranged into a specific repository, such as a database, but includes associated
information, such as metadata, that makes it easier to process than raw data. In other words, semi-
structured data falls somewhere in between, which means it is neither fully structured nor unstructured.
It is not structured in a way that allows for advanced access and analysis; nonetheless, it may include
information connected with it, such as metadata tagging that allows components contained to be
addressed. A Word document, for example, is commonly assumed to be unstructured data. However,
metadata tags can be added in the form of keywords that reflect the document’s content; as a result,
the document can be easily found if those keywords are searched for. The data is now semi-structured.
Nonetheless, the document lacks the database’s deep arrangement and hence falls short of being fully
structured data.
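As a brief illustration, the following Python sketch (the field names and keywords are hypothetical) attaches metadata tags to an otherwise free-text document, making it semi-structured and searchable by tag:

    import json

    # The body text on its own would be unstructured
    document = {
        "body": "Quarterly review of regional sales performance ...",
        # Metadata tags give the document a searchable, semi-structured layer
        "metadata": {
            "keywords": ["sales", "quarterly", "review"],
            "author": "A. Analyst",
            "format": "docx",
        },
    }

    # Components can now be addressed through the metadata
    if "sales" in document["metadata"]["keywords"]:
        print(json.dumps(document["metadata"], indent=2))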

1.2.3 Unstructured Data


Unstructured data is not arranged into a predefined format, which makes it harder to access and
process. In actuality, only a small percentage of data is entirely unstructured. Even papers and
photographs, which are commonly regarded as unstructured data, are organised to some level.
Unstructured data does not adhere to traditional data models, making it challenging to store and
manage in a traditional relational database.
Although unstructured data has an underlying structure, it lacks a specified data model or schema. It
might be either textual or non-textual. It can be created by either humans or machines. Text is one of
the most popular forms of unstructured data. Unstructured text is created and collected in a variety of


formats, including Word documents, email messages, PowerPoint presentations, survey replies, contact
centre transcripts, and blog and social media postings. Images, audio and video files are other examples
of unstructured data. Machine data is another type of unstructured data that is growing rapidly in many
organisations. For example, log files from websites, servers, networks and apps, particularly mobile
applications, provide data related to activity and performance. Furthermore, businesses are increasingly
capturing and analysing data from sensors on industrial equipment and other IoT-connected devices.
A brief sketch of extracting structure from such machine data follows.
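The sketch below (the log format and field names are hypothetical) uses a regular expression to pull structured fields out of free-form server log lines:

    import re

    # Hypothetical web-server log lines: unstructured text with latent structure
    logs = [
        "2024-05-01 12:00:03 GET /home 200 35ms",
        "2024-05-01 12:00:07 POST /login 401 12ms",
    ]

    pattern = re.compile(
        r"(?P<date>\S+) (?P<time>\S+) (?P<method>\S+) "
        r"(?P<path>\S+) (?P<status>\d+) (?P<latency>\d+)ms"
    )

    for line in logs:
        match = pattern.match(line)
        if match:
            # A structured record extracted from raw text
            print(match.groupdict())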

1.3 OVERVIEW OF DATA MODELS


A data model is a representation of the logical interrelationships and data flow between various
data items in the information domain. It also describes how data is stored and accessed. Data models
assist in describing what data is needed and in what format it should be utilised for various business
operations. A data model might be either tangible or abstract in nature. It is made up of the following
major components:
 Data types
 Data items
 Data sources
 Event sources
 Links

Data models include vital information for organisations because they define the relationships between
database tables, foreign keys, and the events involved. The three fundamental data model styles are as
follows:
 Conceptual data model: A conceptual data model is a logical representation of database ideas
and their connections. The goal of developing a conceptual data model is to define entities, their
properties, and their connections. There is very little information about the actual database structure
at this level of data modelling. A conceptual data model is often created by business stakeholders
and data architects.
 Logical data model: The logical data model is used to specify the structure of data pieces as well as
their connections. The elements of the conceptual data model are supplemented by the logical data
model. The benefit of adopting a logical data model is that it serves as a basis for the Physical model.
The modelling framework, on the other hand, remains general.
No primary or secondary keys are declared at this level of data modelling. You must validate and
adjust the connection details that were defined earlier for relationships at this level.
 Physical data model: A physical data model specifies how the data model is implemented in a
database. It provides database abstraction and aids in the creation of the schema, thanks to the
abundance of metadata that a physical data model carries. By replicating database column keys,
constraints, indexes, triggers and other RDBMS characteristics, the physical data model aids in
visualising the database structure. A minimal sketch follows.
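The following Python sketch (using the built-in sqlite3 module; the table and column names are hypothetical) declares a small physical schema with keys, a constraint and an index:

    import sqlite3

    conn = sqlite3.connect(":memory:")  # throwaway in-memory database

    # Physical model: concrete tables, keys, constraints and an index
    conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL CHECK (amount >= 0)
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    """)
    conn.close()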

1.4 INTRODUCTION TO OLTP


Online Transaction Processing (OLTP) is best understood in contrast with Online Analytical Processing
(OLAP). OLAP refers to a type of software that allows users to examine data from various database
systems at the same time. It is a tool that allows analysts to extract and examine data from different
perspectives. Analysts must regularly organise, aggregate and combine data, and these OLAP operations
need a lot of resources. Data may be pre-calculated and pre-aggregated using OLAP, making analysis
quicker.


OLAP software is used to perform multidimensional analysis on huge amounts of data from a data
warehouse, data mart, or other unified, centralised data repository at high speed. Most corporate data
has several dimensions, that is, multiple categories into which the data is divided for display, monitoring
or analysis. Sales data, for example, may have dimensions linked to place (region, country, state/
province, shop), time (year, month, week, day), product (clothing, men/women/children, brand, kind)
and other factors. However, data sets in a data warehouse are kept in tables, each of which may organise
data into just two of these dimensions at a time. OLAP takes data from various relational data sets and
reorganises it into a multidimensional structure for quick processing and in-depth analysis, as the
sketch below suggests.
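A minimal sketch of this kind of reorganisation using pandas (the sales figures and column names are invented for illustration):

    import pandas as pd

    sales = pd.DataFrame({
        "region": ["North", "North", "South", "South"],
        "year":   [2023, 2024, 2023, 2024],
        "amount": [100, 120, 90, 110],
    })

    # Reorganise flat rows into a region x year view, aggregating the measure
    cube = pd.pivot_table(sales, values="amount", index="region",
                          columns="year", aggfunc="sum")
    print(cube)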

1.4.1 Need for OLTP


An Online Transaction Processing (OLTP) system logs and stores data related to transactions in a
database. Individual database entries consist of the numerous fields or columns used in each transaction.
Banking and credit card activity are two examples. As OLTP databases are read, written and
updated often, the emphasis in OLTP is on quick processing. If a transaction fails, built-in system logic
guarantees that the data is not corrupted, as the sketch below demonstrates.
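The following Python sketch (using the built-in sqlite3 module; the account table is hypothetical) shows this all-or-nothing behaviour: when one step of a transfer violates a constraint, the whole transaction is rolled back and the balances are left unchanged:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, "
        "balance REAL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
    conn.commit()

    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - 200 WHERE id = 1")
            conn.execute("UPDATE account SET balance = balance + 200 WHERE id = 2")
    except sqlite3.IntegrityError:
        print("Transfer failed; transaction rolled back")

    # Balances are unchanged: [(1, 100.0), (2, 50.0)]
    print(conn.execute("SELECT id, balance FROM account").fetchall())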
For data mining, analytics and business intelligence initiatives, OLAP applies complicated queries to
massive volumes of historical data gathered from OLTP databases and other sources. The emphasis of
OLAP is on response time to these sophisticated queries. Each query comprises one or more columns of
data collected from a large number of rows. Year-over-year financial performance and marketing lead-
generation patterns are two examples. OLAP databases and data warehouses enable analysts and
decision-makers to transform data into information by allowing them to employ bespoke reporting
tools. The failure of a query in OLAP does not stop or delay transaction processing for customers, but it
might delay or damage the accuracy of business intelligence insights.

1.4.2 Advantages of OLTP


The main advantage of OLTP is that it can handle many transaction requests simultaneously and can
reliably back up data and continue operating if part of the system fails. The following are some other advantages
of the OLTP system:
 Provides an accurate revenue and cost prediction
 Provides timely update of all transactions
 Simplifies transactions on behalf of customers
 Supports larger databases
 Performs data partitioning for data manipulation
 Utilises consistency and concurrency control to complete activities, which assures better availability

1.4.3 Limitations of OLTP


The following are the limitations of the OLTP system:
 If the OLTP system has hardware issues, online transactions are adversely impacted.
 OLTP systems enable several users to view and modify the same data at the same time, which
can result in data anomalies if concurrency is not controlled properly.
 If the server lags for a few seconds, a significant number of transactions may be impacted.
 In order to manage inventories, OLTP systems necessitate a large number of employees working in groups.


 OLTP lacks suitable means for sending merchandise to customers.


 OLTP increases the database’s vulnerability to hackers and attackers.
 In B2B transactions, there is a risk that both buyers and suppliers will miss out on the system’s
efficiency benefits.
 A server failure may result in the loss of substantial volumes of data from the database.

1.5 DIMENSIONAL MODELLING LIFE CYCLE


Dimensional Modelling (DM) is a data structure approach designed specifically for data storage in a
data warehouse. The goal of dimensional modelling is to improve the database for quicker data retrieval.
Ralph Kimball created the Dimensional Modelling idea, which comprises “fact” and “dimension” tables.
A data warehouse dimensional model is intended to read, summarise and analyse numeric information
such as values, balances, counts, weights and so on. Relational models, on the other hand, are ideal for
adding, updating and deleting data in a real-time online transaction system.
These dimensional and relational models each have their own approach to storing data, which offers
distinct advantages. In the relational model, for example, normalisation and ER modelling eliminate data
redundancy. A dimensional model in a data warehouse, on the other hand, organises data in such a way
that it is easy to access information and produce reports, as the sketch below shows.
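A minimal star-schema sketch in Python with sqlite3 (the fact and dimension tables and their contents are hypothetical): a central fact table holds numeric measures and references descriptive dimension tables, and a typical query joins and aggregates them:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- Fact table: numeric measures plus foreign keys into the dimensions
    CREATE TABLE fact_sales (
        date_id    INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity   INTEGER,
        amount     REAL
    );
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2024, 1), (2, 2024, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "shirt", "clothing"), (2, "shoes", "footwear")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 3, 60.0), (2, 1, 2, 40.0), (2, 2, 1, 80.0)])

    # Typical dimensional query: summarise a measure by dimension attributes
    for row in conn.execute("""
        SELECT d.year, p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d    ON f.date_id = d.date_id
        JOIN dim_product p ON f.product_id = p.product_id
        GROUP BY d.year, p.category
    """):
        print(row)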

1.6 NEW TRENDS IN COMPUTING


Businesses that acquire troves of fresh data (and likely retain a large quantity of old data) face challenges
in organising it and using it to their advantage. Much of this is due to constantly evolving and growing
technologies, which can quickly become burdensome for firms attempting to be more effective and
judicious in their data use. Keeping abreast of current trends and knowing the best solutions is quite
challenging for organisations. The following are current data management trends:
 The Cloud: Cloud computing continues to offer several benefits to organisations. The use of a multi-
cloud topology is gaining traction. As the use of Kubernetes, an open-source platform that automates
the deployment and management of cloud-native applications, grows, apps may become incredibly
mobile. Due to this mobility, apps move to the public cloud and are pulled back into a private
cloud in a two-way street.
Security is becoming more stringent as businesses such as VMware expand their capabilities for
securing an organisation’s infrastructure. Machine learning is being infused into practically every
area of IT architecture. Improved insights are proven to be a game-changer for businesses trying to
understand their infrastructures.
 Automated data handling: Vendors are integrating machine learning and AI engines to make
self-configuring and self-tuning processes more common. This significantly reduces manual data
handling in organisations. Many manual procedures are automated by these processes, allowing
people with fewer technical abilities to be more independent while accessing data.
 Infrastructure for machine learning: Machine Learning (ML), like AI, has a reputation for being too
complex for most enterprises, but the payback is practically priceless. Machine learning-powered
automated solutions may be quite useful in data management since they extract information and
combine it into easily digestible visual representations. Increasingly, ML solutions are being included
in the majority of cloud-based SaaS products or supplied as stand-alone cloud-based architectures
that are simple to set up and use.


 Graph databases: Patterns and correlations may be defined in graph databases using mathematical
structures that reflect pairwise relationships between items. Graph databases outperform typical
relational databases when dealing with huge datasets. Graph databases are made up of nodes, or
data items, that are linked together, and these connections persist between nodes. This can be useful
for merchants in identifying consumer behaviour and purchasing trends, as the toy sketch below
suggests.
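A toy illustration of the node-and-edge model using the networkx library (the customers, products and the "also bought" question are invented; production graph databases such as Neo4j apply the same model at scale):

    import networkx as nx

    g = nx.Graph()
    # Nodes are data items; edges are pairwise relationships between them
    g.add_edge("alice", "laptop", relation="bought")
    g.add_edge("bob", "laptop", relation="bought")
    g.add_edge("bob", "mouse", relation="bought")

    # Pattern query: what did buyers of "laptop" also buy?
    also_bought = {
        product
        for customer in g.neighbors("laptop")
        for product in g.neighbors(customer)
        if product != "laptop"
    }
    print(also_bought)   # {'mouse'}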
 Data protection: Data security is usually a top trend in data management, as organisations
emphasise data integrity and reduce risks of data breaches and loss. Secure data solutions, whether
on-premises or in hybrid multi-cloud settings, enable you to get improved visibility and insights to
analyse and address risks. They also implement real-time controls and conform to a plethora of
compliance mandates such as GDPR, PCI, HIPAA, and SOX. Security infrastructure based on SIEM,
SOAR, and SASE produces highly automated detection and response systems that provide security
professionals with the most recent breakthroughs in corporate defence.
 Artificial Intelligence (AI): Much of today’s artificial intelligence is referred to as narrow or weak AI,
since it is designed to accomplish a small or particular purpose. Google search, picture recognition
software, personal assistants like Siri and Alexa, and self-driving cars are all instances of narrow AI.
AI reliably and without tiring executes frequent, high-volume computerised tasks. AI searches for
structure and regularities in data to learn a skill, transforming the algorithm into a classifier or
predictor; just as an algorithm can teach itself how to play chess, it can also teach itself what product
to promote next on the Internet. When presented with fresh data, the models evolve over time. A
compact sketch of this learn-from-data idea follows.
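The sketch below uses scikit-learn (the feature values and labels are invented for illustration): the algorithm finds regularities in past behaviour and becomes a predictor for which product segment to promote next:

    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical history: [pages_viewed, past_purchases] -> segment clicked next
    X = [[5, 0], [12, 3], [2, 0], [15, 5], [3, 1], [11, 2]]
    y = ["budget", "premium", "budget", "premium", "budget", "premium"]

    model = DecisionTreeClassifier().fit(X, y)   # learn regularities in the data

    # Presented with fresh data, the trained classifier makes a prediction
    print(model.predict([[10, 4]]))   # e.g. ['premium']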
 Persistent Memory (PMEM): Businesses expect higher performance when apps based on machine
learning or artificial intelligence (AI) operate in real time. Keeping more data close to the processor
and in a persistent state gives crucial benefits such as higher throughput and lower latency.
PMEM, which was created in collaboration between Intel and Micron, allows Intel-based servers to
extend their memory footprint up to 4 TB in capacity. This delivers a significant speed improvement
for in-memory databases or locally saved information. PMEM storage medium outperforms NVMe
devices in terms of speed.
 Natural Language Processing: The interaction between human language and technology is
known as natural language processing (NLP). Voice-activated technologies such as Alexa, Siri, and
Google Home are well-known instances of natural language processing. NLP acts as a front-end
to an artificial intelligence backend, converting speech queries into actionable results. As NLP
advances, so does the volume of voice-search data. Businesses must get on board with voice searches,
which were estimated to account for 50% of all searches by 2020. NLP assists in the processing and
collection of voice-based data, and it may also function internally for individuals who require data
access via voice requests.
 Augmented analytics: Augmented analytics augments how individuals explore and analyse data in
analytics and BI systems by utilising enabling technologies like machine learning and AI to aid with
data preparation, insight production and insight explanation. Because the analysis is automated
and can be configured to run continually, the heavy lifting of manually sorting through enormous
volumes of complicated data (due to a lack of skills or time restrictions) is considerably minimised.
Augmented data preparation combines data from numerous sources considerably more quickly.
 Embedded analytics: Those who understand the realm of analytics are also aware of how time-
consuming and ineffective data translation can be. Embedded analytics is a digital workplace
feature that allows data analysis to take place within a user’s natural workflow rather than
switching to another application. Furthermore, embedded analytics is typically used to optimise
certain marketing campaigns, sales lead conversions, inventory demand planning and financial


budgeting. Embedded analytics may help an organisation to decrease the amount of time analytics
professionals spend transforming data into readable insights for everyone. That means they can
do what they do best: evaluate data to generate answers and plans. It even enables employees to
engage with data and create simple visuals for better decision-making.

1.7 MOTIVATION AND NEED FOR ADVANCED DATA MANAGEMENT TECHNIQUES


Advanced data management has long been at the heart of effective database and information systems.
Recent trends such as big data and cloud computing have increased the demand for sophisticated and
adaptable data storage and processing solutions.
The idea of advanced data management has emerged in recent decades, with a concentration on
data structures and query languages. It covers a wide range of data models and the fundamentals of
organising, processing, storing, and querying data using these models.
NoSQL and NewSQL databases are a new breed of databases that deserve particular attention. Advanced
database management systems also enable emerging data management trends driven by application
requirements, such as advanced analytics, stream processing systems and main-memory data
processing.
The majority of cutting-edge database research is aimed at:
 Unstructured and semi-structured data
 Online analytical processing (OLAP)
 Data and computational resources that are distributed
 Decentralised control over the spread of resources
 Real-time dynamic response to external events
 Incorporation into the physical reality in which the organisation exists

These capabilities are considered advanced in relation to traditional DBMSs. Many advanced database
management systems are available as cloud services, although others remain available as software
solutions, the most prominent of which is MongoDB. Navicat for MongoDB is a useful tool if you want to
learn more about it. It supports database objects including Collections, Views, Functions, Indexes,
GridFS, and MapReduce. An object designer is also available for creating, modifying and designing
database objects.

1.8 CONCLUSION

 Data management is the process of absorbing, storing, organising and preserving an organisation’s
data.
 The data management process consists of a number of tasks that work together to ensure that the
data in business systems is correct, available and accessible.
 When dealing with datasets, the type of data plays an essential role in determining which
preprocessing approach would work best for a certain set to obtain the best results.


 Structured data is arranged into a prepared repository, usually a database, so that its pieces may be
addressed for more efficient processing and analysis
 Semi-structured data is arranged into a specific repository, such as a database, but includes
associated information, such as metadata, that makes it easier to process than raw data.
 Unstructured data is not arranged into a predefined format, which makes it harder to access and process.
 A data model is a representation of the logical interrelationships and data flow between various
data items in the information domain.
 Data models include vital information for organisations because they define the relationships
between database tables, foreign keys, and the events involved.
 Online Analytical Processing (OLAP) refers to a type of software that allows users to examine data
from various database systems at the same time.
 Dimensional Modelling (DM) is a data structure approach designed specifically for data storage in
a data warehouse.

1.9 GLOSSARY

 Database: A set of data that is organised to allow users to find and retrieve it quickly and easily
 DBMS: A computerised database management system
 OLAP: A powerful analysis tool used for forecasting, statistical computations and aggregations
 Data management: It is the process of absorbing, storing, organising and preserving an
organisation’s data.
 Structured data: It is arranged into a prepared repository, usually a database, so that its pieces may
be addressed for more efficient processing and analysis
 Data model: A representation of the logical interrelationships and data flow between various data
items in the information domain.
 Dimensional Modelling (DM): It is a data structure approach designed specifically for data storage
in a data warehouse.

1.10 SELF-ASSESSMENT QUESTIONS

A. Essay Type Questions


1. Explain quantitative data in detail.
2. Distinguish between structured, semi-structured and unstructured data.
3. Discuss the importance of OLAP.
4. What do you mean by artificial intelligence?
5. Discuss embedded analytics.


1.11 ANSWERS AND HINTS FOR SELF-ASSESSMENT QUESTIONS

B. Hints for Essay Type Questions


1. Quantitative data attempts to quantify things by taking into account numerical values that make it
countable in nature. Refer to Section Types of Data
2. Structured data is arranged into a prepared repository; semi-structured data is arranged into a
specific repository, such as a database, and includes associated information such as metadata; and
unstructured data lacks a predefined format or data model. Refer to Section Types of Data
3. Online Analytical Processing (OLAP) refers to a type of software that allows users to examine data
from various database systems at the same time. Refer to Section Introduction to OLTP
4. Artificial intelligence is sometimes referred to as limited or weak AI since it is meant to accomplish
a small or particular purpose. Refer to Section New Trends in Computing
5. Embedded analytics is a digital workplace feature that allows data analysis to take place within a
user’s natural workflow rather than switching to another application. Refer to Section New Trends
in Computing

1.12 POST-UNIT READING MATERIAL

 https://pdfs.semanticscholar.org/d127/c1bf1fb31fe33f054f3bd2dc4f44c0615987.pdf

1.13 TOPICS FOR DISCUSSION FORUMS

 Using various sources, find applications of OLAP in the real world and discuss them with your
classmates.
