IEEE Journal Paper DM

1) Data fabric and master data management (MDM) aim to provide shared and consistent definitions of key data across an organization. Data fabric uses techniques like metadata, machine learning, and knowledge graphs to integrate and manage data from various sources. 2) Data fabric provides many benefits including faster data sharing, one-point access to all data, integration of new data sources without disrupting existing systems, and improved data quality. 3) For successful digital transformation, enterprises are increasingly adopting data fabric approaches to link dispersed data sources and provide unified access to data in various formats for analytics and other uses.


DATA MANAGEMENT COURSEWORK, DECEMBER 2022 1

DATA FABRIC for MASTER DATA MANAGEMENT (MDM)
Asifa Junaid Ahmad (4132649), Data Management, London South Bank University

Abstract—In today's world, data has become one of the most valuable assets of a business. Although the value of data is clear, managing it remains a challenge. Master Data Management (MDM) is a broad process whose objective is to provide shared, common data definitions and to reduce data inconsistency. It is a business-oriented program whose purpose is to make sure that a company's master data is correct and precise. Enterprises are also realizing the advantages of data fabric, and in turn there is a considerable rise in the need for data architectures that utilize metadata, AI, machine learning and knowledge graphs to integrate and manage data. Both Gartner and Forrester believe that, to attain swift, scalable integration of data, the value gained from a data fabric architecture is worth investigating. The purpose of this study is to perform a critical, feature-based evaluation of the state of the art in terms of the master data management capabilities of data fabric to support Data Integration, Knowledge Graphs and RDF implementations. This study not only discusses the architecture of these state-of-the-art technologies but also highlights the challenges that data analysts and organizations currently face when implementing data fabric for master data management.

Index Terms—AI, Data Fabric, Data Integration, Knowledge Graph, Machine Learning, Master Data Management, MDM, RDF Implementation

I. INTRODUCTION

For modern businesses, data is both the biggest asset and the biggest problem. It may sound surprising, but data itself does not play the central role in a successful data management journey; it is the employees of the company who need to connect with one another and stick to short- and long-term business objectives. Master data management is a requirement for companies that want to improve the consistency and quality of data about their main assets, such as products, customers and locations. According to Gartner, master data management (MDM) is a technology-enabled discipline in which business and Information Technology collaborate to ensure the consistency, accuracy and accountability of the enterprise's official shared master data assets [1]. Data fabric helps companies link dispersed data sources and provide unified data in various formats for uses such as BI, ML and advanced analytics. It can be implemented in large organizations to link multiple data types, endpoints and sources because it has built-in functionality for retrieving data for users [2]. Semantic knowledge graphs, metadata management and machine learning (ML) techniques are used to combine data of various types from various endpoints. This helps data management professionals group similar datasets together and integrate new data sources into the business ecosystem. Through this strategy, different aspects of data capacity management are automated, which leads to higher efficiency and eliminates silos throughout the data system. Moreover, it unifies data governance practices and increases the quality of the data as a whole [3]. Data fabric has numerous advantages [4]:
• Data fabric fast-tracks the circulation of data between the various departments of a company at the strategic level, which facilitates innovation.
• It offers single-point access to, and collection of, all data regardless of its location or storage, which removes information silos.
• Data fabric can connect to any data source using connectors and prepackaged components, so coding is not required.
• Data fabric supports both real-time big data and batch processing.
• It allows enterprises to add new data sources using modern technologies without interrupting existing data connections or setups, which future-proofs the data management infrastructure.
• It offers a full-circle view of all collaboration data, providing a better customer experience.
• Data fabric reduces reliance on legacy systems and solutions.
• Data fabric supports large volumes of data, applications and sources.
• It works smoothly with current infrastructure and helps organizations include more and more automation in their overall data administration strategy.
• It seamlessly facilitates information exchange with stakeholders inside and outside the company through APIs.
Because of these benefits, the data fabric market is expected to grow at a rate of 22.3% over the next seven years and to reach up to $4,500 million [1]. A top priority for companies today is that access to business data is provided to the users who need it, unhindered by time, location or the variety of software involved, and data fabric fulfils this need. If a company wants to gain a competitive advantage by better fulfilling its customers' requirements, then a secure, efficient and single method for linking, managing, finding and converting data in combination with other sources is crucial. Lastly, businesses are employing data fabric to upgrade their current systems and harness the power of data in the cloud. Companies should deploy data fabric because they need to deliver data faster and react rapidly to the requirements of the business and its customers, and data fabric offers all of these capabilities.

A. Data Fabric and MDM
The purpose of MDM and data fabric is the same: all master data must be accessible across the organization and in the business ecosystem. This is crucial for the successful digital transformation of an enterprise. Recently there has been considerable development in the concept of data management, with some platforms incorporating information other than master data. As Fig. 1 shows, there is a clear overlap between data fabric and master data management.

Fig. 1. Commonality between MDM and Data Fabric (Image source: mdmlist.com)

B. IT Companies Innovating Data Fabric
Some of the IT companies leading innovation in the creation of data fabric are

II. STATE OF THE ART DESCRIPTION

A. Master Data Management
A master data management system connects all vital business data to one point of reference and provides a master record of all data. All of the organization's data is linked through a common architecture, a full-cycle strategy for ensuring that data quality is preserved, which can be summarised as:
• Gathering data from various sources
• Applying business policies to preserve data quality
• Validating the data
• Transferring data across multiple workflows and verifying it during migration
It is often a legal, IT or business requirement to store data in different places, so a data integration platform is necessary to develop a unified view. The data integration layer in MDM is an architectural design that works at the compute layer, so that data can be connected whether it is in the cloud or on company premises. On top of this layer sits the enterprise knowledge graph (EKG). Using semantic graphs, the EKG maps not only entities but also the relationships between them.

Fig. 2. Repository Architecture of Master Data Management (Image source: techfunnel.com)

B. Data Fabric
A company's data comprises complete information about its products, assets, customers and employees. This data exists in various formats and is managed by multiple software applications, which may reside in different parts of the world. Access to this data can be difficult for employees and, in terms of redundancy, risky. Moreover, there may be no employee or team with a complete picture of the company's data, and such access may in fact be very hard to achieve. Data fabric is a solution to these situations because it can make company data available to everyone who needs it, in the format they require and with appropriate access, regardless of where the data is located. To deliver business value, a data fabric architecture must ensure that the following four key pillars of a comprehensive data fabric are met.
1- Gather and Analyze All Types of Data
Contextual information forms the basis of a dynamic data fabric design. To enable the data fabric to identify, link and analyze all types of metadata, for instance technical, social, economic and operational, there should be a mechanism such as an integrated pool of metadata.
2- Transform Passive Metadata into Active Metadata
It is crucial for organizations to activate metadata for seamless distribution of data. To do this, a data fabric should:
• Represent metadata graphically, in a way that is easy to understand and based on relationships that are unique and relevant to the business.
• Employ key metadata metrics to enable AI/ML algorithms, which learn with the passage of
time and produce up-to-date predictions related to data integration and management.
3- Formulate and Curate Knowledge Graphs
Knowledge graphs help data and analytics leaders gain business value by enriching data with semantics. The semantic layer of the knowledge graph makes it intuitive and easy to understand, which makes analysis easier for data and analytics leaders. It adds depth and meaning to the usage of data and to the graph's content, which allows AI/ML algorithms to use the information for analytics and numerous operational use cases. Data integration experts and data engineers frequently employ integration standards and tools to ensure convenient access and delivery to and from the knowledge graph. Data and analytics leaders should leverage these capabilities, which are vital for seamless adoption of data fabric without interruption.
4- A Strong Data Integration Pillar
It is vital for a data fabric to be compatible with different styles of data delivery, including ETL, messaging, streaming, data virtualization, replication and data microservices. It should support all kinds of data users, from IT users with complex integration needs to business end users who require self-service data preparation.

Fig. 3. Architecture of Data Fabric (Image source: gartner.com)

C. Data Integration
Data fabric relies on automated, AI-focused integration that improves with the passage of time. An efficient fabric automates various integration styles, standardizes data management throughout the organization, decreases the cost of data storage and improves performance. The key features of the architecture are given below:
• Makes hard-to-find data in various clouds and across environments easy to access
• Removes data silos
• Removes numerous manual tools
• Keeps data management procedures up to date as new sources are included
To achieve unified, real-time integration of data, a data fabric requires numerous components, which are described below:
1. Ingestion of Data
This component must work with all possible data formats, both structured and unstructured, from different endpoints [7]. These sources include data streams, various applications, multiple cloud sources and several databases. Data ingestion should also support stream, batch and real-time processing.
2. Processing of the Data
Data processing provides a set of tools that help format and transform data so that it is analytically ready for downstream Business Intelligence tools [8].
3. Management and Intelligence of Data
Management and intelligence not only secures the data but also implements data governance, which is enforced through a mechanism that determines who can access what data. Global capabilities such as metadata administration, search and lineage control are also applied here.
4. Orchestration of Data
This component of the data fabric coordinates the tasks at all stages in the complete life cycle of the data workflow. It is here that one defines when and how frequently pipelines should run, and how the data produced by those pipelines is managed.
5. Discovery of Data
Data modeling, preparation and curation are performed in this layer. It helps data analysts locate and consume data across two silos as if both were part of the same dataset, and obtain valuable insights from the datasets. This is arguably the most vital aspect of the data fabric, and its main focus is to solve the silo problem.
6. Access of Data
Data access means that data is provided to data analysts either directly or through means such as queries, dashboards, APIs and data services. Semantics, intelligence and rules about the data retrieval mechanism, which deliver the data in the requested format and form, are built into this layer.

D. Knowledge Graph
Data fabric has a distinctive relationship with knowledge graphs because they considerably simplify the procedures for obtaining data from the numerous places that populate these platforms. Consequently, knowledge graphs deliver some of the core functionality that makes it possible for data fabric to achieve this goal [8]. In turn, it is important to note that data fabrics are considered among the most developed means of data harmonization and integration. Powered by knowledge graph technology, a data fabric creates optimal solutions for aligning all types of data for any single business purpose.

E. RDF Implementation
A data fabric built on knowledge graphs, especially RDF, is the best option. Knowledge graph technology fulfils
the needs of a data fabric and is in fact the most suitable technology to support one. Supporting a data fabric places strict specifications on the technologies involved. These requirements tightly constrain the technical approach and motivate any solution towards RDF and RDFS. To implement RDF for a data fabric, the following requirements must be met: unification and reusability of data.

III. OPEN RESEARCH ISSUES
According to Gartner's prediction, by 2024 data fabric deployments will be four times more efficient than at present and human-driven data tasks will be reduced by half. Nevertheless, data fabric is a complicated architecture, and numerous issues need to be resolved in this domain to improve its efficiency. Some of the challenges are listed below:
1- Deployment and Configuration Services
One challenge in deploying data fabric is the deployment and configuration of services. Services must be deployed across several servers to optimize performance. Additionally, they must be configured in a particular way to ensure that they function together correctly [9].
2- Dependency Management Between Services
Another challenge in data fabric deployment is managing dependencies between services, because services rely on other services to work properly. If one service fails, all dependent services fail as well. Integrating data fabric into an organization's current infrastructure is therefore very challenging [10].
3- Data Model Creation and Data Management Navigation
Another challenge is designing and creating the data model, working out how to store and manage the data, and creating a scalable architecture. Because data comes in various formats, creating a system that can manage all types of data is difficult. Moreover, the system must be scalable to accommodate new data, so developing an architecture that supports data fabric is challenging. Handling high traffic and ensuring system availability are also current challenges [11].
4- Integration with External Systems
Integration with external systems is another challenge faced when implementing data fabric. There are a number of ways to integrate, but the choice depends on the type of system involved. Differences in architectures, protocols, interpretations of protocols and data formats between systems make data fabric challenging to implement.
5- Data Fabric Monitoring and Troubleshooting
Identifying issues in the data fabric or its nodes, which sometimes result in low performance or failure of the fabric, requires proper automated monitoring. However, effective monitoring requires complete knowledge of how the data fabric works and identification of what should be monitored [12].
6- Data Security Threat
A growing concern for companies is the threat to the security of their data, especially while it is being transported from one point to another in the data fabric. It is crucial that the infrastructure for moving data has security firewalls in place and active security protocols to keep data safe from security breaches. With the rising number of cyber attacks on companies and organizations, data security at all levels and points is essential [13].
7- Data Alignment to Standards
In its most basic form, a knowledge graph can be defined as "a semantic graph that integrates information into an ontology". The foundation of any knowledge graph is its ontologies, which give scientific terms in text their clear meanings and capture the connections between them. Semantic enrichment curates unstructured scientific text with ontologies; through this contextualisation the text describes "things, not strings" and can be understood and used by computers. Thus, for instance, a computer can understand that the expression 'NIDDM' is not just a random string of letters but denotes an indication. CENtree, an ontology management platform, offers a central source for ontology management and allows users to extend VOCabs, manage in-house vocabularies such as compound IDs and study codes, or create new ontologies for fields not yet covered by a VOCab [14]. When creating and extending vocabularies, CENtree also uses machine-learning techniques to suggest new ontology terms. However, the number of existing VOCabs is still limited and needs to be increased for various domains.
8- Harmonisation of Data
The ability to build semantic knowledge graphs essentially relies on the capacity to harmonise, or integrate, data from numerous sources. However, a common problem in public and internal scientific sources is that different authors use different labels for the same thing. Consequently, a search for the Type II Diabetes-related gene ABCC8, for instance, would miss references to synonyms such as 'SUR1', 'MRP8' and 'ATP-binding cassette, sub-family C, member 8'. Ontologies, however, provide much more than data harmonisation. For instance, the notion that Type II Diabetes Mellitus is an endocrine disorder is already captured within the ontology used to enrich the source material, because one of the purposes of an ontology is to provide a shared model of the knowledge associated with a certain domain. Once a disease entity has been harmonised to a specific ID, for instance a MeSH ID, it becomes feasible to map it to representations of the disease in other ontologies, such as the Experimental Factor Ontology, Online Mendelian Inheritance in Man or the Systematized Nomenclature of Medicine [14]. This allows knowledge discovered in the literature to be enriched with knowledge from other structured data sources, for instance to find drugs used to treat that indication from ChEMBL or genes associated with the indication of interest from OpenTargets. In essence, these associations provide a 'springboard' for further exploration throughout the graph.
9- Relation Extraction from Data
Once the data has been harmonised, the next task is to extract relationships from the literature. Rather than simply noting that two things are mentioned in the same text, this phase seeks to determine when a specific relationship actually exists between them. Semantic patterns, or collections of patterns ("bundles"), can be built to recognise these relationships. These patterns represent the association between two concepts, for example a gene and a drug, in the form Gene-Verb-Drug. TExpress can then be used to extract them from the text as semantic triples aligned to ontologies. However, some relationships are murkier than others. A good illustration is the question of whether the co-occurrence of a drug and a symptom denotes a treatment or a causal link in an adverse event: drugs can both treat and cause headaches, so the context must be considered. ML algorithms can be trained on the curated output from TExpress to help detect relationships in particular scenarios, and SciBite AI may be used to streamline the process of supplying the trained models to clients for relation extraction [14]. In the end, this produces a collection of different attributes that characterise a relationship, which can then be added to a knowledge graph to enrich it.
10- Schema Generation
The last factor to take into account is schema generation, which entails building a high-level meta graph containing all relevant entities and the relationships between them. Using an initial "bridging ontology", CENtree can be used to build a simple representation that can later be enriched with additional ontologies, such as a disease entity populated with data from the EFO disease classification [14]. Once the schema has been created, CENtree gives the option to export it to the graph database of the user's choice, in whatever form works best for the specific application. The schema can be exported to an RDF triplestore, for instance, if the goal is to create an enterprise graph that stores large volumes of normalised data from across the enterprise and can be queried by other systems. Further investigation is needed to determine which export format is best, for example so that the schema can be loaded into a more user-friendly labelled property graph when the graph is intended to support exploratory analytics such as target validation or drug repositioning.

IV. CONCLUSION
The main objective of data fabric is data integration; MDM also focuses on data integration but places more emphasis on data quality. Data fabric works with all data, while MDM targets only master data. However, MDM covers more of the organization's scope than data fabric does. In conclusion, data fabric with MDM capabilities is the only viable mechanism for organizations to attain high business value from their data and information, which in turn leads to stronger competitive advantage and higher profitability. An efficient data fabric automates numerous integration types, makes data management scalable across the organization, and not only decreases the cost of storage but also improves performance. Data fabric has a unique and symbiotic relationship with knowledge graphs (KGs), as they considerably simplify the process of obtaining data from the numerous sources that populate these platforms. Consequently, KGs deliver some of the key features that enable data fabric to achieve this aim. Although there are several challenges in implementing data fabric, with the right implementation it can prove to be a powerful tool for organizations of all sizes. Highly successful companies around the world are implementing data fabric and migrating their existing systems to build on it.

REFERENCES
[1] Gartner, "Master Data Management (MDM)." Available: https://www.gartner.com/en/information-technology/glossary/master-data-management-mdm
[2] Kevin Burnely, "Data Fabric – Reimagining Data Management with Modern Capabilities." Available: https://itchronicles.com/big-data/data-fabric-reimagining-data-management-with-modern-capabilities/, Jun. 2022.
[3] Converge Technology Solutions, "What is a Data Fabric and Why Do I Need It?" Available: https://convergetp.com/2022/06/16/what-is-a-data-fabric-and-why-do-i-need-it/, Jun. 2022.
[4] Analytics Solutions, "Benefits Of Using Data Fabric." Available: https://www.expressanalytics.com/blog/data-fabric-benefits/, Jul. 2022.
[5] Michael Castelluccio, "Data Fabric Architecture." Available: https://sfmagazine.com/post-entry/october-2021-data-fabric-architecture/, Oct. 2021.
[6] Ghiran A.M., Buchmann R.A., "The Model-Driven Enterprise Data Fabric: A Proposal Based on Conceptual Modelling and Knowledge Graphs." In: Douligeris C., Karagiannis D., Apostolou D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science, vol 11775. Springer, Cham. https://doi.org/10.1007/978-3-030-29551-6_51, 2019.
[7] K2view, "What is a Data Fabric? The Complete Guide." Available: https://www.k2view.com/hubfs/Data%20Fabric%20Brochure.pdf, Oct. 2021.
[8] The Forrester Wave, "Enterprise Data Fabric." Available: https://info.cambridgesemantics.com/
[9] Favio Vazquez, "The Data Fabric for Machine Learning." Available: https://www.kdnuggets.com/2019/05/data-fabric-machinelearning-part-1.html
[10] Upside Staff, "Data Digest: Data Fabric and Digital Transformation." Available: https://tdwi.org/articles/2021/09/07/arch-all-data-fabrics-0907.aspx
[11] Timothy King, "Data Integration vs. Data Management; What's the Difference?" Available: https://solutionsreview.com/data-integration/data-integration-vs-data-management-whats-the-difference/
[12] Javier Garcia Rubio, "Data Fabric: The Fabric That Holds It All Together." Available: https://www.datavirtualizationblog.com/data-fabric-fabric-holds-all-together/, Jul. 2022.
[13] AltexSoft, "What is Data Fabric: Architecture, Principles, Advantages, and Ways to Implement." Available: https://www.altexsoft.com/blog/data-fabric/, Aug. 2022.
[14] Joseph Mullen, "Addressing Common Challenges with Knowledge Graphs." Available: https://www.scibite.com/news/addressing-common-challenges-with-knowledge-graphs/, Mar. 2021.

V. AUTHOR

My name is Asifa Junaid Ahmad. I am from Pakistan and am currently an MSc Data Science student at London South Bank University. I have a strong interest in technical writing and am an enthusiastic learner of data science.
