IEEE Journal Paper DM
Abstract—In today's world, data has become one of the most valuable assets of a business. Although the value of data is clear, managing it remains a challenge. Master Data Management (MDM) is a broad process whose objective is to provide shared, common data definitions and to reduce data inconsistency. It is a business-oriented program whose purpose is to ensure that a company's master data is correct and precise. Enterprises are also realizing the advantages of data fabric, and in turn there is a considerable rise in the need for data architectures that use metadata, AI, machine learning and knowledge graphs to integrate and manage data. Both Gartner and Forrester believe that, to attain swift and scalable data integration, the value gained from a data fabric architecture is worth investigating. The purpose of this study is to critically perform a feature-based evaluation of the state of the art in terms of the master data management capabilities of data fabric to support data integration, knowledge graphs and RDF implementations. The study not only discusses the architecture of these state-of-the-art technologies but also highlights the challenges that data analysts and organizations currently face when implementing data fabric in terms of master data management.

Index Terms—AI, Data Fabric, Data Integration, Knowledge Graph, Machine Learning, Master Data Management, MDM, RDF Implementation

I. INTRODUCTION

FOR modern businesses, data is both the biggest asset and the biggest problem. It might sound crazy, but data does not play the central role in a successful data management journey. It is the company's employees who need to be connected and to stick to short- and long-term business objectives. Master data management is a requirement for companies that want to improve the consistency and quality of their main assets, such as product, asset, customer and location data. According to Gartner, master data management (MDM) is a discipline in which the business and Information Technology collaborate to ensure the consistency, accuracy and accountability of the enterprise's official shared master data assets [1]. Data fabric helps companies link distributed data sources and provide unified data in forms suited to BI, ML and advanced analytics. Data fabric can be implemented in large organizations to link multiple data types, endpoints and sources because it has built-in functionality for retrieving data for users [2]. With the help of semantic knowledge graphs, metadata management and machine learning (ML) techniques, it combines data from various data types and endpoints. This helps data management professionals group similar datasets together and integrate new data sources into the business's ecosystem. Through this strategy, different aspects of data capacity management are automated, which leads to higher efficiency and the elimination of silos throughout the data system. Moreover, it unifies data governance practices and increases the quality of the data as a whole [3]. Data fabric has numerous advantages [4]:

• Data fabric fast-tracks the circulation of data between the various departments of the company at a strategic level, which facilitates innovation.
• It offers a single point of access to, and collection of, all data regardless of its location or storage, which removes information silos.
• Data fabric can connect to any data source using connectors and prepackaged components, so coding is not required.
• Data fabric supports real-time big data as well as batch processing.
• It enables enterprises to add new data sources using modern technologies without interrupting existing data connections or setups, thereby future-proofing the data management infrastructure.
• It offers a full-circle view of all collaboration data, providing a better customer experience.
• Reliance on legacy systems and solutions is reduced.
• Data fabric handles large volumes of data, applications and sources.
• It works smoothly with the current infrastructure and helps organizations increasingly include automation in their overall data administration strategy.
• It seamlessly facilitates information exchange with stakeholders inside and outside the company through APIs.

Because of all these benefits, the data fabric market is expected to grow at a rate of 22.3% over the next seven years and to reach up to $4,500 million [1]. Today it is a top priority for companies that access to their business data is provided to the users who need it, without obstacles of time, location or differing types of software, and data fabric fulfils this need. If a company wants to gain competitive advantage by better fulfilling its customers' requirements, then a secure, efficient and single method for linking, managing, finding and converting data in combination with other sources is crucial. Lastly, businesses are employing data fabric to upgrade their current systems and harness the power of data in the cloud. Companies should deploy data fabric because they need to deliver data faster and react rapidly to business and customer requirements, and data fabric offers all these capabilities.
DATA MANAGEMENT COURSEWORK, DECEMBER 2022 2
time and mass-produce latest forecasts related to data integration and management.

3- Formulate and Curate Knowledge Graphs
Knowledge graphs help data and analytics leaders gain business value by enriching data with semantics. The semantic layer of the knowledge graph makes the data intuitive and easy to understand, which makes analysis easier for these leaders. It adds depth and meaning to the usage of the data and the graph's content, which allows AI/ML algorithms to use the information for analysis and numerous operational use cases. Data integration experts and data engineers frequently employ integration standards and tools to ensure convenient access and delivery to and from the knowledge graph. It is vital that data and analytics leaders leverage these for seamless adoption of the data fabric without interruption.

4- A Strong Data Integration Pillar
It is vital for data fabric to be compatible with different styles of data delivery, including ETL, messaging, streaming, data virtualization, replication and data microservices. It should support all kinds of data users: IT users with complex integration needs as well as business end users who require self-service data preparation.

• Up-to-date data management procedures as new sources are included

To achieve unified, real-time integration of data, data fabric requires numerous components, which are described below:

1. Ingestion of Data
Ingestion requires working with all possible data formats, structured as well as unstructured, from different endpoints [7]. These sources include data streams, various applications, multiple cloud sources and several databases. Data ingestion should also support stream processing, batch processing and real-time delivery.

2. Processing of the Data
Data processing provides a set of tools that format and transform the data so that it is analytically ready for downstream Business Intelligence tools [8].

3. Management and Intelligence of Data
Management and intelligence not only secures the data but also implements data governance. Governance is enforced through a mechanism that determines who can access what data. Global structures such as metadata administration, search and lineage control are also applied here.

4. Orchestration of Data
This component synchronizes the tasks at all levels across the complete life cycle of the data workflow. It defines when and how frequently pipelines should run, and how the data created by those pipelines is managed.

5. Discovery of Data
Data modeling, data preparation and data curation are performed in this layer. It helps the data analyst locate and consume data across two silos as if both were part of the same dataset, and obtain valuable insight from the datasets. This is arguably the most vital aspect of data fabric, since its main focus is to solve the silo problem.

6. Access of Data
Data access means that data is provided to the data analyst either directly or through queries, using mechanisms such as dashboards, APIs and data services. Semantics, intelligence and rules about the data retrieval mechanism, which deliver the data in the requested format and form, are built in this layer.
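The way these components hand data to one another can be illustrated with a small sketch. It is a toy model under stated assumptions, not a description of any data fabric product: the function names, the `ACCESS_POLICY` table, the role names and the sample `crm` and `erp` sources are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical governance policy: which role may read which source.
ACCESS_POLICY = {"analyst": {"crm"}, "steward": {"crm", "erp"}}


def ingest(sources):
    """Ingestion: flatten rows from every source into (source, row) pairs."""
    return [(name, dict(row)) for name, rows in sources.items() for row in rows]


def process(records):
    """Processing: normalise field names and trim string values."""
    cleaned = []
    for source, row in records:
        tidy = {
            key.strip().lower(): value.strip() if isinstance(value, str) else value
            for key, value in row.items()
        }
        cleaned.append((source, tidy))
    return cleaned


def govern(records, role):
    """Management: keep only rows from sources the role is allowed to read."""
    allowed = ACCESS_POLICY.get(role, set())
    return [(source, row) for source, row in records if source in allowed]


def discover(records, key):
    """Discovery: merge rows from different silos that share the same key value."""
    merged = defaultdict(dict)
    for _source, row in records:
        if key in row:
            merged[row[key]].update(row)
    return dict(merged)


def run_pipeline(sources, role, key):
    """Orchestration: run the stages in a fixed order and return the view served to the user."""
    return discover(govern(process(ingest(sources)), role), key)


# Two hypothetical silos that describe the same customer with inconsistent field names.
SOURCES = {
    "crm": [{"Customer_ID ": "C1", "Name": " Ada "}],
    "erp": [{"customer_id": "C1", "balance": 120}],
}

steward_view = run_pipeline(SOURCES, "steward", "customer_id")
analyst_view = run_pipeline(SOURCES, "analyst", "customer_id")
```

In this sketch the steward sees one merged record drawn from both silos, while the analyst sees only the fields that originate in the source the policy allows. This mirrors the roles of the governance and discovery layers described above.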
the needs for a data fabric, and it is actually most suitable to support a data fabric. Strict specifications are laid out for the technologies that support a data fabric. These provisions strictly limit the technical methodology, motivating a solution based on RDF and RDFS. To implement RDF for data fabric, the requirements of unification and reusability of data must be met.

III. OPEN RESEARCH ISSUES

According to Gartner's prediction, by 2024 deployments of data fabric will quadruple efficiency compared with current approaches, and human-driven data tasks will be reduced by half. Nevertheless, data fabric is a complicated architecture, and numerous issues need to be resolved in this domain to improve its efficiency. Some of the challenges are listed below:

1- Deployment and Configuration Services
One challenge in deploying data fabric is service deployment and configuration. Services must be deployed across several servers to optimize performance. Additionally, services must be configured in a particular way to ensure they function together correctly [9].

2- Dependency Management Between Services
Another challenge in data fabric deployment is managing dependencies between services, because services rely on other services to work properly. If one service fails, all dependent services fail as well. Integrating data fabric into an organization's existing infrastructure is therefore very challenging [10].

3- Data Model Creation and Data Management Navigation
Another challenge is designing and creating the data model, working out how to store and manage the data, and creating a scalable architecture. Since data comes in various formats, creating a system that can manage all types of data is challenging. Moreover, the system must be scalable to accommodate new data, so developing an architecture that supports data fabric is difficult. Handling high traffic and maintaining system availability are also current challenges [11].

4- Integration with External Systems
Integration with external systems is another challenge faced when implementing data fabric. There are a number of ways to integrate, but the choice depends on the type of the particular system involved. Different architectures, protocols, interpretations of protocols and data formats across systems make it challenging to implement data fabric.

5- Data Fabric Monitoring and Troubleshooting
Identifying issues in the data fabric or its nodes, which sometimes result in low performance or failure of the fabric, requires proper computer-based monitoring. However, effective monitoring requires complete knowledge of how the data fabric works and identification of what should be monitored [12].

6- Data Security Threat
An increasing concern for companies is the threat to the security of their data, especially while it is transported from one point to another in the data fabric. It is crucial that the infrastructure for moving data has security firewalls in place and active security protocols to keep the data safe from security breaches. With the rising number of cyber attacks on companies and organizations, data security at all levels and points is essential [13].

7- Data Alignment to Standards
In its most basic form, a knowledge graph can be defined as "a semantic graph that integrates information into an ontology". Ontologies are the base of any knowledge graph: they give the scientific terms in a text clear meanings and capture the connections between them. Semantic enrichment curates unstructured scientific text with ontologies, contextualising it so that it describes "things, not strings" and can be understood and used by computers. For instance, a computer can understand that the expression 'NIDDM' is not just a random string of letters but denotes an indication. CENtree, an ontology management platform, offers a central source for ontology management and allows users to extend VOCabs, manage in-house vocabularies such as compound IDs and study codes, or create new ontologies for fields not yet covered by a VOCab [14]. When creating and extending vocabularies, CENtree also uses machine-learning techniques to suggest new ontology options. However, the number of existing VOCabs is still limited and needs to be increased for various domains.

8- Harmonisation of Data
The ability to build semantic knowledge graphs essentially relies on the capacity to harmonise, or integrate, data from numerous sources. A common problem in public and internal scientific sources is that different authors use different labels for the same thing. Consequently, searching for the Type II Diabetes-related gene ABCC8, for instance, would miss references to synonyms such as 'SUR1', 'MRP8' and 'ATP-binding cassette, sub-family C, member 8'. Ontologies, however, provide much more than data harmonisation. For instance, the notion that Type II Diabetes Mellitus is an endocrine disorder is already captured within the ontology used to enrich the source material, because one of the purposes of an ontology is to provide a shared model of the knowledge associated with a given domain. Once a disease entity has been aligned to a specific ID, for instance a MeSH ID, it becomes feasible to map it to representations of the disease in other ontologies such as the Experimental Factor Ontology (EFO), Online Mendelian Inheritance in Man (OMIM) or the Systematized Nomenclature of Medicine (SNOMED) [14]. This allows knowledge found in the literature to be enriched with further knowledge from other structured data sources, for instance to identify drugs used to treat that indication from ChEMBL, or to find genes associated with the indication of interest from OpenTargets. In essence, these links provide a 'springboard' for further exploration across the graph.
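The harmonisation step can be made concrete with a short sketch that maps the different labels mentioned in the text to one canonical entity. It is a minimal illustration only: the `ENTITIES` table and the identifiers `GENE:ABCC8` and `INDICATION:T2DM` are placeholders invented for this example, not real MeSH or ontology IDs, and the matching is plain string lookup rather than the method of any particular tool.

```python
import re

# Hypothetical synonym table. Labels are taken from the text; IDs are placeholders.
ENTITIES = {
    "GENE:ABCC8": {
        "type": "gene",
        "labels": ["ABCC8", "SUR1", "MRP8",
                   "ATP-binding cassette, sub-family C, member 8"],
    },
    "INDICATION:T2DM": {
        "type": "indication",
        "labels": ["NIDDM", "Type II Diabetes Mellitus"],
    },
}


def build_index(entities):
    """Map every lower-cased label to its canonical entity ID."""
    return {
        label.lower(): entity_id
        for entity_id, entry in entities.items()
        for label in entry["labels"]
    }


def annotate(text, entities):
    """Return sorted (entity_id, type) pairs for every known label found in the text."""
    index = build_index(entities)
    lowered = text.lower()
    found = set()
    for label, entity_id in index.items():
        # The look-arounds stop a label from matching inside a longer token.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add((entity_id, entities[entity_id]["type"]))
    return sorted(found)


doc_a = "SUR1 variants are associated with NIDDM."
doc_b = "ABCC8 mutations occur in Type II Diabetes Mellitus."

annotations_a = annotate(doc_a, ENTITIES)
annotations_b = annotate(doc_b, ENTITIES)
```

Although the two sentences share no labels, both resolve to the same gene and indication identifiers, so a search on either identifier would retrieve both documents. Once labels are resolved to identifiers in this way, they can be linked to other ontologies and structured sources as described above.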
9- Relation Extraction from Data
Once the data has been harmonised, the next task is to extract relationships from the literature. Rather than merely noting that two things are mentioned in the same text, this phase seeks to determine when a specific relationship actually holds between them. Semantic patterns, or collections of patterns ("bundles"), can be built to recognise these relationships. These patterns represent the association between two concepts, for example a gene and a drug, in the form Gene-Verb-Drug. They are then extracted from the text as semantic triples aligned to ontologies using TExpress. However, some relationships are murkier than others. A good illustration is the question of whether the co-occurrence of a drug and a symptom denotes a treatment or a causal link in an adverse event: drugs can both treat and cause headaches, so the context must be considered. SciBite AI may be used to streamline the delivery of trained models to clients for relationship extraction. ML algorithms can be trained with the curated output from TExpress to help detect relationships in particular scenarios [14]. In the end, this produces a collection of different attributes that characterise a relationship, which can then be added to a knowledge graph to enhance it.

10- Schema Generation
The last factor to take into account is schema generation, which entails building a high-level meta graph containing all relevant entities and their connections. Using an initial "bridging ontology", CENtree can be used to build a simple representation that can later be enriched with additional ontologies, such as a disease entity populated with data from the EFO disease classification [14]. Once the schema has been created, CENtree gives the option to export it, in the form that works best for a specific application, to the graph database of the user's choice. The schema can be exported to an RDF triplestore, for instance, if an enterprise graph is being created to store large, standardised data from across the organization that can be accessed by other systems. Further investigation is therefore needed to determine which format is best for exporting the schema, for example so that it can be ingested into a more user-friendly labelled property graph when the graph is intended to support exploratory analytics for target validation or drug repositioning.

IV. CONCLUSION

The main objective of data fabric is data integration. MDM also focuses on data integration but places more emphasis on the quality of the data. Data fabric works with all data, while MDM targets only master data. However, MDM covers more of the organization's scope than data fabric. In conclusion, data fabric with MDM capabilities is the only viable mechanism for organizations to attain high business value from their data and information, which in turn leads to stronger competitive advantage and higher profitability. An efficient data fabric automates numerous integration types, makes data management scalable across the organization, and not only decreases the cost of storage but also improves performance. Data fabric has a unique and symbiotic relationship with knowledge graphs (KGs), as it considerably simplifies the process of obtaining data from the numerous sources that feed these platforms. Consequently, KGs deliver some of the key features that enable data fabric to achieve this aim. Although there are several challenges in implementing data fabric, with the right implementation it can prove to be a powerful tool for organizations of all sizes. Highly successful companies around the world are implementing data fabric and moving their existing systems onto it.
REFERENCES
[1] Gartner, "Master Data Management (MDM)." Available: https://www.gartner.com/en/information-technology/glossary/master-data-management-mdm
[2] Kevin Burnely, "Data Fabric – Reimagining Data Management with Modern Capabilities," Jun. 2022. Available: https://itchronicles.com/big-data/data-fabric-reimagining-data-management-with-modern-capabilities/
[3] Converge Technology Solutions, "What is a Data Fabric and Why Do I Need It?" Jun. 2022. Available: https://convergetp.com/2022/06/16/what-is-a-data-fabric-and-why-do-i-need-it/
[4] Analytics Solutions, "Benefits Of Using Data Fabric," Jul. 2022. Available: https://www.expressanalytics.com/blog/data-fabric-benefits/
[5] Michael Castelluccio, "Data Fabric Architecture," Oct. 2021. Available: https://sfmagazine.com/post-entry/october-2021-data-fabric-architecture/
[6] A. M. Ghiran and R. A. Buchmann, "The Model-Driven Enterprise Data Fabric: A Proposal Based on Conceptual Modelling and Knowledge Graphs," in C. Douligeris, D. Karagiannis, D. Apostolou (eds), Knowledge Science, Engineering and Management, KSEM 2019, Lecture Notes in Computer Science, vol. 11775. Springer, Cham, 2019. Available: https://doi.org/10.1007/978-3-030-29551-6_51
[7] K2view, "What is a Data Fabric? The Complete Guide," Oct. 2021. Available: https://www.k2view.com/hubfs/Data%20Fabric%20Brochure.pdf
[8] The Forrester Wave, "Enterprise Data Fabric." Available: https://info.cambridgesemantics.com/
[9] Favio Vazquez, "The Data Fabric for Machine Learning." Available: https://www.kdnuggets.com/2019/05/data-fabric-machinelearning-part-1.html
[10] Upside Staff, "Data Digest: Data Fabric and Digital Transformation." Available: https://tdwi.org/articles/2021/09/07/arch-all-data-fabrics-0907.aspx
[11] Timothy King, "Data Integration vs. Data Management; What's the Difference?" Available: https://solutionsreview.com/data-integration/data-integration-vs-data-management-whats-the-difference/
[12] Javier Garcia Rubio, "Data Fabric: The Fabric That Holds It All Together," Jul. 2022. Available: https://www.datavirtualizationblog.com/data-fabric-fabric-holds-all-together/
[13] AltexSoft, "What is Data Fabric: Architecture, Principles, Advantages, and Ways to Implement," Aug. 2022. Available: https://www.altexsoft.com/blog/data-fabric/
[14] Joseph Mullen, "Addressing Common Challenges with Knowledge Graphs," Mar. 2021. Available: https://www.scibite.com/news/addressing-common-challenges-with-knowledge-graphs/
V. AUTHOR