Implementing Federated Governance in Data Mesh Architecture
Anton Dolhopolov, Arnaud Castelltort and Anne Laurent
Abstract: Analytical data platforms have been used for decades to improve organizational performance. They have evolved from data warehouses, used primarily for structured data processing, through data lakes oriented toward raw data storage and post-hoc analyses, to data lakehouses, which combine raw storage with business intelligence pre-processing to improve platform efficacy. In recent years, however, a new architecture called Data Mesh has emerged. The main promise of this architecture is to remove the barriers between operational and analytical teams in order to boost the overall value extraction from big data. A number of attempts have been made to formalize and implement it in existing projects. Although defined as a socio-technical paradigm, data mesh still lacks the technological support needed for widespread adoption. To overcome this limitation, we propose a new view of the platform requirements alongside a formal governance definition that we believe can help in the successful adoption of the data mesh. It is based on fundamental aspects such as decentralized data domains and federated computational governance. In addition, we present a blockchain-based implementation of a mesh platform as a practical validation of our theoretical proposal. Overall, this article demonstrates a novel research direction for information system decentralization technologies.
Keywords: decentralized systems; federated governance; metadata management; data mesh; blockchain
speed (velocity) of batch and real-time acquisition directly from the data sources to the
support of a variety of data types, including structured and unstructured information.
The synergy of these two platforms was introduced in [4]. Data lakehouses preserve
the benefits of both solutions. On one side, analytical data pre-computation enables fast
business queries (e.g., business intelligence cubes); on the other side, raw data storage
enables the discovery of valuable data insights in the future.
Even with the strong support of distributed technologies, all three kinds of platforms are designed and developed as centralized solutions from a logical point of view. The centralization comes from the fact that data value extraction is usually performed by analytical teams that are detached from the operational activities of the company. Data scientists and engineers are gathered under a single data division, tasked with performance evaluation, and serve as mediators between business people and developers.
Nonetheless, with the growing popularity of decentralized systems in recent years, a new data platform architecture called Data Mesh [5] has emerged. Building on long-standing work around domain-driven design [6] and data markets [7], it rests on four core principles:
• Distributed data domains;
• Data-as-a-product;
• Self-serve data infrastructure;
• Federated computational governance.
In contrast to the previous platform types, which centralize data ownership and value extraction under the command of a dedicated team, the mesh architecture aims to stop splitting holistic business domains across operational and analytical teams.
The core component of any big data platform is a metadata management (MDM) system. Such a system makes it possible to account for, link, comprehend, integrate, and control the existing data assets. MDM can be seen as an essential part of federated governance [8].
From a high-level perspective, we can split the last data mesh principle into two modules:
1. Definition, access, and collaboration on schemas, semantic knowledge, lineage, etc.;
2. Tools to automatically enforce security, transparency, legal policies, etc.
We see that the first part (governance) is well-aligned with metadata management,
while the second part (computability) represents the infrastructure platform functions.
However, the main challenge appears when building federated computational governance with existing products or research tools. Data mesh platforms either fall back on centralized systems, such as Apache Atlas [9,10] or cloud data catalogs [11,12], or remain purely conceptual architectures of a federated MDM system [13–15].
In this article, we demonstrate the further development of our previous metadata system described in [16]. First, we consider the problem domain in more detail through a running example of a virtual company. Then, we outline the MDM properties and challenges of mesh platform governance. To overcome the identified limitations, we show how operating-system-level virtualization (e.g., Kubernetes) and blockchain technologies fulfill those requirements. Finally, we present our platform prototype, which aims to implement federated computational governance. The article concludes with a system evaluation and future research directions.
2. Background
In this section, we recall the data mesh principles with more descriptive details and
provide a running example of a virtual company to illustrate their application. We also
walk through the existing related research and proceed with the requirements of data
mesh governance.
2.1.2. Data-as-a-Product
The domain-driven methodology uses the notion of software products [6]. In data mesh, this means that a culture of building data products within the domains is required. Most of the time, the data users are not only external customers but also other domain teams within the organization. Making a reusable data product requires designing, delivering, measuring metrics, and constantly improving the provided data [12]. As a consequence, it also creates a new zone of responsibility for data product owners: SLA guarantees, trustworthy data delivery, tools for data discovery, comprehension, consumption, etc.
The domain teams create data products, which are actual or derived data, like business intelligence reports with aggregated information on the regional supply chain state or predicted raw material delivery delays. By consuming those reports, analysts of the Sales department can improve the accuracy of their own forecasts, for example by adjusting the expected revenue of the company.
The shared infrastructure platform provides a convenient interface over the underlying computing resources. This helps to put in place the monitoring of newly released reports and to automate updates of product prices according to the selected policies. In general, the common platform ensures interoperability and helps to avoid over-provisioning of resources when working with different technological stacks.
The federated computational governance is an essential component for “maintaining a dynamic equilibrium between domain autonomy and global interoperability” [5]. Here, governance is a collection of policies, standards, objectives, metrics, and roles aimed at maximizing the value of data, while the computational property ensures their automatic enforcement. Recalling the metadata management system as a core part of the governance, we also provide a specialized Platform Asset Catalog that lists asset metadata such as ownership, purpose, history, dependencies, mode of use, etc. Since many different product sources and configurable computing tasks are produced by different teams, it is good practice to register them in such a catalog.
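To make the catalog entries concrete, the following is a minimal sketch of an asset record carrying the metadata fields listed above. The Go representation and the field names are our own illustration, not a schema prescribed by the platform.

package catalog

import "time"

// AssetRecord is a hypothetical Platform Asset Catalog entry; the field set
// mirrors the metadata listed above (ownership, purpose, history,
// dependencies, mode of use), but the exact schema is illustrative.
type AssetRecord struct {
	ID           string    `json:"id"`           // globally unique asset identifier
	Domain       string    `json:"domain"`       // owning data domain, e.g., "supply-chain"
	Owner        string    `json:"owner"`        // accountable data product owner
	Purpose      string    `json:"purpose"`      // declared purpose of the asset
	Dependencies []string  `json:"dependencies"` // upstream products this asset is derived from
	ModeOfUse    string    `json:"modeOfUse"`    // e.g., "batch", "streaming", "api"
	RegisteredAt time.Time `json:"registeredAt"` // registration timestamp, anchoring the asset history
	Version      int       `json:"version"`      // monotonically increasing metadata version
}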
Among other cloud-based works, we note Butte and Butte [12], an AWS-based proposal with zonal data segmentation, and Goedegebuure et al. [11], who conducted a grey literature review on data mesh and also presented a Microsoft Azure-based architecture implementation for a large producer of lithography machines.
Hooshmand et al. [13] presented a novel view of an automobile production industry
with the application of semantic web technologies (SWTs) for ensuring the integration and
alignment of distributed domains, while Driessen et al. [21] described the usage of SWT
specifically for data product metadata management.
Nonetheless, as we shall see in Section 6, the existing platforms for building the data
mesh still do not satisfy all requirements. We provide these requirements in the next section.
6. Usage Tracking (UT) makes it important to keep records of product access, user identities, usage patterns, locations, time schedules, etc., in order to detect and prevent unauthorized activities.
7. Computational Policies (CP) play a vital role in automatic governance execution. Beyond access control enforcement, they also enable data quality verification, consistency, uniqueness, lifecycle management, service contract tests, etc. [5]. This reflects the need to define rules at each level, global and local, which are then applied across the mesh. Such governance execution also requires an appropriate platform infrastructure [10].
8. In the context of micro-service architecture, Independently Deployable (ID) products provide the opportunity to update running services without interrupting the overall system (e.g., canary deployment) [17]. In the context of data mesh, this means the ability to deploy new data products without affecting established data consumption. The same requirement applies to metadata registration and policy updates. Ideally, new information and rules should not interrupt the existing data flows unless it is specifically intended by the domain owners.
9. Automatically Testable (AT) platform design ensures the correctness of the future system state upon a module's upgrade. For instance, when implementing new data resolution modules, e.g., a transition from IPv4 to IPv6, the address space and links of the old resources should continue to work. To be sure that the introduction of a new module will not break the operations of the system, we are obligated to perform automatic tests and verification of the system as if the upgrade had already taken place, while in reality keeping the old system functioning.
10. Contract Management (CM) provides a way to negotiate data contracts, take part in them, and ensure the correctness of the delivered data (a minimal contract sketch follows this list). Usually, the contract includes the outlined service level objectives and agreements, covering the quality of data, schema, update frequency, intended purposes of use, etc. [24]. As a part of the platform governance, it overlaps with the metadata management and computational execution modules.
11. Product Compositionality (PC) helps to speed up product development and to prevent dataflow interruptions. Automatic contract composition verification enables advanced interoperability features and helps to prevent unauthorized behavior, recovery of personally identifiable information (PII), etc. In the case of schema composition, it automatically enriches the existing products or prevents the introduction of breaking changes.
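As an illustration of requirements 10 and 11, the following is a minimal sketch of a data contract together with a naive composition check. The contract shape and field names are our own assumptions for illustration; they do not reproduce the contract model of any cited platform.

package contracts

import "fmt"

// DataContract is an illustrative contract shape covering the service level
// objectives mentioned above: schema, update frequency, intended purposes of
// use, and an agreed data quality threshold.
type DataContract struct {
	Product         string            `json:"product"`
	Schema          map[string]string `json:"schema"`          // field name -> type
	UpdateFrequency string            `json:"updateFrequency"` // e.g., "hourly"
	AllowedPurposes []string          `json:"allowedPurposes"` // e.g., "analytics"
	MinQualityScore float64           `json:"minQualityScore"` // agreed quality threshold
}

// Compose merges an upstream contract into a downstream one and rejects
// breaking schema changes (the same field declared with different types).
// It is a deliberately naive stand-in for automatic composition verification.
func Compose(upstream, downstream DataContract) (DataContract, error) {
	merged := downstream
	merged.Schema = make(map[string]string, len(downstream.Schema)+len(upstream.Schema))
	for field, typ := range downstream.Schema {
		merged.Schema[field] = typ
	}
	for field, typ := range upstream.Schema {
		if existing, ok := merged.Schema[field]; ok && existing != typ {
			return DataContract{}, fmt.Errorf("breaking change on field %q: %s vs %s", field, existing, typ)
		}
		merged.Schema[field] = typ
	}
	return merged, nil
}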
These properties represent the major functional modules of the data mesh. Although the paradigm was outlined in 2019, there are still challenges in building the technology that can underpin its implementation.
As outlined above, the governance module covers defining, accessing, and collaborating on schemas and semantic knowledge, and providing data lineage. It also strives to automate the enforcement of security, transparency, and legal standards. Nevertheless, the challenge lies in building such a system with the currently available tools and research.
In our introductory section, we noted that metadata management systems often serve the role of data governance systems in research environments. Nonetheless, in the literature [25–29] and in commercial products [30–32], we see the predominant adoption of a centralized governance model. This model involves gathering metadata in a central repository, where it is analyzed, interconnected retrospectively, and often made accessible to users through web portals.
Recent studies have explored the implementation of federated data governance using semantic web technologies [13,21], though these systems are still in their early stages of development.
One approach to building federated governance that is gaining attention in the data mesh community is "data contracts". Truong et al. [24] highlight the need for data contracts in managing the use, sharing, and transaction of data within cloud marketplaces. Data contracts provide a structured method to define the rights, obligations, and constraints associated with data usage, ensuring clarity and compliance across different parties. They address key challenges such as specifying permissible data use, establishing liability in case of data misuse or quality issues, and enabling automatic and efficient data exchange and utilization. Furthermore, data contracts contribute to the scalability of data marketplaces by automating compliance and governance processes. By formalizing these aspects, they facilitate a trustworthy and regulated environment for data sharing, promoting innovation and value creation within data ecosystems.
Consequently, there is a significant research gap in the design and development of federated governance systems, signaling a pressing need for advancement in this field. In the next section, we attempt to provide a formal model for building a federated metadata management system using the data contract approach.
It can be the case that n given domains configure a shared federated metadata system but also run their own private systems in parallel.
Therefore, the decentralized metadata repository ∆ is defined as:
• ∆ = {R1, R2, ...}, with Ri being the repository associated with Di ⊂ D and Di ≠ ∅;
• D = {d1, d2, ...} is the set of all data domains;
• li,j : Mi × Mj → {1, 0} is a function establishing the presence or absence of a link between a pair of metadata records (mi, mj) that belong to different repositories Ri and Rj, respectively.
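A minimal sketch of this model in code, with types of our own choosing standing in for repositories, domains, metadata records, and the link function:

package mesh

// Domain and MetadataRecord stand in for the elements of D and Mi in the
// model above; the concrete field sets are illustrative.
type Domain string

type MetadataRecord struct {
	ID         string
	Repository string // the repository Ri this record belongs to
}

// Repository represents Ri: the metadata records of a non-empty subset Di of domains.
type Repository struct {
	Name    string
	Domains []Domain         // Di, must be non-empty
	Records []MetadataRecord // Mi
}

// LinkFn realizes l_{i,j}: it returns true (1) when a link exists between two
// records belonging to different repositories, and false (0) otherwise.
type LinkFn func(a, b MetadataRecord) bool

// Delta is the decentralized metadata repository, i.e., the set {R1, R2, ...}.
type Delta struct {
	Repositories []Repository
	Link         LinkFn
}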
Platform governance is managed through detailed policy mechanisms that relate to the network itself, its channels, or smart contracts. These policies outline how network changes are agreed upon and implemented. This structured approach facilitates a collaborative and secure environment for managing consensus and implementing updates across the blockchain network.
HLF uses Membership Service Providers to authenticate the identity of network
components—ranging from organizations to nodes and applications [35]. This authenti-
cation framework supports secure, private communications across distinct channels and
enables the private exchange of data, ensuring that sensitive information is shared only
with authorized parties.
Within the Fabric, the use of smart contracts helps to automate the enforcement of
agreed-upon rules that involve specific assets, e.g., data sharing operations [37]. Smart
contracts are part of a broader construct known as chaincode, which includes both the
contracts and the policies governing their execution. Deployed as compact Docker or Ku-
bernetes containers, chaincode units are the foundational software modules for transaction
endorsement and policy enforcement. These policies can, for example, designate which
network nodes are required to endorse or order a transaction, ensuring that execution
aligns with agreed-upon standards.
Chaincode serves a dual purpose: it not only governs transactions on the ledger
transparently and interoperably but also standardizes data modifications while continu-
ously verifying metadata integrity. This capability is especially relevant for Type II catalog
requirements, where maintaining data accuracy and trustworthiness is crucial. Chain-
code execution can also trigger event notifications, allowing network participants to stay
informed about ledger updates and newly available information.
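A minimal chaincode sketch in Go using the Fabric contract API illustrates this pattern: a transaction that writes a metadata record to the ledger and emits an event for interested parties. The asset shape and the function name are our own illustration, not the exact chaincode of the prototype.

package main

import (
	"encoding/json"
	"fmt"

	"github.com/hyperledger/fabric-contract-api-go/contractapi"
)

// MetadataContract is a minimal smart contract for registering asset metadata.
type MetadataContract struct {
	contractapi.Contract
}

// Asset is an illustrative metadata record stored on the ledger.
type Asset struct {
	ID     string `json:"id"`
	Domain string `json:"domain"`
	Owner  string `json:"owner"`
}

// RegisterAsset writes a new metadata record and notifies listeners via a
// chaincode event, so that other domains learn about newly available products.
func (c *MetadataContract) RegisterAsset(ctx contractapi.TransactionContextInterface, id, domain, owner string) error {
	existing, err := ctx.GetStub().GetState(id)
	if err != nil {
		return fmt.Errorf("reading ledger: %w", err)
	}
	if existing != nil {
		return fmt.Errorf("asset %s is already registered", id)
	}
	payload, err := json.Marshal(Asset{ID: id, Domain: domain, Owner: owner})
	if err != nil {
		return err
	}
	if err := ctx.GetStub().PutState(id, payload); err != nil {
		return err
	}
	// Event notification keeps catalog consumers informed about ledger updates.
	return ctx.GetStub().SetEvent("AssetRegistered", payload)
}

func main() {
	cc, err := contractapi.NewChaincode(&MetadataContract{})
	if err != nil {
		panic(err)
	}
	if err := cc.Start(); err != nil {
		panic(err)
	}
}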
HLF stands out for its developer-friendly approach to smart contract development,
supporting popular programming languages like Java, Go, and JavaScript/TypeScript [38].
This flexibility contrasts with platforms like Ethereum and lowers the barrier to adoption,
making it easier for developers to build and deploy chaincode on the Fabric network.
On the other hand, it lacks the “contract factory” support that is present in the Ethereum ecosystem [34]. Introducing such functionality could significantly enhance the capabilities of the system: for instance, a factory could take a smart contract template and automatically create computable data contract instances.
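One way to approximate such a factory in HLF, sketched here under our own assumptions rather than as an existing Fabric feature, is an additional transaction on the MetadataContract from the chaincode sketch above that clones a stored contract template into a concrete, computable contract instance.

// CreateContractInstance is a hypothetical factory-like transaction: it loads
// a stored contract template and writes a concrete instance bound to a given
// consumer, approximating the Ethereum contract-factory pattern with ordinary
// ledger state.
func (c *MetadataContract) CreateContractInstance(ctx contractapi.TransactionContextInterface, templateID, instanceID, consumer string) error {
	tmpl, err := ctx.GetStub().GetState(templateID)
	if err != nil {
		return fmt.Errorf("reading template: %w", err)
	}
	if tmpl == nil {
		return fmt.Errorf("template %s not found", templateID)
	}
	var instance map[string]interface{}
	if err := json.Unmarshal(tmpl, &instance); err != nil {
		return err
	}
	instance["consumer"] = consumer // bind the template to a concrete party
	payload, err := json.Marshal(instance)
	if err != nil {
		return err
	}
	return ctx.GetStub().PutState(instanceID, payload)
}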
Fybrik provides secure and controlled access to data, regardless of where the data resides: on-premises, in the cloud, or across multiple clouds. It uses fine-grained access control and data masking techniques to protect sensitive information.
Fybrik facilitates the movement and virtualization of data across different environments without the need to duplicate data. This capability supports efficient data management and reduces latency by bringing the computation closer to the data, thereby optimizing performance for data-intensive applications.
The platform also orchestrates policy enforcement points (such as data masking) and selects the optimal data storage and computing locations to satisfy performance, cost, and compliance considerations.
Within Fybrik, the policy verification mechanism is based on the Policy Manager
Connector. This manager ensures that data access and usage comply with the organization’s
policies and regulatory requirements. It acts as an intermediary between Fybrik and various
policy management systems (e.g., chaincodes), enabling Fybrik to understand and enforce
the governance rules applied to data operations. In order to use our smart policies, we implement the required getPoliciesDecisions endpoint inside our connector.
When the user code (e.g., a notebook) requests access to a dataset through a FybrikApplication resource, Fybrik uses the policy connector to query the relevant chaincode for any policy decisions that apply to the requested operation. During this request, it transmits context information such as data identifiers and the intended asset use. The smart policies then process the request by comparing it to the defined rules (global standards or a data contract), write the decisions to the ledger, and return the actions that the Fybrik platform has to apply (e.g., data masking).
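A simplified sketch of such a getPoliciesDecisions handler is shown below, using plain HTTP and heavily reduced request and response shapes; the real Fybrik connector API is defined by an OpenAPI specification with richer types, and the chaincode query is stubbed out here.

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// policyRequest is a heavily simplified stand-in for the policy manager
// request: the workload context transmitted by Fybrik.
type policyRequest struct {
	AssetID     string `json:"assetID"`
	IntendedUse string `json:"intendedUse"` // e.g., "read", "copy"
}

// policyDecision is a simplified decision returned to Fybrik, listing the
// actions the platform must apply.
type policyDecision struct {
	Allowed bool     `json:"allowed"`
	Actions []string `json:"actions"` // e.g., ["RedactColumn:email"]
}

// queryChaincode is a placeholder for the call into the smart policies; in the
// prototype, this evaluates ledger-stored rules and records the decision.
func queryChaincode(req policyRequest) policyDecision {
	return policyDecision{Allowed: true, Actions: []string{"RedactColumn:email"}}
}

func getPoliciesDecisions(w http.ResponseWriter, r *http.Request) {
	var req policyRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	decision := queryChaincode(req)
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(decision)
}

func main() {
	http.HandleFunc("/getPoliciesDecisions", getPoliciesDecisions)
	log.Fatal(http.ListenAndServe(":8080", nil))
}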
Within the Fybrik platform, the data catalogs are represented as plugin components. Fybrik uses the Data Catalog Connector to query these catalogs and retrieve metadata information about datasets. Therefore, it is possible to provide a custom catalog implementation. To be operable inside Fybrik, we define a connector app that supports four required operations over the metadata (a minimal interface sketch follows this list):
• createAsset is used for registering a new product, e.g., when the user executes the notebook and the corresponding workload persists the data;
• getAssetInfo returns the metadata information that is used for product discovery, workload processing, etc.;
• updateAsset updates any existing metadata records within the catalog;
• deleteAsset is used for deleting the metadata about the product.
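The four operations can be captured as a small interface, shown below; the Go types are illustrative and much simpler than the connector messages actually defined by Fybrik.

package connector

import "context"

// Asset is a simplified metadata payload exchanged with the catalog; the real
// connector messages carry considerably more detail.
type Asset struct {
	ID       string
	Domain   string
	Tags     []string
	Location string // e.g., an object store path or a database table
}

// Catalog lists the four operations our connector exposes to Fybrik; in the
// prototype, each is backed by a chaincode invocation on the shared ledger.
type Catalog interface {
	CreateAsset(ctx context.Context, a Asset) (string, error)   // register a new product
	GetAssetInfo(ctx context.Context, id string) (Asset, error) // product discovery, workload processing
	UpdateAsset(ctx context.Context, a Asset) error             // update existing metadata
	DeleteAsset(ctx context.Context, id string) error           // remove the product metadata
}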
Our metadata catalog and policy manager are based on the previously developed blockchain-powered metadata system [16]. The actual prototype was enhanced to also include policy management. The main components of our HLF-based governance system include the membership service provider with a certificate authority (CA) used for identifying and authorizing the parties, endorsement and ordering nodes for transaction processing (metadata, policy, or data contract processing), and channel configurations used for listing the participating domains and their rights.
In the following, we provide an example flow of using the Hyperledger Fabric system (a client-side sketch for the last step follows the list).
1. Request the data asset access, e.g., based on the protocol outlined in [16];
2. When the request is approved, the provider’s metadata and policy are used to form a
new data contract;
3. Submit the notebook and FybrikApplication documents for provisioning a new computing process;
4. Register a new data product in the catalog by providing the asset’s metadata and policies.
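As an illustration of the final step, the following sketch registers a product using the Fabric Gateway client API for Go. It assumes a pre-established gRPC connection and client identity; the channel name, chaincode name, and argument values are hypothetical and must match the deployed network.

package governance

import (
	"fmt"

	"github.com/hyperledger/fabric-gateway/pkg/client"
	"github.com/hyperledger/fabric-gateway/pkg/identity"
	"google.golang.org/grpc"
)

// registerProduct submits the asset's metadata to the governance chaincode,
// illustrating step 4 of the flow above. The transaction name mirrors the
// illustrative RegisterAsset function from the earlier chaincode sketch.
func registerProduct(conn *grpc.ClientConn, id identity.Identity, sign identity.Sign) error {
	gw, err := client.Connect(id, client.WithSign(sign), client.WithClientConnection(conn))
	if err != nil {
		return fmt.Errorf("connecting to gateway: %w", err)
	}
	defer gw.Close()

	contract := gw.GetNetwork("governance-channel").GetContract("metadata-catalog")
	if _, err := contract.SubmitTransaction("RegisterAsset", "sales-forecast-v1", "sales", "[email protected]"); err != nil {
		return fmt.Errorf("registering product: %w", err)
	}
	return nil
}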
In this section, we described the architecture of our prototype system for building the
federated data mesh governance. In the next section, we proceed by comparing our system
and other related works based on the functional maturity requirements.
6. Contribution Discussion
Based on the provided requirements in Section 3, in Table 1 we have summarized the
level of the functional maturity of related platforms from the governance perspective.
We see that no existing solution implements all 11 requirements of federated data mesh governance. As anticipated, almost all platforms have the data indexing and usage tracking modules, with the exception of the manual, web-form-based access control described in [9]. Indeed, these modules are built into any data platform architecture, as they are of paramount importance in data management.
Semantic enrichment is a more difficult task than indexing or user authorization, thus
leading to lower implementation rates. While our prototype and some platforms [9,10,19] (an extended description of the Netflix platform was accessed at https://netflixtechblog.com/data-movement-in-netflix-studio-via-data-mesh-3fddcceb1059, accessed on 16 February 2024) use a basic tag-based approach, the others use more advanced methods such as knowledge graphs [13,21] or cloud-offered services for data similarity computation and enrichment [11,12]. This tendency is also reflected in the link generation aspect, where the most common technique is lineage tracing of the data products, although some systems also provide semantic relationship discovery and/or clustering options [11–13,21].
Unfortunately, very little information is provided on versioning as part of metadata management. Only our system enables metadata versioning based on an immutable ledger, while some research works [13,21] only briefly mention this important feature.
It comes as no surprise that polyglot data is supported only in the port-oriented data product models. This system design is built around the input/output port connection abstraction in the first place and assumes the delivery of identical product value by means of multiple channels (e.g., message queues, BLOBs).
Our model also uses an input/output port connection approach, but unfortunately,
the current level of integration with other platforms in Fybrik is very limited. The Fybrik
platform is quite recent, and today it is integrated with a few data storage providers
(e.g., AWS S3, MySQL, Postgres) and processing engines (Apache Arrow Flight).
Even though modern companies adopt separation of responsibilities and containerization technology to ensure the independent deployment of operational resources, it is still an issue to bring this functionality to governance tooling. Most systems either deploy it as a monolith module, or it is under the management of a service provider. Since both parts of our system are based on Kubernetes, we are able to deploy any platform component independently, from computing nodes to smart policies.
Automated policies are implemented as part of the access control modules (e.g., authorization enforcement), while the more general notion of computational policies is not available. In some works, the instantiation of composable products is still limited to schema extensions [19,21].
When evaluating our system, we see that it satisfies 9 out of 11 requirements, more than any other system. The distinct features of our prototype include the support of data contract management and enforcement based on the blockchain platform and partial support of automatic governance system testing. This testing may be performed either during gradual smart contract upgrades or in mirrored testing networks, so that any breaking changes would be rejected.
Table 1. The level of functional maturity of related platforms from the governance perspective.

Property ⇓ / Source ⇒    | Our System | Zalando [19] | Netflix [19] | Machado et al. [9] | Wider et al. [10] | Butte and Butte [12] | Goedegebuure et al. [11] | Driessen et al. [21] | Hooshmand et al. [13]
Data Indexing            | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Usage Tracking           | ✓ | ✓ | ✓ | ✧ | ✓ | ✓ | ✓ | ✓ | ✓
Semantic Enrichment      | ✧ | ✗ | ✧ | ✧ | ✧ | ✓ | ✓ | ✓ | ✓
Link Generation          | ✧ | ✗ | ✧ | ✧ | ✧ | ✓ | ✓ | ✓ | ✓
Data Polymorphism        | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✓
Data Versioning          | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✧ | ✧ | ✗
Independently Deployable | ✓ | ✗ | ✗ | ✗ | ✗ | - | ✓ | ✓ | -
Computational Policies   | ✧ | ✗ | ✗ | ✗ | ✧ | ✧ | ✗ | ✗ | ✧
Composable Products      | ✗ | ✗ | ✧ | ✗ | ✗ | ✗ | ✧ | ✗ | ✗
Automatically Testable   | ✧ | ✗ | ✗ | ✗ | ✗ | - | ✗ | ✗ | -
Contract Management      | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗
Total                    | (9)/11 | 2/11 | (5)/11 | (4)/11 | (6)/11 | (6)/11 | (8)/11 | (6)/11 | (6)/11

✗: no information available or not supported; ✧: partially supported; ✓: supported.
In future work, we plan to explore support for contract factory patterns. Hence, we want to investigate the impact of different blockchain platforms on performance and security and try to enhance data privacy and compliance in decentralized systems.
References
1. Miloslavskaya, N.; Tolstoy, A. Big data, fast data and data lake concepts. In Proceedings of the 7th Annual International Conference
on Biologically Inspired Cognitive Architectures (BICA 2016), Procedia Computer Science, New York, NY, USA, 16–19 July 2016.
2. Inmon, W.; Strauss, D.; Neushloss, G. DW 2.0: The Architecture for the Next Generation of Data Warehousing; Elsevier: Amsterdam,
The Netherlands, 2010.
3. Madera, C.; Laurent, A. The next information architecture evolution: The data lake wave. In Proceedings of the 8th International
Conference on Management of Digital Ecosystems, Hendaye, France, 2–4 November 2016; pp. 174–180.
4. Armbrust, M.; Ghodsi, A.; Xin, R.; Zaharia, M. Lakehouse: A new generation of open platforms that unify data warehousing and
advanced analytics. In Proceedings of the CIDR, Virtual Event, 11–15 January 2021; Volume 8.
5. Dehghani, Z. Data Mesh: Delivering Data-Driven Value at Scale; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2022.
6. Evans, E.; Evans, E.J. Domain-Driven Design: Tackling Complexity in the Heart of Software; Addison-Wesley Professional: Boston,
MA, USA, 2004.
7. Driessen, S.W.; Monsieur, G.; Van Den Heuvel, W.J. Data market design: A systematic literature review. IEEE Access 2022,
10, 33123–33153. [CrossRef]
8. DAMA-International. DAMA-DMBOK: Data Management Body of Knowledge; Technics Publications: Sedona, AZ, USA, 2017.
9. Araújo Machado, I.; Costa, C.; Santos, M.Y. Advancing Data Architectures with Data Mesh Implementations. In Proceedings of
the International Conference on Advanced Information Systems Engineering, Leuven, Belgium, 6–10 June 2022; Springer: Cham,
Switzerland, 2022; pp. 10–18.
10. Wider, A.; Verma, S.; Akhtar, A. Decentralized data governance as part of a data mesh platform: Concepts and approaches. In
Proceedings of the 2023 IEEE International Conference on Web Services (ICWS), Chicago, IL, USA, 2–8 July 2023; IEEE: Piscataway,
NJ, USA, 2023; pp. 746–754.
11. Abel, G. Data Mesh: Systematic Gray Literature Study, Reference Architecture, and Cloud-Based Instantiation at ASML. Master’s
Thesis, School of Economics and Management, Tilburg University, Tilburg, The Netherlands, 2022.
12. Butte, V.K.; Butte, S. Enterprise Data Strategy: A Decentralized Data Mesh Approach. In Proceedings of the 2022 International
Conference on Data Analytics for Business and Industry (ICDABI), Virtual, 25–26 October 2022; IEEE: Piscataway, NJ, USA, 2022;
pp. 62–66.
13. Hooshmand, Y.; Resch, J.; Wischnewski, P.; Patil, P. From a Monolithic PLM Landscape to a Federated Domain and Data Mesh.
Proc. Des. Soc. 2022, 2, 713–722. [CrossRef]
14. Dolhopolov, A.; Castelltort, A.; Laurent, A. Exploring the Benefits of Blockchain-Powered Metadata Catalogs in Data Mesh
Architecture. In Proceedings of the 15th International Conference on Management of Digital EcoSystems, Crete, Greece, 5–7 May
2023; Springer: Cham, Switzerland, 2023.
15. Dolhopolov, A.; Castelltort, A.; Laurent, A. Trick or Treat: Centralized Data Lake vs Decentralized Data Mesh. In Proceedings
of the 15th International Conference on Management of Digital EcoSystems, Crete, Greece, 5–7 May 2023; Springer: Cham,
Switzerland, 2023.
16. Dolhopolov, A.; Castelltort, A.; Laurent, A. Implementing a Blockchain-Powered Metadata Catalog in Data Mesh Architecture.
In Proceedings of the International Congress on Blockchain and Applications, Guimarães, Portugal, 12–14 July 2023; Springer:
Cham, Switzerland, 2023; pp. 348–360.
17. Newman, S. Building Microservices; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2015.
18. Priebe, T.; Neumaier, S.; Markus, S. Finding your way through the jungle of big data architectures. In Proceedings of the 2021
IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; IEEE: Piscataway, NJ, USA, 2021;
pp. 5994–5996.
19. Machado, I.A.; Costa, C.; Santos, M.Y. Data mesh: Concepts and principles of a paradigm shift in data architectures. Procedia
Comput. Sci. 2022, 196, 263–271. [CrossRef]
20. Machado, I.; Costa, C.; Santos, M.Y. Data-driven information systems: The data mesh paradigm shift. In Information Systems
Development: Crossing Boundaries between Development and Operations (DevOps) in Information Systems (ISD2021 Proceedings); Insfran,
E., González, F., Abrahão, S., Fernández, M., Barry, C., Linger, H., Lang, M., Schneider, C., Eds.; Universitat Politècnica de
València: Valencia, Spain, 2021.
21. Driessen, S.; Monsieur, G.; van den Heuvel, W.J. Data Product Metadata Management: An Industrial Perspective. In Proceedings
of the Service-Oriented Computing–ICSOC 2022 Workshops: ASOCA, AI-PA, FMCIoT, WESOACS 2022, Sevilla, Spain, 29
November–2 December 2022; Springer: Cham, Switzerland, 2023; pp. 237–248.
22. Sawadogo, P.; Darmont, J. On data lake architectures and metadata management. J. Intell. Inf. Syst. 2021, 56, 97–120. [CrossRef]
23. Stafford, V. Zero trust architecture. NIST Spec. Publ. 2020, 800, 207.
24. Truong, H.L.; Comerio, M.; De Paoli, F.; Gangadharan, G.; Dustdar, S. Data contracts for cloud-based data marketplaces. Int. J.
Comput. Sci. Eng. 2012, 7, 280–295. [CrossRef]
25. Hai, R.; Geisler, S.; Quix, C. Constance: An intelligent data lake system. In Proceedings of the International Conference on
Management of Data, San Francisco, CA, USA, 26 June–1 July 2016; ACM Digital Library: New York, NY, USA, 2016.
26. Zhao, Y. Metadata Management for Data Lake Governance. Ph.D. Thesis, École Doctorale Mathématiques, Informatique et
Télécommunications, Toulouse, France, 2021.
27. Sawadogo, P.N.; Darmont, J.; Noûs, C. Joint Management and Analysis of Textual Documents and Tabular Data within the
AUDAL Data Lake. In Proceedings of the European Conference on Advances in Databases and Information Systems, Tartu,
Estonia, 24–26 August 2021; Springer: Cham, Switzerland, 2021; pp. 88–101.
28. Eichler, R.; Giebler, C.; Gröger, C.; Schwarz, H.; Mitschang, B. Modeling metadata in data lakes—A generic model. Data Knowl.
Eng. 2021, 136, 101931. [CrossRef]
29. Mehmood, H.; Gilman, E.; Cortes, M.; Kostakos, P.; Byrne, A.; Valta, K.; Tekes, S.; Riekki, J. Implementing big data lake for
heterogeneous data sources. In Proceedings of the 2019 IEEE 35th International Conference on Data Engineering Workshops
(ICDEW), Macao, China, 8–12 April 2019; IEEE: Piscataway, NJ, USA, 2019.
30. Halevy, A.Y.; Korn, F.; Noy, N.F.; Olston, C.; Polyzotis, N.; Roy, S.; Whang, S.E. Managing Google’s data lake: An overview of the
Goods system. IEEE Data Eng. Bull. 2016, 39, 5–14.
31. Apache Software Foundation. Apache Atlas—Data Governance and Metadata Framework for Hadoop. Available online:
https://atlas.apache.org (accessed on 14 August 2023).
32. DataHub Project. The Metadata Platform for the Modern Data Stack. Available online: https://datahubproject.io/ (accessed on
14 August 2023).
33. Abbas, A.E.; Agahari, W.; Van de Ven, M.; Zuiderwijk, A.; De Reuver, M. Business data sharing through data marketplaces: A
systematic literature review. J. Theor. Appl. Electron. Commer. Res. 2021, 16, 3321–3339. [CrossRef]
34. Desai, H.; Liu, K.; Kantarcioglu, M.; Kagal, L. Adjudicating violations in data sharing agreements using smart contracts.
In Proceedings of the 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and
Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData),
Halifax, NS, Canada, 30 July–3 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1553–1560.
35. Androulaki, E.; Barger, A.; Bortnikov, V.; Cachin, C.; Christidis, K.; De Caro, A.; Enyeart, D.; Ferris, C.; Laventman, G.; Manevich,
Y.; et al. Hyperledger fabric: A distributed operating system for permissioned blockchains. In Proceedings of the Thirteenth
EuroSys Conference, Porto, Portugal, 23–26 April 2018; pp. 1–15.
36. Demichev, A.; Kryukov, A.; Prikhodko, N. The approach to managing provenance metadata and data access rights in distributed
storage using the hyperledger blockchain platform. In Proceedings of the Ivannikov Ispras Open Conference, Moscow, Russia,
22–23 November 2018; IEEE: Piscataway, NJ, USA, 2018.
37. Koscina, M.; Manset, D.; Negri-Ribalta, C.; Perez, O. Enabling trust in healthcare data exchange with a federated blockchain-based
architecture. In Proceedings of the International Conference on Web Intelligence—Companion Volume, Thessaloniki, Greece,
14–17 October 2019.
38. Valenta, M.; Sandner, P. Comparison of ethereum, hyperledger fabric and corda. Frankf. Sch. Blockchain Cent. 2017, 8, 1–8.
39. Ayed, D.; Dragan, P.A.; Félix, E.; Mann, Z.A.; Salant, E.; Seidl, R.; Sidiropoulos, A.; Taylor, S.; Vitorino, R. Protecting sensitive data
in the cloud-to-edge continuum: The FogProtect approach. In Proceedings of the 2022 22nd IEEE International Symposium on
Cluster, Cloud and Internet Computing (CCGrid), Taormina, Italy, 16–19 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 279–288.
40. Dittmann, G.; Giblin, C.; Osborne, M. Automating privacy compliance in the decentralized enterprise. In Proceedings of the IEEE
International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 2218–2223.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.