Introduction

Overview

Composite databases are introduced in Neo4j 5 as an implementation of the Fabric technology. As such, in Neo4j 5, Composite databases replace what is known as Fabric in Neo4j 4.x. For more information, see Composite databases in version 5 of the Neo4j Operations Manual.

Fabric, introduced in Neo4j 4.0, is a way to store and retrieve data in multiple databases, whether they are on the same Neo4j DBMS or in multiple DBMSs, using a single Cypher query. Fabric achieves a number of desirable objectives:

a unified view of local and distributed data, accessible via a single client connection and user session
increased scalability for read/write operations, data volume and concurrency
predictable response time for queries executed during normal operations, a failover or other infrastructure changes
High Availability and No Single Point of Failure for large data volume.

In practical terms, Fabric provides the infrastructure and tooling for:

Data Federation: the ability to access data available in distributed sources in the form of disjointed graphs.
Data Sharding: the ability to access data available in distributed sources in the form of a common graph partitioned on multiple databases.

With Fabric, a Cypher query can store and retrieve data in multiple federated and sharded graphs.

If you want to connect a single database to your DBMS that is shared over all instances of your DBMS, consider using Remote Database Aliases instead.

Fabric concepts

The fabric database

A Fabric setup includes a Fabric virtual database, which acts as the entry point to a federated or sharded graph infrastructure. This database is the execution context in which multi-graph queries can be executed. Drivers and client applications access and use the Fabric execution context by naming it as the selected database for a session. For more information, see Databases and execution context in the Neo4j Driver manuals.

The Fabric virtual database (execution context) differs from normal databases in that it cannot store any data, and only relays data stored elsewhere. The Fabric virtual database can be configured on a standalone Neo4j DBMS only, i.e. on a Neo4j DBMS where the configuration setting dbms.mode must be set to SINGLE.

The Neo4j Admin commands cannot be applied to the Fabric virtual database. They must be run directly on the databases that are part of the Fabric setup.

Fabric graphs

In a Fabric virtual database, data is organized in the form of graphs. Graphs are seen by client applications as local logical structures, where physically data is stored in one or more databases. Databases accessed as Fabric graphs can be local, i.e in the same Neo4j DBMS, or they can be located in external Neo4j DBMSes. The databases are also accessible by client applications from regular local connections in their respective Neo4j DBMSs.

Deployment examples

Fabric constitutes an extremely versatile environment that provides scalability and availability with no single point of failure in various topologies. Users and developers may use applications that can work on a standalone DBMS as well on a very complex and largely distributed infrastructure without the need to apply any change to the queries accessing the Fabric graphs.

Development deployment

In its simplest deployment, Fabric can be used on a single instance, where Fabric graphs are associated to local databases. This approach is commonly used by software developers to create applications that will be deployed on multiple Neo4j DBMSs, or by power users who intend to execute Cypher queries against local disjoint graphs.

Figure 1. Fabric deployment in a single instance

Cluster deployment with no single point of failure

In this deployment Fabric guarantees access to disjoint graphs in high availability with no single point of failure. High availability is reached by creating redundant entry points for the Fabric Database (i.e. two standalone Neo4j DBMSs with the same Fabric configuration) and a minimum Causal Cluster of three members for data storage and retrieval. This approach is suitable for production environments and it can be used by power users who intend to execute Cypher queries against disjoint graphs.

Figure 2. Fabric deployment with no single point of failure

Multi-cluster deployment

In this deployment Fabric provides high scalability and availability with no single point of failure. Disjoint clusters can be sized according to the expected workload and Databases may be colocated in the same cluster or they can be hosted in their own cluster to provide higher throughput. This approach is suitable for production environments where database can be sharded, federated or a combination of the two.

Figure 3. Fabric deployment for scalability with no single point of failure