TigerGraph Buyers Guide Part 1
Graph Databases
Key Considerations in Buying a Graph Database
PART ONE
SELECTING A GRAPH DATABASE FOR ON-PREMISE OR CLOUD DEPLOYMENT - PART 1
Graph databases are the fastest growing category in all of data management. Since seeing early adoption
by companies including Facebook, Google and LinkedIn, graph has evolved into a mainstream technology
used today by enterprises in every industry across a wide variety of use cases. By organizing data in a
graph format, graph databases overcome the big and complex data challenges that other databases such
as Relational and NoSQL cannot.
Selecting graph software is an important decision that can shape the success of your organization.
Unfortunately, buyers often struggle to reconcile the conflicting claims made by different graph software
vendors, and these claims are often characterized by misinformation.
Part one of this two-part guide is intended to assist you in your buying decision by providing a side-by-side
comparison of two leading graph databases, Neo4j and TigerGraph. You can also download Part 2, which
compares Amazon Neptune and TigerGraph Cloud.
Part 1 - TigerGraph and Neo4j
Graph is a common foundation and enabler in the analytics world. Business people are asking
increasingly complex questions across structured and unstructured data, and answering them often requires
blending data from multiple sources, multiple business units, and, increasingly, external data.
Analyzing this data at scale is impractical, and in some cases impossible, with traditional database systems.
Graph analysis exposes and analyzes the relationships in the data. Processing and computing over that data
requires a distributed, scalable system that can run in the cloud.
Speed
    TigerGraph: Traverses 10M+ entities and relationships per second per machine and 100K+ updates per second per machine.
    Neo4j: 10 times to 1000 times slower in independent tests.
Scale-out
    TigerGraph: A true distributed database, with automatic partitioning, seamless to users.
    Neo4j: Workaround by manually sharding model, data, and queries across multiple individual non-connected graph databases - slow, with expense and risk of manual handling.
Deep-Link Analytics
    TigerGraph: Complex 5 to 10+ hop queries on all sizes of datasets - from small to ultra-large, distributed graphs. Runs in-database graph analytics.
    Neo4j: Unable to support six or more hop queries on even moderate size dataset. Workaround is to export data to Spark for external processing, which is an extra infrastructure cost.
Graph Query Language
    TigerGraph: GSQL - Turing-complete, can express complex graph computations and analytics natively, for ad hoc queries and complex, parameterized procedures. TigerGraph is an active contributor to the upcoming GQL standard.
    Neo4j: Cypher - for basic queries, including pattern matching. Neo4j is an active contributor to the upcoming GQL standard.
Graph Algorithm Library
    TigerGraph: Open source, user extensible and customizable. Runs within the database.
    Neo4j: Pre-compiled Java API calls, no ability to modify parameters or logic.
Visual Interface
    TigerGraph: GraphStudio for full workflow: visual modeling, ETL, exploration, and query development. AdminPortal for monitoring and management. Both included.
    Neo4j: Bloom for graph exploration only. Available at an additional cost.
Standard APIs
    TigerGraph: Industry standards: REST APIs, JSON output, JDBC, Python, Spark.
    Neo4j: Numerous industry standards + Proprietary Bolt API.
Cloud Offering - Graph Database as a Service
    TigerGraph: Free tier for lifetime for non-commercial usage. Contains 18+ starter kits for popular use cases.
    Neo4j: No free tier on Neo4j cloud. No starter kits available.
Total Cost of Ownership
    TigerGraph: Best-in-class due to storage and computational efficiency, yielding the smallest hardware footprint. Hardware costs for TigerGraph are typically 50% or more less when compared to Neo4j.
    Neo4j: Storage alone is five times larger. Trying to match scale and performance by using faster machines: 10 times to impossible. Need to use Spark for OLAP queries which is an added cost (infrastructure and potentially license).
Here are a few examples of customers who have upgraded to TigerGraph for its higher performance and scalability,
greater functionality, and lower total cost of ownership (TCO). Team TigerGraph is happy to connect graph
database buyers with these and other customers who can share additional details.
CUSTOMER: Large pharmaceutical manufacturer
USE CASE: Influencer/Hub Detection (PageRank)
“We quickly ran into problems scaling with our original graph database – loading the data took a lot of time
and once it was loaded calculations [Referral Networks and PageRank] either didn’t finish or was extremely slow.”
- Data Sciences Leader
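For readers unfamiliar with the calculation named in this use case, the following is a minimal sketch of PageRank by power iteration in plain Python. It is not TigerGraph's or any other vendor's implementation; the referral network, the `dr_*` names, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical referral network: each pair means "source refers to target".
referrals = [("dr_a", "dr_d"), ("dr_b", "dr_d"), ("dr_c", "dr_d"), ("dr_d", "dr_a")]
scores = pagerank(referrals)
hub = max(scores, key=scores.get)  # the node with the highest score is the likely hub
print(hub)
```

Each iteration touches every edge once, which is why run time on a large graph depends heavily on how quickly the engine can traverse edges.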
CUSTOMER: Database of corporate information (Corporate Enterprise Knowledge Graph)
USE CASE: Knowledge Graph
Compared TigerGraph to other graph databases using a sample set of 17 million nodes and 10 million edges on a
single machine. TigerGraph offered superior support for the following must-have query requirements:
Degrees of separation: Support for queries of up to five degrees of separation between entities with real-time
response times - a capability that was becoming increasingly difficult.
Siblings: Support for sibling queries with real-time response times, to help answer questions like, “What else
does the parent of a given company own?”
Up the chain only: Enables users to see what entities exist up the chain only for any given company, with
real-time response times.
Temporal graph search: Users can ascertain if a relationship existed for a particular time frame. They can
search what entities have been created from a particular date, and remove all old relationships from their
query - not possible with Neo4j.
Active vs. dead relationships: Supports queries on a given network to see what relationships are active vs.
dead, so that each one can be filtered out of the query accordingly, a feature that wasn’t possible with Neo4j.
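To make the query requirements above concrete, here is a small sketch in plain Python over a toy ownership graph. It is written in neither GSQL nor Cypher and says nothing about how either product executes such queries. The company names, the `OWNS` edge list, and its `(parent, child, start_year, end_year)` layout are hypothetical.

```python
from collections import deque

# Hypothetical ownership edges: (parent, child, start_year, end_year).
# end_year is None while the relationship is still active.
OWNS = [
    ("HoldCo", "AlphaCorp", 2005, None),
    ("HoldCo", "BetaCorp", 2010, None),
    ("HoldCo", "GammaCorp", 2001, 2015),
    ("AlphaCorp", "AlphaLabs", 2012, None),
    ("TopTrust", "HoldCo", 1998, None),
]


def active_edges(edges, year):
    """Temporal / active-vs-dead filter: keep relationships that existed in `year`."""
    return [e for e in edges if e[2] <= year and (e[3] is None or year <= e[3])]


def degrees_of_separation(edges, start, goal, max_hops=5):
    """Breadth-first search ignoring edge direction; None if not within max_hops."""
    neighbors = {}
    for parent, child, _, _ in edges:
        neighbors.setdefault(parent, set()).add(child)
        neighbors.setdefault(child, set()).add(parent)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        if hops < max_hops:
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return None


def siblings(edges, company):
    """What else does the parent of a given company own?"""
    parents = {p for p, c, _, _ in edges if c == company}
    return sorted({c for p, c, _, _ in edges if p in parents and c != company})


def up_the_chain(edges, company):
    """Follow ownership edges upward only, nearest owner first."""
    chain, frontier = [], [company]
    while frontier:
        current = frontier.pop(0)
        for p, c, _, _ in edges:
            if c == current and p not in chain:
                chain.append(p)
                frontier.append(p)
    return chain


current = active_edges(OWNS, 2020)  # drops the HoldCo -> GammaCorp edge that ended in 2015
print(degrees_of_separation(OWNS, "AlphaLabs", "BetaCorp"))
print(siblings(current, "AlphaCorp"))
print(up_the_chain(OWNS, "AlphaLabs"))
```

Filtering the edge list before running a query is the simplest way to express the temporal and active-vs-dead requirements; a production system would more likely push that filter into the traversal itself.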
CUSTOMER: Innovative Media Company based in Germany
USE CASE: Recommendation Engine, Customer 360
Prior to selecting TigerGraph, the customer conducted its own in-house benchmarks based on its requirements
and thoroughly compared all available systems. With the shortlist decided, the customer then built prototypes
and performed more detailed performance tests.
“TigerGraph provides a scalable and high-performance graph database platform,” says the customer. “The
integration has proven straightforward and the flexibility of the GSQL environment makes it much easier for
developers who are not yet Graph specialists to quickly get involved in our production processes.” - CEO
Schema Sharding
    TigerGraph: One schema.
    Neo4j: User must manually shard the schema into different sub-schemas for each machine/database.
Data Loading and Sharding
    TigerGraph: One loading job and automatic partitioning.
    Neo4j: User must manually partition data and load separately to each machine/database.
Querying
    TigerGraph: Query as a single database.
    Neo4j: User must design multi-stage queries to manually query each machine/database and then stitch results together.
1.4 Functionality
This section compares the key functionality offered by TigerGraph and Neo4j.
OLAP: Deep-Link Analytics
    TigerGraph: Handles deep-link (3 to 10+ hops) on ultra-large, distributed graphs. Runs large graphs in-database.
    Neo4j: Tops out at 2 to 5 hops on medium to large graphs. Workaround is to export data to Spark for external processing, which is an extra infrastructure cost.
Graph Query Language
    TigerGraph: GSQL - Turing-complete, can express complex graph computations and analytics natively for ad hoc queries and complex, parameterized procedures. Excels at analytics due to built-in parallelism and innovative accumulators. TigerGraph is an active contributor to the upcoming GQL standard.
    Neo4j: Cypher - for basic queries, including pattern matching. Neo4j is an active contributor to the upcoming GQL standard.
Transactions and Cluster Consistency
    TigerGraph: ACID across cluster. Strong consistency.
    Neo4j: ACID only at single-machine level. Eventual or causal consistency.
Graph Algorithm Library
    TigerGraph: Open source, user extensible and customizable. Runs within the database.
    Neo4j: Pre-compiled Java API calls and no ability to modify parameters or logic.
Visual Interface
    TigerGraph: GraphStudio for full workflow: visual modeling, ETL, exploration, and query development. AdminPortal for monitoring and management. Both included.
    Neo4j: Bloom for graph exploration only. Available at an additional cost.
Standard APIs
    TigerGraph: Industry standards: REST APIs, JSON output, JDBC, Python, Spark.
    Neo4j: Numerous industry standards + Proprietary Bolt API.
Cloud Service
    TigerGraph: The only distributed graph database as a service. HA replication too. Free tier for lifetime for non-commercial usage. Over 18 starter kits covering popular use cases including Customer 360, Entity Resolution, Fraud Detection, Knowledge Graph, etc.
    Neo4j: Single instances only. No free tier. No starter kits.
• Storage efficiency: TigerGraph stores your data more efficiently than any other graph database on the
market, typically 4 to 5 times more compactly than Neo4j. That means TigerGraph can use fewer machines
than other distributed databases.
• Compute efficiency: Independent testing using the LDBC benchmark showed TigerGraph to be 10x
to more than 1000x faster than Neo4j. Faster execution helps sustain a higher QPS (queries per second)
rate over longer periods, which reduces the need to replicate data purely for throughput. Using more
expensive machines and running machines in parallel for more throughput can only partially compensate
for lower core performance.
• Operational efficiency: From a TCO standpoint, fewer servers mean lower direct costs for operations,
administration, technical support, and training. A more performant cluster also lowers the indirect
costs of security, configuration, upgrades, data storage, backups, and so on.
Unlike TigerGraph, which compresses raw data when loaded into a graph, Neo4j typically expands it. The following
table shows the loaded data storage size for TigerGraph and Neo4j:
Dataset: graph500
    Raw data: 967 MB
    TigerGraph: 482 MB (50% of raw data)
    Neo4j: 2300 MB (237% of raw data)
Source: Benchmarking Graph Analytic Systems: TigerGraph, Neo4j, Neptune, JanusGraph and ArangoDB
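As a quick check, the percentages quoted for graph500 follow directly from the three reported sizes. The short Python calculation below uses only those figures; the variable names are illustrative.

```python
# Sizes reported for the graph500 dataset, in MB.
raw_mb = 967
tigergraph_mb = 482
neo4j_mb = 2300

tigergraph_ratio = tigergraph_mb / raw_mb        # loaded size as a fraction of raw data
neo4j_ratio = neo4j_mb / raw_mb
neo4j_vs_tigergraph = neo4j_mb / tigergraph_mb   # relative storage footprint

print(f"TigerGraph: {tigergraph_ratio:.1%} of raw data")
print(f"Neo4j: {neo4j_ratio:.1%} of raw data")
print(f"Neo4j footprint is {neo4j_vs_tigergraph:.2f}x TigerGraph's")
```

The last ratio falls between 4 and 5, which is consistent with the "4 to 5 times more compactly" figure quoted for storage efficiency.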
This leads to significant differences in the cost of processing the data with TigerGraph vs. Neo4j. The sample
calculation below assumes that all of the data is loaded into RAM for analysis.
The following figure shows the annual computing or hardware costs for TigerGraph vs Neo4j based on the memory
or RAM requirements for each graph database:
Table columns:
    Raw data (GB)
    Graph size in TigerGraph (GB) = raw data x 50%
    Annual cost for computing for TigerGraph (6,000 dollars per 64 GB RAM)
    Graph size in Neo4j (GB) = raw data x 237%
    Annual cost for computing for Neo4j (6,000 dollars per 64 GB RAM)
    Percentage cost savings with TigerGraph
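The calculation behind these columns can be sketched as follows. The 50% and 237% size factors and the 6,000 dollars per 64 GB of RAM come from the text; the raw data sizes in the loop and the decision to round up to whole 64 GB servers are assumptions, so the figures this produces are illustrative and may differ from the original table if it prorated costs.

```python
import math

COST_PER_SERVER = 6000      # dollars per year per 64 GB of RAM (from the text)
RAM_PER_SERVER_GB = 64
TIGERGRAPH_FACTOR = 0.50    # loaded size as a fraction of raw data (from the text)
NEO4J_FACTOR = 2.37


def annual_cost(raw_gb, factor):
    """Cost of enough 64 GB servers to hold the loaded graph entirely in RAM.

    Assumes RAM is bought in whole 64 GB units (an assumption of this sketch).
    """
    loaded_gb = raw_gb * factor
    servers = math.ceil(loaded_gb / RAM_PER_SERVER_GB)
    return servers * COST_PER_SERVER


def savings_percent(raw_gb):
    """Percentage saved on annual compute cost with the smaller loaded graph."""
    tigergraph_cost = annual_cost(raw_gb, TIGERGRAPH_FACTOR)
    neo4j_cost = annual_cost(raw_gb, NEO4J_FACTOR)
    return 100.0 * (neo4j_cost - tigergraph_cost) / neo4j_cost


for raw_gb in (64, 512, 1024):  # hypothetical raw data sizes
    print(
        raw_gb,
        annual_cost(raw_gb, TIGERGRAPH_FACTOR),
        annual_cost(raw_gb, NEO4J_FACTOR),
        round(savings_percent(raw_gb), 1),
    )
```

Because servers come in whole units under this assumption, the savings percentage varies with the raw data size rather than staying fixed at the ratio of the two size factors.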
Moreover, TigerGraph will analyze the data on these servers many times faster than Neo4j: TigerGraph traverses
10M+ entities and relationships per second per machine, while Neo4j has been shown to be 10 times to 1000 times
slower in independent tests. This translates into more queries per second (QPS) with TigerGraph than with Neo4j,
allowing customers to scale up the deployment as more users or systems access insights from the graph database.