Demystifying Semantic Layers For Self-Service Analytics
Published 7 September 2021 - ID G00749457 - 61 min read
By Analyst(s): Joseph Antelmi
Initiatives: Analytics and Artificial Intelligence for Technical Professionals
Overview
Key Findings
■ Semantic layers can support different categories of analytics roles, including
consumers, explorers, innovators and experts, by providing a business-friendly set of
logical data models, measures and metrics.
■ Technical professionals struggle to select the optimal location and technology for
semantic layers. The many options include various generations of analytics and
business intelligence (A&BI) tools, data marts, data warehouses, query accelerators,
knowledge graph/data fabric and stand-alone virtualization platforms.
■ Situating semantic layers in the logical data warehouse or data fabric offers
efficiencies for maintenance, consistency and scalability. The alternative approach
of coupling semantic layers tightly with A&BI platforms offers greater ease of use
and business-friendly languages for construct development, but also more lock-in of data
and less flexibility and reusability.
■ The universal semantic layer is still elusive, due to data integration, tool
interoperability, usability and governance challenges. Successful semantic layer
development requires organizations to improve their data engineering, data
modeling and data governance maturity.
■ Select an appropriate semantic layer for your use case by comparing the various
technical options in terms of source and target support, model development and
sharing, support for business constructs, calculations and functions, query
performance, ease of use, security, deployment complexity and licensing.
■ Deploy a combination of local and global semantic layer data models based on the
use cases, users and desired governance model.
Comparison
The Need for a Semantic Layer
Many organizations are on a journey to enable self-service analytics. They want to deliver
self-service solutions that empower all of their users to use data and analytics for
organizational benefit.
Analytics technical professionals are challenged to deliver data and analytics solutions
that align to the principles of good self-service architecture: outcome-oriented, valuable,
easy to learn, accessible, safe and trusted.
However, delivering on these principles is not always easy. The big problem with self-
service analytics is that most of these principles are, in fact, in opposition. Organizations
struggle to balance the principles that align to agility (easy to learn, accessible) with those
that align to control (safe, SLA- and cost-optimized) in order to deliver valuable outcomes
to a diverse set of users.
■ Traditional semantic layers were difficult for end users to use, customize and update,
which meant that the semantic layer was typically a platform used only by IT. This
dependency on IT slowed down both deployment and usage of the semantic layer platform.
Self-service has become popular partly because of the challenge of gathering good
requirements. Many business units have historically had trouble describing their
data and analytics needs beyond the most basic requirements, so the solutions that
are built are not that valuable. Self-service approaches, where business users build
analytics models, dashboards and other objects over time as they discover their usefulness,
continue to be relevant.
The future must integrate the best of both approaches: centralization for reuse and
governance, and self-service agility for prototyping and business value delivery. Data and
analytics technical professionals must evolve and rise to that challenge. In this context, it
is worth revisiting the semantic layer.
These semantic layers often contain data in the form of measures, the numbers or values
that can be summed and/or averaged, such as sales, distances, duration and weight.
They can also contain dimensions, which are the categorical buckets that can be used to
segment, filter or group — such as sales rep, city and product. In addition, constructs on
top of these can include metrics and KPIs, quantifiable measures that are used to track
and assess the status of a specific process.
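To make these constructs concrete, the following is a minimal sketch of how measures, dimensions and a derived metric might be declared, written in Python for illustration. All names and the structure are invented, not any vendor's actual modeling syntax.

```python
# A minimal, hypothetical semantic model: measures, dimensions and a metric.
# Names and structure are illustrative only, not any vendor's actual syntax.
semantic_model = {
    "measures": {
        # Numeric values that can be summed and/or averaged.
        "sales_amount": {"column": "order_line.amount", "aggregation": "sum"},
        "shipping_weight": {"column": "order_line.weight_kg", "aggregation": "avg"},
    },
    "dimensions": {
        # Categorical buckets used to segment, filter or group.
        "sales_rep": "employee.full_name",
        "city": "customer.city",
        "product": "product.name",
    },
    "metrics": {
        # A KPI built on top of the measures: average revenue per order.
        "revenue_per_order": "sum(order_line.amount) / count_distinct(order_line.order_id)",
    },
}
```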
■ Provides the opportunity to rename data elements so that they make sense to
business users.
■ Offers the ability to apply rules and access privileges to KPIs and datasets. The
semantic layer is a pinch point for role-based access control and auditing.
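As a rough illustration of that pinch-point idea, the hypothetical Python sketch below routes every outgoing query through a single role check. The roles, rules, table and column names are all invented.

```python
# Hypothetical sketch: every query passes through one role check before it
# leaves the semantic layer, so access rules and auditing live in one place.
ROW_FILTERS = {
    "finance_analyst": "region = 'EMEA'",  # restricted to one region
    "executive": None,                     # unrestricted access
}

def apply_row_filter(base_sql: str, role: str) -> str:
    """Wrap a query with the role's row-level predicate, if one exists."""
    predicate = ROW_FILTERS.get(role)
    if predicate is None:
        return base_sql
    # Wrapping keeps the predicate valid regardless of the query's shape.
    return f"SELECT * FROM ({base_sql}) AS q WHERE {predicate}"

print(apply_row_filter("SELECT region, city, amount FROM sales", "finance_analyst"))
```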
The semantic layer will have to evolve. Semantic layers contain very valuable, business-
user-facing data. Thus, where this data is stored and calculated and made available is
quite important, as it will have implications for the success of self-service analytics
initiatives, AI/ML initiatives, and data consistency and quality across the organization.
■ Offering a knowledge graph capability that can link related data, concepts and
definitions together more effectively.
■ Offering a virtual abstraction tier, so that the semantic layer is no longer a physical
store.
■ An A&BI tool’s semantic layer, either via a traditional physically stored OLAP
approach or a more modern virtual semantic tier.
■ A stand-alone data virtualization platform, via virtual data models and business
logic.
This multitude of options leads many Gartner customers to ask a crucial question:
Which of these analytics and data management platforms is the best to use for a
semantic layer for my use case?
The criteria that you can use to evaluate semantic layers are:
■ Connectivity, source and target support
■ Model development and sharing
■ Business constructs, calculations and function support
■ Query performance
■ User persona support
■ Security and governance
This comparison is a helpful start. To extend it toward action, the following list of
criteria, aligned to each of the categories, may be useful in identifying which
capabilities you need.
Connectivity
■ Supports tools that speak MDX or DAX and live Excel connections
■ Supports other tools in the enterprise data ecosystem, such as third-party data
cataloging platforms and data quality platforms
For user persona support, rather than a set of criteria, it is useful to contextualize what
user expectations are when they use a specific analytics platform. See an example in Table
2 of user roles and their expectations when using a semantic layer platform.
However, these criteria, although useful, still paint an incomplete picture of semantic layer
choices in the modern organization. Vast differences exist in semantic layer support for
different target applications, different cloud vendor offerings and deployment models,
different data types, and different functions. As a result, more analysis is needed to
determine exactly where the semantic layer makes sense, and with which technology
approaches and vendors.
Analysis
The traditional semantic layer, linked to traditional A&BI tools, works as a data mart,
offering a layer of logic in conjunction with a store of analytics-ready data with the
context to support self-service by unskilled users. However, data collection can only scale
so far. As a result, new approaches that focus on connecting to data have become
popular.
Figure 1 describes the following four possible placement locations for semantic layers:
Option 1: Autogenerated semantic layer based on source data. This scenario describes a
semantic layer that is autogenerated during the Acquire stage, the moment data arrives
from various sources. Some vendors, such as Oracle, SAP, IBM and Microsoft, have built
analytic platforms on top of their ERP and CRM platforms to enable reporting off of data
in those systems and to detect data and metadata that can feed into functions in the
semantic layer. However, this solution is rarely complete for organizations, which also need
data from other places. It is currently unrealistic to create a perfect, instantly generated
semantic layer that requires no customization or modification, especially given that most
of these vendor offerings are not being supported as the future of analytics. Too many
organizations are moving to a diverse multivendor environment for this to be a realistic
modern solution at the ingestion layer, as it would require standardization by a single
vendor on a single set of limited analytics capabilities.
Option 2: Semantics at LDW. This scenario describes a semantic layer in the Organize
stage (the LDW). The logical data warehouse is designed to satisfy 95% of analytics
requirements. LDWs support a broad set of analytic engines that can support a wide
variety of users and applications. For that reason, placing the semantic layer in the LDW is
often optimal. The LDW itself, however, is composed of multiple component parts, making
a single universal semantic layer on top of all of them unlikely to be possible. Moreover,
although the LDW maximizes flexibility, building it up to that point takes time and effort,
and analytics platforms are often easier to develop prototypes on.
Semantic layer technology options: Graph database/data fabric, data virtualization, data
warehouse, data lake enablement platform.
Option 3: A&BI semantic layer. This scenario describes a semantic layer with a local,
optimized data store in the Analyze stage. This is an approach that generally requires
collecting data, so that a high-performance, in-memory mart can optimize user access to
data, user data preparation or augmented analytics. This can be a very performant
solution for specific departments, or for standardized enterprise reporting. Because of its
location inside an A&BI application, it is likely to be a siloed solution, slow to change and
hard to standardize across A&BI platforms. Moreover, performance limitations may be
encountered rather quickly for high volumes of data. A middle ground between placing the
semantic layer in the LDW and in the A&BI tier may be using the SQL interface, or as this
research describes it, the data lake enablement platform, to put the semantic layer at the
edge of the LDW. This gets around some of the silo and performance issues, since SQL
interfaces for data lake enablement often offer many query optimization functions.
2. Questions around semantic layers in cloud data warehouse and data lake architectures.
Cloud vendors are engaged in a campaign to dominate future data architecture
investments, and they are making significant progress toward this goal. However, these
cloud data architectures are being built with semantic layer functions as an
afterthought, which is leading many organizations to consider a couple of options:
building views in the data warehouse plus business logic in a programming
language like Python to replicate a semantic layer experience in the DW, or building
semantic layer information into the data lake, especially using data lake enablement
tools like Dremio or AtScale, which also offer support for business semantic layer
calculations. DW-based approaches tend to require a lot of work to build and maintain
views and stored procedures. Data lake-based approaches tend to be both a lot of
work and a lot of expense, as enablement tools require significant investment in
both time and money to deploy properly.
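As a sketch of the first, DW-centric option, the example below pairs a business-friendly view defined in the warehouse with metric logic maintained in Python. It uses sqlite3 purely as a stand-in for a real warehouse connection, and all table, column and metric names are invented.

```python
# Sketch of the DW-based option: a business-friendly view in the warehouse,
# plus business logic the warehouse does not hold natively, kept in Python.
import sqlite3  # stand-in for a real data warehouse connection

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, city TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Berlin');
    INSERT INTO orders VALUES (100, 1, 250.0);

    -- The "semantic" view: friendly names over the physical tables.
    CREATE VIEW v_sales AS
    SELECT o.order_id, c.city AS city, o.amount AS sales_amount
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id;
""")

def gross_margin(revenue: float, cost: float) -> float:
    """Business logic replicated outside the warehouse."""
    return (revenue - cost) / revenue if revenue else 0.0

for city, sales in conn.execute(
    "SELECT city, SUM(sales_amount) FROM v_sales GROUP BY city"
):
    print(city, sales, gross_margin(sales, 180.0))
```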
Why Technical Professionals Choose to Build Semantic Layers Using These Technologies
■ OLAP cubes are built in Hyperion that are provided to finance departments for
business-unit-critical, self-service analysis.
Semantic layer platforms of this generation continue to have staying power. They
function well as information portals, and because of the maturity of these platforms and
the lack of innovation in the space, they continue to offer value to organizations that need
pixel-perfect reporting options. These systems often have advantages in terms of
management and publication flexibility that only become apparent when migrating off of
these platforms. For example, there continue to be differences between tabular and
multidimensional analysis services models, which can present challenges for
organizations that are looking to evolve their SSAS multidimensional estate (see
Comparing Tabular and Multidimensional Solutions, Microsoft).
■ They are difficult for end users to learn — semantic layers built into traditional A&BI
platforms were often complicated to use, leading to a scenario where not many users
in the organization were able to take advantage of these self-service capabilities.
■ They are difficult to set up and maintain — technical professionals must be experts
to set up and maintain a semantic layer connected to a traditional A&BI tool.
■ They are IT managed, offering few opportunities for collaboration with the business
— because everything is centrally managed, users don’t have much of an opportunity
to contribute to improving the platform. As a result, semantic inconsistencies can
become frighteningly common.
As a result, semantic-layer-based traditional BI tools became big, static, slow-to-change
monoliths. For example, business units wanted to experiment and innovate, but
the change process for semantic layers required a change request to go to IT, and for IT to
make that change.
Technical professionals should closely evaluate the use cases for these technologies in
their organization, and in particular, the vendor’s dedication to platform investment and
innovation. Many of these A&BI platforms are not receiving a large amount of innovation
compared to the SaaS, self-service BI platforms that megavendors are building to replace
their traditional A&BI offerings.
Primary benefits: Robust, mature semantic layer platforms that enable ad hoc exploration
by trained business users. These products, like OBIEE, IBM Cognos Framework Manager
Model, and Online Analytical Processing (OLAP) Platforms such as SAP BO Universe,
Microsoft Analysis Services and Oracle Hyperion, have been satisfying enterprise
analytics use cases for a long time and, for many organizations, they continue to do so.
Primary issues: These platforms are part of a mature market, and for the most part, the
rate of innovation has slowed significantly. In addition, the proprietary data architecture
that these tools use means that they do not integrate well with third-party A&BI or LDW
architectures without significant customization, and self-service capabilities are
circumscribed.
Example vendors: Infor Birst, IBM, Microsoft, MicroStrategy, Oracle, Salesforce, SAP,
Sisense, Tableau, ThoughtSpot, Qlik and Yellowfin
Modern analytics and BI platforms, which integrate augmented analytics, offer
compelling capabilities. They offer a modern version of the A&BI semantic layer: load
your data into our platform, and bring your logical data model, measures and metrics into
our platform, and in return, we will provide you with flexible prototyping and promotion
capabilities, as well as augmented analytics capabilities. The augmented analytics vendor
will allow you to query data with natural language and take advantage of automatically
generated insights. It will also give you powerful modeling capabilities inside the
platform.
In some organizations, there is a strong desire for a shortcut to highly accessible, usable,
self-service analytics that don’t require a large investment in multiple data platforms.
Often, these organizations are laser-focused on delivering the most dynamic, AI-
augmented capabilities for A&BI. These might include the capability to do the following:
However, the complexity of bringing AI and/or ML into production includes the challenge
of optimizing compute, memory and storage resources for these deployments, as well as
the challenge of getting a sufficient volume of trusted, fit-for-purpose data into these
systems. To deliver on the augmented analytics promise, many vendors pursue a strategy
where data is brought into their proprietary data architecture first, and semantic
relationships are then built between data fields, exposing rich AI-driven augmented analytics.
The fact that the vendor provides some automodeling and data-refinement capabilities,
coupled with a networked semantic layer, means that organizations can skip a lot of the
slow manual steps of modeling their data and gain efficiencies, moving their analytics
into production faster. However, it is important to note that vendor platforms may not always
provide accurate or current automated features. User feedback is a necessary input to
identify whether the AI/ML-enabled functions of these tools have performed as intended.
Modern analytics and BI also enable self-service by providing a flexible semantic layer
for analytics inside the BI platform. This allows a central data model to be created and
maintained, while enabling users to easily prototype new dashboards and mash up new
data sources, and then easily promote these datasets back into the central data model.
One problem with building a semantic layer inside an augmented analytics A&BI platform
is that it is typically proprietary and does not integrate well with other BI tools or other data
sources. So the solution may work quite well, until there is a business requirement to use a
different A&BI tool or move the data to a different location. In those scenarios, the
organization is typically looking at a time-consuming migration. If this is built inside a
cloud vendor's platform, there may also be egress charges for the data that moves out of
that cloud vendor.
Evaluating these modern analytics initiatives through a project life cycle lens that
incorporates return on investment can help organizations identify when the shiny new
tool is at risk of quickly becoming technical debt, and whether a more efficient approach
is available. Moreover, technical professionals should consider viewing the semantic layer
that they build inside these A&BI platforms as a local semantic layer, not a long-term
global semantic layer.
Technical professionals should also look at query-focused A&BI tools, such as Google
(Looker), AnswerRocket, and Sigma Computing, if they are making a significant
investment in data warehousing. Query-focused A&BI tools can be a great option for
adding a query and modeling layer to enable citizen data scientists and technical power
users to access data, especially data sitting across several cloud and on-premises data stores,
and to create new datasets and visualizations on top of them. Their strength is that they
handle everything via query connections, which can deliver significant flexibility to skilled
users.
Primary issues: Proprietary data architectures do not integrate well with third-party A&BI
or LDW architectures. For example, natural language search capabilities usually require
data to be ingested and prepared inside the A&BI platform before they are used. These
features are still maturing, and often lack enterprise-grade administration and deployment
features. Infrastructure complexity and data integration complexity is a common
challenge. For more information, see Using Augmented Analytics to Boost A&BI.
Data management is a critical component in the modern digital enterprise, and data
warehousing continues to remain the most pragmatic way to process large and complex
datasets for timely and trusted insights. Thus, data warehousing continues to be an
important component of the logical data warehouse.
Increasingly, cloud data warehouses offer a wider variety of capabilities and deployment
options to support modern data storage and processing requirements. These platforms
often offer more competitive licensing and the same support for low-latency data
access with high concurrency. Cloud offerings are being built with embedded AI/ML to
manage the warehouse more efficiently, with improved ease of use and more on-demand,
pay-per-usage pricing models, and to deliver faster throughput, performance and scale to
meet the growing demands of your organization.
Why Technical Professionals Choose to Build Semantic Layers Using These Technologies
Although data warehouses are unlikely to be retired in an organization, the use of a data
warehouse as a semantic layer has some limitations that apply especially to older, legacy
warehouses that don’t use LDW, MPP technologies, or modern techniques such as
collaborative workspaces. Thus, these limitations are not true of all data warehouses, but
apply commonly enough that the point is worth making:
■ Building many, many views doesn’t scale, and can involve a lot of development work
by a relatively small team.
■ Data warehouses are not end-user-friendly platforms, and thus end users do not
have the opportunity to be involved in the process unless the DW team brings them
in and helps them get involved.
■ Direct query against models that may sit in a data warehouse can be quite slow, if
the queries have not been optimized according to the requirements of both the A&BI
tool and the data source. In scenarios where the A&BI tool and the data warehouse
are in different clouds, laws of physics still apply, and only limited query
optimization is usually possible.
■ The calculations and business logic required for a semantic layer to work are not
always natively available in the data warehouse, necessitating some additional
development of this logic, often in a different programming language like Python.
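On the direct-query point above, the toy sketch below shows what query pushdown buys: the filtered aggregate runs inside the warehouse, instead of every row crossing the wire to be filtered client-side. sqlite3 again stands in for a warehouse, and the table and data are invented.

```python
# Rough sketch of filter pushdown. The optimized path sends the predicate
# and aggregation to the warehouse; the unoptimized path fetches all rows
# and filters in the client, which is the slow pattern direct query can
# degrade into when queries are not tuned for the source.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('EMEA', 100.0), ('APAC', 40.0), ('EMEA', 60.0);
""")

# Pushed down: the warehouse does the filtering and summing.
fast = con.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("EMEA",)
).fetchone()[0]

# Not pushed down: all rows cross the wire, then the client filters.
rows = con.execute("SELECT region, amount FROM sales").fetchall()
slow = sum(amount for region, amount in rows if region == "EMEA")

assert fast == slow == 160.0
```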
Technical professionals should follow LDW best practices and seek to deploy DW
strengths to use cases that require low-latency high concurrency data access. Moreover,
they should consider DW and A&BI tool integrations before they make data warehouse
and A&BI tool decisions.
Primary benefit: Building views on top of data warehouses, or virtual mart environments
inside data warehouses, is a great (and affordable) way to deliver trusted data for
compromise analytics models. Virtual marts can also be useful for candidate and
contender approaches. When the number of concurrent analytics users is relatively modest
and the number of marts to be managed is small, this can serve as a great, central way of
managing a semantic layer.
Primary issues: Platform efficiency tends to be quite high, but these are not typically
end-user-friendly platforms. As a result, environment development hinges on skilled data
engineer roles. These platforms are not usually the target of initiatives to broaden
self-service, although they can significantly benefit from the capabilities of the DW
automation space.
Organizations often know they want to collect diverse forms of data that reside in multiple
formats and locations across the enterprise, and then analyze this data for discovery
analytics use cases. They are realizing that traditional data warehousing technologies
can't meet all their new business needs, causing many to invest in a scale-out data lake
architectural pattern.
The point of a data lake is that its simplicity enables broad, flexible and unbiased data
exploration and discovery. This is important because advanced forms of analytics (such
as data mining, statistics and machine learning) usually involve data exploration and
discovery. Unbiased exploration and discovery is, in fact, impossible when the bias of
optimization has been placed on that data (as is done in the data warehouse).
Unlike a data warehouse, a data lake preserves the original details of source data for the
richest data exploration, discoveries and analytic correlations possible.
Building a semantic layer on top of a data lake removes some of the classic problems of
lakes: understandability, performance and SQL access. In doing so, it makes a data lake
more like a lakehouse that is ready to support any analytics more easily and reliably.
Technical professionals who are thoughtful about data lake design often make some key
observations. The first is that the data lake should complement, not replace, the data
warehouse. For example, most data warehouses were designed for reporting and older
forms of analytics. Most data lakes are built to enable new and advanced forms of
analytics, which are essential to digitalization and innovation. Business reporting
demands deeply optimized data with a detailed audit trail, whereas advanced analytics
and data science demand massive volumes of detailed source data for extreme data
exploration and discovery analytics.
This is why data stored in a lake differs sharply from data stored in a warehouse,
although the data for the two can come from the same sources. Hence, an architecture
that integrates a data lake and a data warehouse is capable of supporting a broad range
of business use cases and data management strategies. However, advanced data science
users shouldn’t have to spend their time on data engineering in order to create advanced
analytics off of data in these sources. Moreover, citizen data science users also need
access to data that may be hosted in a data lake.
This makes data lake enablement investments essential to deliver SQL query support on
top of data lake data. Some technical professionals also adopt data lake technologies
because cloud vendors tell them to do so: the cloud vendors' strategy involves data
ingestion into their low-cost data lake, followed by the provision of data and analytic
services on top of that now cloud-native data.
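As a toy illustration of that SQL-on-the-lake pattern, the sketch below queries a raw Parquet file in place, with a view acting as a thin semantic layer over it. DuckDB is used here only as a convenient stand-in for a data lake SQL engine; the file name and schema are invented.

```python
# Toy SQL-on-the-lake example: query a raw Parquet file where it sits and
# layer business-friendly names over it, without moving the data.
import duckdb

con = duckdb.connect()
# Simulate a raw file landing in the lake.
con.execute("""
    COPY (SELECT * FROM (VALUES (1, 10.0), (1, 20.0), (2, 5.0)) t(cust, amt))
    TO 'events.parquet' (FORMAT PARQUET)
""")
# A thin semantic view over the raw file.
con.execute("""
    CREATE VIEW sales AS
    SELECT cust AS customer_id, amt AS sales_amount
    FROM read_parquet('events.parquet')
""")
print(con.execute(
    "SELECT customer_id, SUM(sales_amount) FROM sales GROUP BY customer_id"
).fetchall())
```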
Data lakes can be very challenging to implement. Often, the data lake platform has not
structured data sufficiently to make it understandable and useful. Incomplete lake
governance creates a data swamp, full of large amounts of raw or uncurated data. The
reality is that only a handful of staff are skilled enough to cope with such data, and they
are likely doing so already.
Another common assumption is that data lake implementation technologies perform far
better than they actually do, which leads to wild overestimations of their benefits.
The biggest drawback to data lakes is the lack of context, which the semantic layer and
data catalog try to provide. The challenge is that, although semantic layers are necessary
to provide context and consumability to data in the data lake, they are rarely implemented
to cover every potential analytics use case; the semantic layer must first be built simply to
make the data lake usable. Because business friendliness, data governance and
performance are prerequisites to success in the semantic layer context, data lakes can be
quite challenging to deploy for this use case without significant investments in the
platform, and supporting a highly ambitious, broad semantic layer on top of data in the
data lake is not a common scenario among customers.
The key takeaway for technical professionals from this section is to recognize the role
that data lakes play in the data analytics architecture as a data store that is optimized for
novel and discovery analytic use cases, as well as a data store that is affordable.
Technical professionals should exercise caution when attempting to address every analytics
use case with the combination of a data lake and a data lake enablement tool. Building Data
Lakes Successfully — Part 1 — Architecture, Ingestion, Storage and Processing, and
Building Data Lakes Successfully — Part 2 — Consumption, Governance and
Operationalization should be part of your reading list.
Primary issues: Data lakes require significant work to get them ready for production. This
includes governance, data profiling, data tagging, data security, metadata management,
data wrangling and data integration. As a result, their viability as a semantic layer is often
a long-term goal, not a short-term reality. Only some cloud vendors have taken the
approach of enabling a data lake as a semantic layer (in conjunction with other cloud
services) that is ready to interact with BI.
Data Virtualization
Example Vendors: Denodo, Dremio, Intenda (Fraxes), Data Virtuality, IBM, Informatica,
Oracle, SAP and TIBCO Software
DV, offered either stand-alone or via a data integration platform, has been of interest to
many organizations because of its ability to provide a semantic layer tied to a layer of
access, federation and/or virtualization. With data virtualization, it is possible to combine
different sources of data and execute federated queries without replicating data into
persistent storage, and without creating a physical cube or star schema. This leads to
increased agility.
Why Technical Professionals Choose to Build Semantic Layers Using These Technologies
Data virtualization offers the ability to create a virtual model of data that joins relational
and nonrelational data, from many sources, on-premises and/or in the cloud. It simplifies
data access for analytics, improves reuse, reduces change impact and helps to achieve a
consistent, reusable semantic layer.
The reality is although DV can support many use cases, it is not ideal for every use case.
Semantic layers typically require blazingly fast performance that can be difficult to deliver
without serious caching. Semantic layers also require significant target application
support that not every DV vendor has built.
DV has a lot of potential, especially for agile analytics prototyping as well as unifying
disparate data sources, but DV tools have not yet become platforms, for the most part, for
the entire organization to use. Some workflows inside the DV platform may be business-
user-friendly, like searching a catalog, but not all of them.
Thus, building a semantic layer on top of DV technology will still be a process that
technical professionals must manage, and they must adapt their approach to build virtual
views instead of ETL and physical consolidation. DV also provides benefits for
performance optimization via caching, which can make semantic layers more performant
than they may at first appear, as the cons of additional layers of abstraction are balanced
out by the pros of additional query optimization.
Primary benefits: DV unifies views of data, and consolidates business rules and logic. DV
provides one logical location to access all data that has been connected to using DV
technology. This greatly simplifies data management for both IT and analytics platform
users, especially when this use case is extended to support a semantic layer that includes
business definitions, measures, metrics, rules and logic.
DV can federate queries among varied sources, and can help organizations handle
multicloud architectures without expensive copying and moving of data. This on-the-fly
ability does not require ETL planning or procurement of physical hardware for a data
mart. Rather, DV can federate a query even among disparate sources in terms of type,
location and size. All that is required is a common key — even if the keys are represented
as synonyms.
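The toy example below illustrates that federation idea, with pandas standing in for a DV engine: two sources keyed by synonymous columns are joined on the fly, with no ETL and no physical mart. The source names, columns and the synonym mapping are all invented.

```python
# Toy federation sketch: two sources whose keys are synonyms
# ("cust_id" vs. "customer_number") joined virtually at query time.
import pandas as pd

crm = pd.DataFrame({"cust_id": [1, 2], "segment": ["SMB", "Enterprise"]})
billing = pd.DataFrame({"customer_number": [1, 2], "mrr": [500.0, 12000.0]})

# The virtualization layer's metadata records which keys are synonyms.
KEY_SYNONYMS = {"customer_number": "cust_id"}

federated = crm.merge(
    billing.rename(columns=KEY_SYNONYMS),  # resolve the synonym to a common key
    on="cust_id",
)
print(federated)
```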
Primary issues: For DV to work, it must use defined datasets. DV will suffer or fail if the
datasets connected are not properly defined and understood. The data must be
represented in a defined, standardized form or be able to be put into a compatible and
well-understood form. Combining such data cannot be accomplished without
standardizing the data first. Standardization may require data transformation.
A data fabric maps data residing in disparate applications (within the underlying data
stores, regardless of the original deployment designs and locations) and makes them
ready for business exploration. Connected data enables dynamic experiences from
existing and newly available data points leading to timely insights and decisions. This is
very different from a static experience with reports/dashboards. There are multiple
technology components, but one of the most relevant to the semantic layer topic is the
idea of a knowledge graph. In theory, semantic layers are a great use case for a
knowledge graph and data fabric, as it fits several of the criteria of a graph use case.
Why Technical Professionals Choose to Build Semantic Layers Using These Technologies
The semantic layer is a place where users may need to define new relationships, new
calculations, new connections, new constructs of data, and traditional relational models
can make this more difficult. Data fabrics built on graph data stores have potential when
there are a lot of relationships in the data and there is a need to understand the obvious,
hidden and latent relationships in data. Moreover, they excel when there is a need to
rapidly traverse a large number of relationships to unearth insights and patterns and
complex rules that need to be computed quickly.
Semantic layers should be able to support analytics calculations and requirements even
as they become more sophisticated and demanding. Graph analytics can uniquely
support functions like link analysis, and also provide insights into how semantic models
link to different types of users and use cases in a more flexible way than traditional
relationally modeled data.
Deploy graph data stores and data fabrics for use cases with undeniable and
transformative business benefits, such as fraud analytics or life sciences data
fabrics. Piggyback on these proven use cases to also offer a graph-based semantic
model, which can provide an enriched user experience.
Adopt graph when the data has high variability that does not fit well in a two-dimensional
data model of rows and columns. Adopt graph when your current system does not scale
or perform because of slow joins in Relational DBMS systems, and when the data model
is prone to change. Finally, adopt graph when there is a need to integrate disparate
heterogeneous data sources and when there is a need to link data to metadata —
versioning, provenance, lineage, data validation, data discovery and data quality checks —
and to identify anomalies.
Primary benefits: Imagine being able to visualize a cluster of every calculation and KPI
and see what the shared source tables are. Graph analytics can excel when requirements
are to visualize clustering or groupings of data based on connections or patterns, and
when there is a need to quickly compare new data to existing data for possible merging.
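A small sketch of that cluster view, using the networkx library (an assumption; any graph store would do) with invented KPI and table names: each KPI is linked to the tables it reads, and the shared sources fall out of the graph's structure.

```python
# Sketch: model KPIs and their source tables as a graph, then ask which
# tables feed more than one KPI. All KPI and table names are invented.
import networkx as nx

lineage = {
    "net_revenue": ["orders", "refunds"],
    "avg_order_value": ["orders"],
    "churn_rate": ["subscriptions", "customers"],
    "ltv": ["orders", "subscriptions", "customers"],
}

g = nx.Graph()
for kpi, tables in lineage.items():
    for table in tables:
        g.add_edge(kpi, table)

# Tables connected to multiple KPIs: the shared-source cluster view.
kpis = set(lineage)
shared = [n for n in g if n not in kpis and g.degree(n) > 1]
print(shared)  # e.g. ['orders', 'subscriptions', 'customers']
```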
Primary issues: The biggest issue with semantic layers built on graph and data fabric
stores is that it is really hard to find any reference customers doing this currently. So
organizations adopting this may be on their own. At present, graph technologies may
provide additional context, but to provide that they need significant upskilling, effort, and
integration. That is often a reach for already busy technical professionals.
Guidance
The New Semantic Layer Is Not Going to Be Universal, or Single Engine
The purpose of the original single-technology, single-server data warehouse coupled with
the single-technology semantic layer was clear: To enable usable analysis that spanned
all of an organization and across time. This was perhaps overly ambitious: No one
technology has come to surpass all the others in the area of analytics and no one
technology is capable of handling all requirements. It is now clear that the optimal
analytical platform is an integration of multiple technologies and data stores.
The original enterprise data warehouse (EDW) and the traditional semantic layer have not
achieved their original goals because of the growth in diverse users, use cases, data types,
data velocity and data volume in the organization.
The original aims of data warehousing and an associated semantic layer remain valid.
These were to provide a broad, and historically deep, shared view of everything that is
going on within an organization. The semantic layer could take that view and deliver it to
the users in familiar terms. Data warehouses did this by integrating and keeping the data
generated by the organization’s business processes. Semantic layers did this by providing
the metadata, definitions and context so that data was easy to understand and didn’t need
much preparation or transformation.
The first outcome is to attempt to integrate and organize a more centralized semantic layer
to satisfy broad, shared use cases. This is the essence of a global semantic layer.
The second outcome is that there must be flexible, local semantic layers to give data
innovators the space to create new datasets and analytics, and do so without forcing
them to conform to data management and quality best practices from the outset. This is
the essence of a local semantic layer.
However, semantic layers should be deployed globally, as quickly as possible after the
analytics prototype is generated, promoted and certified to be valuable. This enables the
organization to reuse this insight, and also reduce the complexity of managing diverse
data models and the data hosted in various systems in the cloud and on-premises.
Lower Your Own Expectations to Account for the Gap Between the Ideal Semantic Layer
and Reality
Recall that in self-service, organizations need to optimize against many goals. Good self-
service architectures are outcome-oriented, valuable, easy to learn, accessible, safe and
trusted. In line with this, the ideal semantic layer needs to deliver against many self-
service goals. However, the reality is often different from the ideal. You’ll be looking for the
right tool for the job, rather than a panacea, because there are many existing limitations.
■ Why semantic layers don’t live up to this ideal: Achieving good integration
between various components of a semantic layer platform is mostly still a pipe
dream, outside of a single megavendor or cloud provider D&A stack. This
means that users will not have a seamless experience to access data.
Moreover, most A&BI semantic layer platforms provide folders and hierarchies,
but don’t provide much searchability. In contrast, data integration and
cataloging products provide searchability, but limited grouping.
■ Why semantic layers don’t live up to this ideal: For the most part, A&BI tools
have not built integrations to connect to other A&BI tool semantic layers,
although some exceptions exist. Increasingly, A&BI platforms are opening
access to their own semantic layers, for use by other BI tools. For example,
Incorta and MicroStrategy have opened up access to their semantic layer so
that Microsoft Power BI and Tableau can connect to datasets inside those
platforms. Microsoft Power BI has done something similar to enable tools that
offer a data connector to analysis services to access data hosted in Power BI.
However, this type of integration is typically developed to enhance product
stickiness, with vendors encouraging customers to use the vendor's platform to
build the semantic layer. Moreover, different file formats continue to be difficult
to integrate seamlessly. Tools that specialize in this type of integration can
provide wide integration capabilities, but they often require opening a
separate tool with a separate workflow, reducing ease of access and use.
■ Why semantic layers don’t live up to this ideal: Most semantic layers don’t
intelligently detect the format of data, unless they are data preparation,
wrangling and/or integration tools, and as a result, the technical professional
will often have to manually prepare and model data inside either a data
preparation platform, the virtualization tool, the data warehouse, the data mart
or the A&BI tool. Some platforms offer some capabilities to use AI to enhance
the data preparation and data modeling experience, but they are typically built
on highly proprietary data architectures.
■ Why semantic layers don’t live up to this ideal: Every investment in ease of use
as well as development efficiency typically pushes semantic layer products
more toward the end user, toward in-memory environments, with many
optimizations to make data accessible and usable. But to scale, semantic
layers need to be built to cluster, to perform and to support many concurrent
users. Data lake capabilities can help with scale, and data warehouse and data
mart capabilities can help with performance, but developing semantic layers on
top of data lakes and data warehouses can be highly labor-intensive and
involve many other tools. Graph databases potentially offer both easy-to-model
data as well as high performance, but they are still maturing, and have not
been deployed for this use case by many organizations outside of the data lake
space.
■ Why semantic layers don’t live up to this ideal: Many Gartner clients have
questions about why their preferred A&BI vendor has not built native single
sign-on (SSO) connectivity to their preferred data source. Often, the complexity
relates to the preferred application or database residing on a different cloud
platform or as a SaaS offering. In reality, building these connections takes
significant work, and vendors will only build the data security capabilities that
customers are consistently clamoring for in large numbers. Often,
megavendors also have a desire, not just to sell the A&BI platform, but to have
that A&BI platform drive adoption of their cloud platform. This will
disincentivize their efforts to develop deep integrations with data hosted with
different vendors, cloud providers or platforms that don’t serve their strategic
vendor goals.
■ Data governance: Data that flows into a semantic layer needs to be tracked, in
particular for changes. Having lineage as to who is accessing data, when and how
they are using it, as well as lineage as to where data came from and whether
changes have taken place, is important. Data governance also applies to the
overall semantic layer platform: whether it can be controlled while giving users some
freedom to make changes, with requirements in some cases to go through a
promotion and certification process.
■ Why semantic layers don’t live up to this ideal: In general, the lack of integration
between semantic layer platforms and their target A&BI capabilities inhibits
sophisticated data governance across platforms. Moreover, in many cases,
functions are built, but because they require heavy-handed IT control, they don’t
necessarily engender trust by business users. Data governance capabilities
tend to conflict with ease of use and development-efficiency-focused
capabilities, leading to platforms that must uneasily balance these conflicting
goals.
Comparison of Semantic Layer Platform Options

Sample Vendors
■ A&BI semantic layer platform: Google (Looker), Microsoft, SAP, Sisense, IBM, MicroStrategy, Oracle.
■ Data virtualization platform: AtScale, TIBCO Software, Data Virtuality, Denodo, Zetaris, SAP, Microsoft, Amazon Web Services (AWS).
■ Data warehouse platform: IBM, Teradata, Amazon Web Services, Microsoft Azure, Google, Oracle, Snowflake.
■ Data fabric/graph data store: Cambridge Semantics, CluedIn, Denodo, Informatica, Semantic Web Company, data.world, Stardog, Talend.
■ Data lake enablement/semantic layer platform: AtScale, Kyligence, Kyvos Insights, Dremio, Apache Druid, Clickhouse, Apache Pinot.

Connectivity, Source and Target Support
■ A&BI semantic layer platform: Strong source support; weak target support.
■ Data virtualization platform: Strong source support and middling target support, as data virtualization platforms have traditionally been strong for data integration use cases and are improving for semantic layer use cases.
■ Data warehouse platform: Medium source support; they tend to serve as a source, with external source capabilities provided via query federation/virtualization, just not many transformations supported, and weak to medium source SLA support. Strongest target support, given the long popularity of connecting tools to DW platforms.
■ Data fabric/graph data store: Medium to strong source support, depending on whether data must be loaded into the platform. Weak target support, as few target applications have built strong connectors with graph stores.
■ Data lake enablement/semantic layer platform: Source support is strong if the tool uses virtualization, more limited if built to serve as a query acceleration layer or data mart; target support is medium, given the small size of many of these vendors and the lack of rich target support that has been built.

Model Development and Sharing
■ A&BI semantic layer platform: Legacy: sophisticated models; complex and daunting interface, often sprawling. Modern: simpler interface, less sophisticated models unless you are good at adding custom logic.
■ Data virtualization platform: Simple, but less sophisticated models. Easy model sharing and collaborative development features.
■ Data warehouse platform: Robust, sophisticated model development features; sharing features vary.
■ Data fabric/graph data store: Model development is in some ways simpler, as network models in graph theory are "whiteboard ready"; however, developing models can be complex, as business-user capabilities are still maturing.
■ Data lake enablement/semantic layer platform: Modern: drag-and-drop features, sharing features. Legacy: a bit more dated model design.

Business Constructs, Calculations and Function Support
■ A&BI semantic layer platform: Legacy: strong construct and calculation support and potential once you master the tool. Modern: fewer constructs and calculations natively supported.
■ Data virtualization platform: Medium support for constructs, functions and calculations; not superfriendly for business users.
■ Data warehouse platform: Strong support for constructs and calculations, but not superfriendly for business users.
■ Data fabric/graph data store: Weak support for typical business constructs and calculations natively, as a large amount of translation/adaptation into the graph query language is required. However, graphs enable other unique calculations such as link analysis.
■ Data lake enablement/semantic layer platform: Construct and calculation support depends on how good a developer you are; often fewer are supported than in competitor offerings.

Query Performance
■ A&BI semantic layer platform: Legacy: fewer query optimization tricks, but good query support for data in the preferred BI server. Modern: same.
■ Data virtualization platform: Lots of tools for optimizing query performance: caching, pushdown processing, query rewrite and substitution, MPP support.
■ Data warehouse platform: Natively powerful query performance; however, sometimes challenging to take advantage of this performance if not using a target that can make full use of the underlying platform power and optimizer.
■ Data fabric/graph data store: If optimized correctly, blazingly fast query performance is possible for semantic layer use cases. However, correct optimization takes a significant skill set.
■ Data lake enablement/semantic layer platform: Query optimization tends to focus on whatever the strengths of the tool are: if a query acceleration platform, using the in-memory layer; if a DV tool, similar to the data virtualization column.

User Persona Support
■ A&BI semantic layer platform: Development: Legacy: BI developers. Modern: power users, BI developers. Production: lots of consumers, some LOB specialists like finance.
■ Data virtualization platform: Development: data engineers, data integration professionals, a few power users. Production: lots of consumers, some data scientists.
■ Data warehouse platform: Development: DBAs, a few report developers. Production: challenging to customize for end users; skilled technical professionals typically required.
■ Data fabric/graph data store: Development: typically, the biggest challenge is finding developers who are comfortable with graph languages. Production: intuitive model creation; however, a big learning curve to understand how the models work.
■ Data lake enablement/semantic layer platform: Development: some power users, some report developers. Some tools are relatively easy to use for drag-and-drop development of semantic layers; others are completely technically challenging.

Security and Governance
■ A&BI semantic layer platform: Typically an A&BI-tier-focused security model, thus requiring great duplication of efforts.
■ Data virtualization platform: Virtualized security and governance model; great potential, but limited by sources and targets that support security and governance capabilities.
■ Data warehouse platform: Strong place to define security; governance features are solid, but the platform is not business-user-friendly.
■ Data fabric/graph data store: Theoretically a very powerful capability, as security is embedded in graph analytics and triples enable interesting attribute-level security implications with little overhead. In practice, few developers have the skills to do this.
■ Data lake enablement/semantic layer platform: Security and governance features are somewhat limited to inside the platform.
Table 2: User Roles and Their Expectations When Using a Semantic Layer Platform

Desired Outcome
■ Consumer: View analytic content periodically; use it to make data-driven decisions.
■ Explorer: Select from available fields in a semantic layer to seek out diagnostic analytics; discover the answers to "why" questions.
■ Innovator: Mash up multiple certified data sources, query against large datasets, create novel visualizations, and generate new insights on data that may have an impact on the organization's future.
■ Expert: Introduce data from completely new data sources, create data transformation scripts, and use advanced analytics and ML to build transformational analytics.

Augmented Analytics Features
■ Consumer: Natural language text and voice query, and autogenerated insights.
■ Explorer: Natural language processing (NLP), SQL generation, automatic visualization generation, and jargon-free ML services that provide insights such as key drivers analysis.
■ Innovator: Automatic data profiling and data classification, join recommendations, visual lineage and impact analysis for data changes, and menu-driven advanced analytics functions.
■ Expert: Rich transformation and query language functions available, and deep capabilities with R, Python, PMML and augmented ML available to reduce time to production for advanced analytics.

Usability Features
■ Consumer: Analytic visualizations should be searchable, prepopulated and customized to the users' needs. Definitions of metrics and measures are easily accessible and linked to dashboard objects.
■ Explorer: In addition to consumer features, data should be organized and linked with rich lineage and metadata, with the ability to open data in analytic tools that offer drag-and-drop visualization and analytics capabilities.
■ Innovator: In addition to explorer features, sophisticated data preparation capabilities with embedded forecasting, classification and clustering functions should be available.
■ Expert: In addition to explorer and innovator features, advanced data source ingestion and/or connectivity, configuration management and monitoring, and interfaces to and from data science and machine learning (DSML) and augmented ML.

Business Workflow Affinity
■ Consumer: Aligned to information portal capabilities, and embedded in business applications, not just inside the BI tool, with powerful mobile app-based data access. Reporting capabilities and linkage to productivity applications like Excel are important.
■ Explorer: Aligned to analytics workbench capabilities, including the ability to explore and mash up data to deliver new insights, and thus integrated into A&BI tools.
■ Innovator: Aligned to data science hub capabilities, with the ability to create features by enriching and joining external sources with semantic layer data. A feedback loop exists to allow more sophisticated users to enrich the semantic layer with more metadata or update fields.
■ Expert: Aligned to artificial intelligence hub capabilities, specifically by automating and augmenting key portions of analytic processes. A&BI functions are programmable, automatable, repeatable, reusable, and integrated into the experts' preferred open-source or packaged toolchain for DSML and AI.

Security and Governance Features
■ Consumer: Guardrails have been set that ensure consumers can't access data that they shouldn't.
■ Explorer: In addition to consumer capabilities, a data catalog offers a view of what data exists, but actually accessing this data requires explorers to follow the request and approval process.
■ Innovator: In addition to consumer and explorer capabilities, when creating datasets, innovators can apply row-level security or take advantage of SSO to a trusted, centrally managed data source.
■ Expert: In addition to innovator capabilities, experts are trained in data governance, so that they can enforce governance rules that exist. Additionally, they have a very granular ability to secure, monitor and measure data analytics capacity, usage and performance.
Semantic Layer Approaches by Priority

Priority: Ease of use, user enablement
■ Semantic layer technology approach: Local semantic layer: modern A&BI platforms that offer maximum usability and power to end users for semantic model creation. Global semantic layer: data lake enablement platforms, which offer significant query and semantic layer capabilities for a broad set of users.
■ Strengths: Easiest way to enable access to new data sources cheaply.
■ Weaknesses: Data quality issues are likely to arise with siloed A&BI adoption, and data lake initiatives often fail to reach maturity because of the difficulty of data governance.

Priority: Control, governance, consistency
■ Semantic layer technology approach: Local semantic layer: query-focused A&BI platforms that offer access to data warehouse data for model access, and temporary tables for creation and customization. For global access to data: data warehouse and DW automation, which can populate data marts that are centrally created and provisioned.
■ Strengths: Easiest way to govern data access; organizations have built up a semantic virtual tier and minimized the uncontrolled proliferation of data marts.
■ Weaknesses: So far, only partial success, due to low adoption, dissatisfied end users and high costs; development processes are rigid and time-consuming; badly designed batch data movement limits agility and delivery efficiency; new types of data such as IoT, social media, weblogs and geospatial are not supported in existing architecture.

Priority: A balance of agility and control
■ Semantic layer technology approach: Local semantic layer: depending on the use case, a combination of approaches, including import mode into an A&BI tool, direct query to the data warehouse, data preparation workflows in a data virtualization or data prep tool, or data made accessible via a lake enablement platform. Global semantic layer: the LDW, i.e., a combination of data warehouse views, data lake shared packages of data, and data virtualization for access to data that isn't in the first two categories; for mature use cases, a data fabric, which adds metadata-driven recommendations and AI/ML insights.
■ Strengths: Flexibility, as the advanced-level LDW is in a constant, dynamic state of change; these changes occur in response to changes in business analytics requirements, moving D&A workloads among the analytics engines in the stack; the LDW contains every event, transaction, interaction, sensor reading, customer, employee and supplier — any and every entity.
■ Weaknesses: The LDW model, if insufficiently supported by resources for implementation and management, combines the complexity and chaos of decentralization with the cost and slowness of centralization; the risk of failure due to complex implementation increases with the ambitiousness of the project.