
Demystifying Semantic Layers for Self-Service Analytics

Published 7 September 2021 - ID G00749457 - 61 min read
By Analyst(s): Joseph Antelmi
Initiatives: Analytics and Artificial Intelligence for Technical Professionals

Data and analytics technical professionals are struggling to deliver self-service analytics that balance agility and control. This document helps technical professionals compare options for building and deploying the semantic layer, a valuable enabler of self-service analytics.

Overview
Key Findings
■ Semantic layers can support different categories of analytics roles, including
consumers, explorers, innovators and experts, by providing a business-friendly set of
logical data models, measures and metrics.

■ Technical professionals struggle to select the optimal location and technology for
semantic layers. The many options include various generations of analytics and
business intelligence (A&BI) tools, data marts, data warehouses, query accelerators,
knowledge graph/data fabric and stand-alone virtualization platforms.

■ Situating semantic layers in the logical data warehouse or data fabric offers
efficiencies for maintenance, consistency and scalability. The alternative approach
of coupling semantic layers tightly with A&BI platforms offers greater ease of use and
business-friendly languages for construct development, but also more lock-in of data
and less flexibility and reusability.

■ The universal semantic layer is still elusive, due to data integration, tool
interoperability, usability and governance challenges. Successful semantic layer
development requires organizations to improve their data engineering, data
modeling and data governance maturity.

Gartner, Inc. | G00749457 Page 1 of 36

This research note is restricted to the personal use of [email protected].


Recommendations
Data and analytics technical professionals implementing semantic layers in data and
analytics architectures should:

■ Select an appropriate semantic layer for your use case by comparing the various
technical options in terms of source and target support, model development and
sharing, support for business constructs, calculations and functions, query
performance, ease of use, security, deployment complexity and licensing.

■ Combine analytic engines for competitive advantage. A&BI enables a prototype-to-production approach to semantic layer development; these models can then be promoted into a multiengine LDW-style architecture. This enables business measures, metrics and data models to originate as A&BI prototypes and go into production in the logical data warehouse.

■ Deploy a combination of local and global semantic layer data models based on the
use cases, users and desired governance model.

Comparison
The Need for a Semantic Layer
Many organizations are on a journey to enable self-service analytics. They want to deliver
self-service solutions that empower all of their users to use data and analytics for
organizational benefit.

Analytics technical professionals are challenged to deliver data and analytics solutions
that align to the following principles:

■ Outcome-oriented: Aligns to the broader organization’s goals.

■ Valuable: Provides a benefit to users (e.g., accessible to a larger population of users, more flexible, more sophisticated analytics, more affordable).

■ Easy to learn: Intuitive to grasp, with training resources available.

■ Available/reusable: Easily accessed, embedded in applications and workflows, and reusable by multiple systems and users. This principle typically requires a flexible platform because the semantic layer often needs to both unify data across multiple data management platforms such as data lakes and data warehouses, and also integrate directly into source systems, depending on the use case.


■ Safe: Governance guardrails, sophisticated identity, access, and security
management.

■ SLA- and cost-optimized: High-quality data is delivered by a performant, reliable platform, at the best possible cost.

However, delivering on these principles is not always easy. The big problem with self-service analytics is that most of these principles are, in fact, in opposition. Organizations struggle to balance the principles that align to agility (easy to learn, available) with those that align to control (safe, SLA- and cost-optimized) in order to deliver valuable outcomes to a diverse set of users.

Traditional Semantic Layer Approaches Were Insufficient


Business units are focused on delivering value in the form of fast, agile analytics in their
silos and, in the process, often undermine broader organizational goals for data
consistency and a shared set of key performance indicators (KPIs). IT departments,
occupied with furthering organizational goals for trusted, safe IT-led data management,
don’t provide business units with the data or the analytics capabilities that they need in a
timely fashion. Historically, an IT-built semantic layer was a big part of the solution to this
problem. But this IT-led semantic layer approach fell short for two major reasons:

■ It was difficult to set up and maintain.

■ It was difficult for end users to use, customize and update, which meant that the
semantic layer was typically a platform used only by IT. This dependency on IT
slowed down both deployment and usage of the semantic layer platform.

Self-service has become popular partly because of the challenge of gathering good
requirements. Many business units historically have had trouble describing what their
data and analytics needs are beyond the most basic requirements, so the solutions that
are built are not that valuable. Self-service approaches, where business users build
analytics models, dashboards and objects over time as they discover their usefulness,
continue to be relevant.

The future must integrate the best of both approaches: centralization for reuse and
governance, and self-service agility for prototyping and business value delivery. Data and
analytics technical professionals must evolve and rise to that challenge. In this context, it
is worth revisiting the semantic layer.


Why the Semantic Layer?
A semantic layer is a business representation of data that helps end users access data
autonomously using common business terms. The semantic layer achieves this by
mapping complex data into familiar business terms such as product, customer or revenue
to offer a unified, consolidated view of data across the organization.

These semantic layers often contain data in the form of measures, the numbers or values
that can be summed and/or averaged, such as sales, distances, duration and weight.
They can also contain dimensions, which are the categorical buckets that can be used to
segment, filter or group — such as sales rep, city and product. In addition, constructs on
top of these can include metrics and KPIs, quantifiable measures that are used to track
and assess the status of a specific process.
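The measure/dimension/metric distinction above can be made concrete with a small sketch. All names here (the `orders` rows, `sum_measure`) are invented for illustration; real semantic layers define these constructs declaratively in the platform rather than in application code.

```python
# Hypothetical sketch: measures are numeric values that can be aggregated,
# dimensions are categorical buckets used to segment, filter or group,
# and a metric/KPI is a construct defined on top of a measure.

orders = [
    {"city": "Berlin", "product": "Widget", "sales": 120.0},
    {"city": "Berlin", "product": "Gadget", "sales": 80.0},
    {"city": "Munich", "product": "Widget", "sales": 200.0},
]

MEASURES = {"sales"}              # can be summed and/or averaged
DIMENSIONS = {"city", "product"}  # used to segment, filter or group

def sum_measure(rows, measure, by):
    """Aggregate a measure, grouped by a dimension."""
    out = {}
    for row in rows:
        out[row[by]] = out.get(row[by], 0.0) + row[measure]
    return out

# "Total sales by city" is a metric built from the sales measure
# and the city dimension.
print(sum_measure(orders, "sales", "city"))  # {'Berlin': 200.0, 'Munich': 200.0}
```

The semantic layer's value is that this grouping logic, plus the business-friendly names, live in one shared place instead of being re-coded in every dashboard.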

A semantic layer has the following core functions:

■ Provides a translation of the underlying database structures into business-user-oriented terms and constructs.

■ Presents data elements in a way that’s intuitive to businesspeople.

■ Provides the opportunity to rename data elements so that they make sense to
business users.

■ Delivers an interface to hold business descriptions of data elements.

■ Provides a mechanism to define and store calculations and business rules.

■ Offers the ability to apply rules and access privileges to KPIs and datasets. The
semantic layer is a pinch point for role-based access control and auditing.

The semantic layer is an important element of analytics strategies because it serves as a customizable, business-user-friendly repository for measure, metric and KPI data. However, it is not always feasible, or the best choice, to define this layer inside a traditional A&BI tool. Nor is it always feasible to take traditional approaches to semantic layers as analytic tool diversity multiplies.

The semantic layer will have to evolve. Semantic layers contain very valuable, business-user-facing data. Thus, where this data is stored, calculated and made available is quite important, as it will have implications for the success of self-service analytics initiatives, AI/ML initiatives, and data consistency and quality across the organization.


This evolution is taking several forms:

■ Offering a knowledge graph capability, which can link related data, concepts and
definitions together more effectively.

■ Offering a virtual abstraction tier, so that the semantic layer is no longer a physical
store.

■ Offering a set of performance optimization techniques, such as pushdown, intermediate servers, caching and precomputation, so that the semantic layer is more performant on a variety of sources and analytics use cases.

■ Offering a set of analytic functions, so that calculations do not need to be custom-built, but are available in a variety of end-user application environments.
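Two of the optimization techniques named above, caching and precomputation, can be sketched minimally. This is an invented illustration, not how any particular product works; `run_query` stands in for a pushdown to the underlying source.

```python
import functools
import time

# Aggregates built ahead of time (precomputation), keyed by (measure, dimension).
PRECOMPUTED = {("sales", "city"): {"Berlin": 200.0, "Munich": 200.0}}

@functools.lru_cache(maxsize=256)
def run_query(measure, dimension):
    # Pushdown to the source system would happen here; simulated as slow.
    time.sleep(0.01)
    return f"SELECT SUM({measure}) FROM facts GROUP BY {dimension}"

def answer(measure, dimension):
    # 1. Serve from precomputed aggregates when possible.
    key = (measure, dimension)
    if key in PRECOMPUTED:
        return PRECOMPUTED[key]
    # 2. Otherwise push the query down; lru_cache makes repeats cheap (caching).
    return run_query(measure, dimension)
```

Real semantic layer engines layer these techniques (plus intermediate servers) transparently, so the consuming tool never knows which path answered the query.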

Semantic Layer Technology Options


Organizations have many technology options to consider in order to realize the benefits of
the modern semantic layer. Some of the most common locations for semantic layers in
modern data architectures are:

■ An A&BI tool’s semantic layer, either via a traditional physically stored OLAP
approach or a more modern virtual semantic tier.

■ Problem this technology is trying to solve: Unifying a set of reusable business calculations and data with the analytic tooling required to deliver insight in one platform.

■ A data warehouse/data mart platform, via manually created materialized views or a built-in federator/materializer, stored calculations and metadata.

■ Problem this technology is trying to solve: Delivering large volumes of trusted data to a large number of concurrent analytics users and use cases at the right price/performance ratio.

■ A stand-alone data virtualization platform, via virtual data models and business
logic.

■ Problem this technology is trying to solve: Connecting to data, rather than collecting data, in a virtual abstraction tier that enables flexibility, agility, optimization and data governance.


■ A data fabric/knowledge graph platform, via the function of the data fabric or graph
store that leverages knowledge graphs and ontologies.

■ Problem this technology is trying to solve: A data fabric is a design concept that serves as an integrated layer (fabric) of data and connecting processes. The fabric presents an enterprisewide coverage of data across applications that is not constrained by any single platform or tool restrictions. The knowledge graph is a key component of this architecture that enables deep understanding of the linkage and usage of data and metadata.

■ A data lake or object store enablement/query optimization platform, especially via lake enablement platforms that enable the profiling, governance, and consumption-ready views of data lake data, as well as linkage to a data catalog that enables the search of this data.

■ Problem this technology is trying to solve: Enabling a platform where diverse formats of new and existing data can be stored, governed, structured and analyzed for emergent discovery analytic use cases and use cases that require scale.

This multitude of options leads many Gartner customers to ask a crucial question:
Which of these analytics and data management platforms is the best to use for a
semantic layer for my use case?

To answer this question, it is useful to build a framework to compare semantic layers.

The criteria that you can use to evaluate semantic layers are:

■ Connectivity

■ Model development and sharing

■ Business constructs, calculations and function support

■ Query performance

■ User persona support

■ Security and governance


These criteria, when applied to the specific options, result in the following
matrix (see Table 1).

Table 1: Comparing Semantic Layer Options


(Enlarged table in Appendix)

This comparison is a helpful start. To extend it toward action, the following criteria, aligned to each of the categories, may help you identify which capabilities you need.

Connectivity

■ Supports legacy, on-premises data warehouses

■ Supports cloud data warehouses


■ Supports on-premises and cloud data lakes

■ Supports SaaS data sources (Salesforce, Workday)

■ Supports tools that speak SQL via JDBC or ODBC

■ Supports tools that speak MDX or DAX and live Excel connections

■ Supports custom applications via REST, GraphQL, or programmatic interfaces with languages such as Python

■ Supports web-based interface for data consumers

Model Development and Sharing

■ Supports web-based development (versus client application)

■ Supports multiple, simultaneous editors for collaborative development

■ Supports reusable objects and model component sharing

■ Supports development life cycle (dev/test/prod)

■ Supports versioning of semantic models

■ Supports other tools in the enterprise data ecosystem, such as third-party data
cataloging platforms and data quality platforms

Business Constructs, Calculations and Function Support

■ Supports temporal intelligence (period over period, period to date)

■ Supports MDX, DAX, pre- and post-query calculations

■ Supports aggregation functions (SUM, AVG, MAX, MIN)

■ Supports nonadditive metrics (distinct count, first, last)

■ Supports live Excel pivot tables and Excel CUBE functions

■ Supports query performance/SLAs via optimization, caching, intermediate servers, pushdown, precomputes

■ Supports automated query performance improvement recommendations


■ Supports SQL dialect or database-specific optimizations
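The distinction between additive aggregation functions and nonadditive metrics in the list above is worth making concrete: per-group SUMs can be rolled up by adding, but per-group distinct counts cannot, which is why semantic layers must recompute them at each grain. A minimal sketch with invented data:

```python
# Why distinct count is "nonadditive": summing per-city distinct counts
# double-counts customers who appear in more than one city.

rows = [
    ("Berlin", "cust_1"), ("Berlin", "cust_2"),
    ("Munich", "cust_2"), ("Munich", "cust_3"),
]

per_city = {}
for city, cust in rows:
    per_city.setdefault(city, set()).add(cust)

sum_of_counts = sum(len(s) for s in per_city.values())  # 4 -- wrong rollup
true_distinct = len({c for _, c in rows})               # 3 -- correct answer
print(sum_of_counts, true_distinct)
```

A semantic layer that supports nonadditive metrics handles this recomputation automatically instead of leaving it to each report author.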

Security and Governance

■ Supports single sign-on for all data consumers

■ Supports user impersonation and delegated authorization

■ Supports and respects native data platform security constructs

■ Supports row-level security for users and groups

■ Supports column or object-level security for users and groups

■ Supports column hiding and masking for users and groups
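Row-level security and column hiding, as listed above, can be sketched as a filter applied between the user's query and the data. The policy shape and role names here are invented for illustration; real platforms express these rules declaratively and enforce them in the query engine.

```python
# Hypothetical sketch: a role-based policy applying row-level security
# (a row filter) and column hiding (dropping sensitive fields).

POLICY = {
    "analyst_eu": {
        "row_filter": lambda r: r["region"] == "EU",  # row-level security
        "hidden_columns": {"salary"},                 # column hiding
    },
}

def apply_security(rows, role):
    rule = POLICY[role]
    visible = [r for r in rows if rule["row_filter"](r)]
    return [
        {k: v for k, v in r.items() if k not in rule["hidden_columns"]}
        for r in visible
    ]

data = [
    {"region": "EU", "name": "A", "salary": 50},
    {"region": "US", "name": "B", "salary": 60},
]
print(apply_security(data, "analyst_eu"))  # [{'region': 'EU', 'name': 'A'}]
```

Centralizing this in the semantic layer makes it a pinch point for auditing, since every consuming tool passes through the same policy.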

User Persona Support

For user persona support, rather than a set of criteria, it is useful to contextualize what
user expectations are when they use a specific analytic platform. See an example in Table
2 of user roles and their expectations when using a semantic layer platform.


Table 2: User Persona Outcomes and Feature Expectations
(Enlarged table in Appendix)

However, these criteria, although useful, still paint an incomplete picture of semantic layer
choices in the modern organization. Vast differences exist in semantic layer support for
different target applications, different cloud vendor offerings and deployment models,
different data types, and different functions. As a result, more analysis is needed to
determine exactly where the semantic layer makes sense, and with which technology
approaches and vendors.

Analysis


Evaluate Semantic Layer Placement Options
As shown in the comparison, there are a variety of technology approaches to semantic
layers. These approaches also intersect with the variety of locations in the data pipeline
where you can place the semantic layer. This scenario will be familiar to technical
professionals who are deploying the logical data warehouse architecture. Similar
guidance applies here: although semantic layer placement matters for an individual use
case, across the scope of organizational use cases it is often useful to have semantic
layers sit in several different parts of the D&A architecture.

The traditional semantic layer, linked to traditional A&BI tools, works as a data mart,
offering a layer of logic in conjunction with a store of analytics-ready data with the
context to support self-service by unskilled users. However, data collection can only scale
so far. As a result, new approaches that focus on connecting to data have become
popular.

This understanding of connecting versus collecting data is important as technical professionals consider where to place their semantic layer. Many organizations are attempting to build end-to-end D&A architectures. To do so, organizations have broken up the data management process into Acquire, Organize, Analyze and Deliver phases, as shown in Figure 1. In this end-to-end architecture, organizations have three major choices, and one unrealistic option (an instant semantic layer at the point of ingestion), that they must consider as they decide where to build the semantic layer.


Figure 1: Semantic Layer Placement Options

Figure 1 describes the following four possible placement locations for semantic layers:

Option 1: Autogenerated semantic layer based on source data. This scenario describes a
semantic layer that is autogenerated during the Acquire stage, the moment data arrives
from various sources. Some vendors, such as Oracle, SAP, IBM and Microsoft, have built
analytic platforms on top of their ERP and CRM platforms to enable reporting off of data
in those systems and to detect data and metadata that can feed functions in the
semantic layer. However, this solution is rarely complete for organizations, which also need
data from other places. It is currently unrealistic to create a perfect, instantly generated
semantic layer that requires no customization or modification, especially given that most
of these vendor offerings are not positioned as the future of analytics. Too many
organizations are moving to a diverse multivendor environment for this to be a realistic
modern solution at the ingestion layer, as it would require standardization on a single
vendor and a single set of limited analytics capabilities.


Semantic layer technology options: Not realistic.

Option 2: Semantics at LDW. This scenario describes a semantic layer in the Organize
stage (the LDW). The logical data warehouse is designed to satisfy 95% of analytics
requirements. LDWs support a broad set of analytic engines that can support a wide
variety of users and applications. For that reason, placing the semantic layer in the LDW is
often optimal. The LDW itself, however, is composed of multiple component parts, making
a single universal semantic layer on top of all of them unlikely to be possible. Moreover,
although the LDW maximizes flexibility, building it up to that point takes time and effort,
and analytics platforms are often easier to develop prototypes on.

Semantic layer technology options: Graph database/data fabric, data virtualization, data
warehouse, data lake enablement platform.

Option 3: A&BI semantic layer. This scenario describes a semantic layer with a local,
optimized data store in the Analyze stage. This is an approach that generally requires
collecting data, so that a high-performance, in-memory mart can optimize user access to
data, user data preparation or augmented analytics. This can be a very performant
solution for specific departments, or for standardized enterprise reporting. Because of its
location inside an A&BI application, it is likely to be a siloed solution, slow to change and
hard to standardize across A&BI platforms. Moreover, performance limitations may be
encountered rather quickly for high volumes of data. A middle ground between placing the
semantic layer in the LDW and in the A&BI tier may be to use the SQL interface (or, as this
research describes it, the data lake enablement platform) to put the semantic layer at the
edge of the LDW. This gets around some of the silo and performance issues, since SQL
interfaces for data lake enablement often offer many query optimization functions.

Semantic layer technology options: A&BI semantic layer tools.

Option 4: Shadow semantics. This scenario describes building a semantic layer, in an ad hoc fashion, in the Deliver phase (i.e., on users’ desktops, individual BI workspaces and Excel spreadsheets). This is akin to creating a shadow analytics layer, where business units define meaning on their own and build inconsistent data semantics. This is suboptimal, because by not building a semantic layer that is widely integrated into their data sources and analytics platforms, organizations often find themselves poorly equipped to handle analytics use cases that require the data to be trusted, consistent and governed.

Semantic layer technology options: Not recommended.


Several trends have added complexity to these semantic layer decisions:

1. The continued proliferation of self-service analytics tools. The traditional single-vendor A&BI stack has been replaced by a plethora of different A&BI platforms in most organizations. This diversity of tools increases the desire for a single semantic layer to ensure consistent data shows up for any user of any tool. However, few semantic layers exist that work in direct query mode with a large number of target A&BI tools with rich security, governance and connectivity capabilities. Moreover, most A&BI vendors recommend building models and business logic into their own tool to deliver maximum flexibility to customers.

2. The questions around semantic layers in cloud data warehouse and lake architectures. Cloud vendors are engaged in a campaign to dominate future data architecture investments, and they are making significant progress toward this goal. However, these cloud data architectures are being built with semantic layer functions as an afterthought, which is leading many organizations to consider two options: building views in the data warehouse and business logic in a programming language like Python to replicate a semantic layer experience in the DW, or building semantic layer information into the data lake, especially using data lake enablement tools like Dremio or AtScale, which also offer support for business semantic layer calculations. DW-based approaches tend to require a lot of work to build and maintain views and stored procedures. Data lake-based approaches tend to require both a lot of work and a lot of expense, as enablement tools demand significant investment in time and money to deploy properly.
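The DW-based approach described in point 2 (views in the warehouse plus business logic in code) might be sketched as follows. The view name, columns and metric are invented for illustration, not drawn from any specific deployment:

```python
# Hedged sketch: a warehouse view supplies conformed data, and a thin
# Python layer holds the business logic that a semantic layer would
# normally own as shared metadata.

VIEW_DDL = """
CREATE OR REPLACE VIEW sales_conformed AS
SELECT region, order_date, amount, cost
FROM raw.orders
WHERE status = 'complete';
"""

def gross_margin(amount, cost):
    """Business rule kept in application code instead of a semantic layer."""
    if amount == 0:
        return 0.0
    return (amount - cost) / amount

# The maintenance burden noted above: every new metric or rename means
# editing DDL and code in separate places, with no shared catalog.
print(round(gross_margin(100.0, 60.0), 2))  # 0.4
```

This illustrates why the note calls the approach "a lot of work": the definitions are duplicated across the warehouse and the code, and nothing enforces that they stay in sync.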

3. The continued relevance of business application vendors in the modern data architecture. Although organizations are interested in building data and analytics architectures in the cloud, they continue to struggle with questions about which cloud, as business application providers offer compelling benefits for data and analytics deployments that link to their applications. Moreover, tight integrations between business applications, their accompanying data management systems and analytics tools weaken as you try to connect those data sources to cloud-based analytics tools from a different vendor.


4. The unclear potential of data virtualization and graph/data fabric technology.
Many organizations are interested in cutting through this complexity with a powerful
modern technology such as data virtualization or graph/data fabric. However, these
powerful tools for integration, analytics and data management are either still
developing support for semantic layer use cases (in the case of data
virtualization) or relatively rare and reliant on unique skill sets (in the case of
graph/data fabric). Also, no data fabric can be built with a single vendor offering.
Thus, the data fabric is less achievable than vendor marketing suggests. See the
Hype Cycle for Data Management, 2021 for more information on this trend.

A Deeper Dive Into Semantic Layer Options


Selecting a semantic layer platform is an exercise in prioritization, because there is no
perfect solution. This section analyzes some of the leading options.

Analytics & BI Semantic Layers


For the purposes of this analysis, legacy A&BI semantic layers and modern A&BI semantic
layers are different enough to be broken into two categories.

Legacy Semantic Layers Linked to A&BI Platforms

Example vendors: IBM, Microsoft, Oracle, SAP

Why Technical Professionals Choose to Build Semantic Layers Using These Technologies

Because of the mature, incumbent nature of many of these platforms, many organizations reap a lot of value from these semantic layers. They have a long-term, existing investment in A&BI built upon one of these platforms in scenarios where:

■ Ad hoc query is enabled by Oracle Business Intelligence Enterprise Edition (OBIEE) Answers.

■ Paginated reports are generated by IBM Cognos.

■ Embedded analytics dashboards have been built in MicroStrategy.

■ A set of enterprise dashboards is underpinned by data gathered in SAP BusinessObjects (BO) Universe.

■ OLAP cubes are built in Hyperion that are provided to finance departments for
business-unit-critical, self-service analysis.


■ Azure Analysis Services has been implemented in order to provide an in-memory
semantic layer for many concurrent users of Power BI datasets.

Semantic layer platforms of this generation continue to have staying power. They
function well as information portals, and because of the maturity of these platforms and
the lack of innovation in the space, they continue to offer value to organizations that need
pixel-perfect reporting options. These systems often have advantages in terms of
management and publication flexibility that become apparent only when migrating off of
these platforms. For example, there continue to be differences between tabular and
multidimensional Analysis Services models, which can present challenges for
organizations looking to evolve their SSAS multidimensional estate (see
Comparing Tabular and Multidimensional Solutions, Microsoft).

Why Technical Professionals Move Off of These Platforms

■ They are difficult for end users to learn — semantic layers built into traditional A&BI
platforms were often complicated to use, leading to a scenario where not many users
in the organization were able to take advantage of these self-service capabilities.

■ They are difficult to set up and maintain — technical professionals must be experts
to set up and maintain a semantic layer connected to a traditional A&BI tool.

■ They are IT managed, offering few opportunities for collaboration with the business
— because everything is centrally managed, users don’t have much of an opportunity
to contribute to improving the platform. As a result, semantic inconsistencies can
become frighteningly common.

As a result, traditional semantic layer-based BI tools became big, static, slow-to-change
monoliths. For example, business units wanted to experiment and innovate, but the
change process for semantic layers required a change request to go to IT, and for IT to
make that change.

What Technical Professionals Should Consider Doing to Optimize This Approach

Technical professionals should closely evaluate the use cases for these technologies in
their organization, and in particular, the vendor’s dedication to platform investment and
innovation. Many of these A&BI platforms are not receiving a large amount of innovation
compared to the SaaS, self-service BI platforms that megavendors are building to replace
their traditional A&BI offerings.


Technical professionals who want to extend the life of these semantic layer-based A&BI
platforms should consider connecting the semantic layer of these products to popular
self-service BI products like Tableau, Power BI and Qlik. In some cases, third-party
products like BI Connector offer connectors from traditional semantic layers in OBIEE to
vendors like Power BI, Tableau and Qlik. In other cases, vendors, such as MicroStrategy,
build connectors so that Power BI, Tableau and Qlik can consume their semantic layer,
and at the same time benefit from the modern A&BI offering that MicroStrategy has rolled
out. Finally, analytics hub options like Metric Insights, Digital Hive and SAP Analytics Hub
enable users to view all of their different BI tool dashboards in one place, and in one email
report every morning. This can serve as an on-the-glass integration that can keep these
dashboards relevant, since for many users self-service means on-demand report generation.
Such scenarios and toolsets enable organizations to extend these capabilities in a
modern way.

Primary benefits: Robust, mature semantic layer platforms that enable ad hoc exploration
by trained business users. These products, like OBIEE, IBM Cognos Framework Manager
models, and online analytical processing (OLAP) platforms such as SAP BO Universe,
Microsoft Analysis Services and Oracle Hyperion, have been satisfying enterprise
analytics use cases for a long time and, for many organizations, they continue to do so.

Primary issues: These platforms are part of a mature market, and for the most part, the
rate of innovation has slowed significantly. In addition, the proprietary data architectures
that these tools use mean that they do not integrate well with third-party A&BI or LDW
architectures without significant customization, and self-service capabilities are
circumscribed.

Modern Analytics and BI Platforms


Modern A&BI Vendors With Augmented Analytics

Example vendors: Infor Birst, IBM, Microsoft, MicroStrategy, Oracle, Salesforce, SAP,
Sisense, Tableau, ThoughtSpot, Qlik and Yellowfin

Modern analytics and BI platforms, which integrate augmented analytics, offer
compelling capabilities. They offer a modern version of the A&BI semantic layer: Load
your data into our platform, bring your logical data model, measures and metrics into
our platform, and in return, we will provide you with flexible prototyping and promotion
capabilities, as well as augmented analytics capabilities. The augmented analytics vendor
will allow you to query data with natural language and take advantage of automatically
generated insights. It will also allow you to use powerful modeling capabilities inside the
platform.

Gartner, Inc. | G00749457 Page 17 of 36

This research note is restricted to the personal use of [email protected].


Why Technical Professionals Choose to Build Semantic Layers Using These Technologies

In some organizations, there is a strong desire for a shortcut to highly accessible, usable,
self-service analytics that doesn't require a large investment in multiple data platforms.
Often, these organizations are laser-focused on delivering the most dynamic, AI-
augmented capabilities for A&BI. These might include the capability to do the following:

■ Automatically identify, visualize and narrate relevant findings in data.

■ Explore data using natural language query, chatbots and voice.

■ Access powerful analytics capabilities to identify correlations, clusters, exceptions
and predictions, without having to have a background in data science. In general,
enable less-skilled business users and SMEs to have more power to do the analytics
they need in their role, without having to develop specialized technical knowledge in
this area.

However, the complexity of bringing AI and/or ML into production includes the challenge
of optimizing compute, memory and storage resources for these deployments, as well as
the challenge of getting the sufficient volume of trusted, fit-for-purpose data into these
systems. To deliver on the augmented analytics promise, many vendors pursue a strategy
of bringing the data into their proprietary data architecture first, then building semantic
relationships between data fields and exposing rich AI-driven augmented analytics.

The fact that the vendor provides some automodeling and data-refinement capabilities,
coupled with a networked semantic layer, means that organizations can skip a lot of the
slow manual steps of modeling their data and gain efficiencies, moving their analytics
into production. However, vendor platforms may not always provide accurate or current
automated features. User feedback is a necessary input to identify whether the AI/ML-
enabled functions of these tools have worked as intended.

Modern analytics and BI platforms also enable self-service by providing a flexible
semantic layer for analytics inside the BI platform. This allows a central data model to be
created and maintained, while enabling users to easily prototype new dashboards, mash
up new data sources, and then promote these datasets back into the central data model.



These are compelling advantages for any organization that has faced the difficulty of
trying to do this via the previous generation of semantic layers. In one swoop, self-service
and collaborative semantic layer management are made more intuitive. On top of this,
because many of these solutions, like Birst and Tableau CRM, are now part of the
combined offering portfolio of vendors with ERP or CRM products, their A&BI products
can offer highly tuned, pretrained models for decision support around business processes.

Why Technical Professionals Move Off of These Platforms

One problem with building a semantic layer inside an augmented analytics A&BI platform
is that it is typically proprietary and does not integrate easily with other BI tools or other
data sources. So the solution may work quite well, until there is a business requirement to
use a different A&BI tool or move the data to a different location. In those scenarios, the
organization is typically looking at a time-consuming migration. If the semantic layer is
built inside a cloud vendor's platform, there may also be egress charges for data that
moves out of that particular cloud.

What Technical Professionals Should Consider Doing to Optimize This Approach

Organizations should balance the benefits of augmented analytics to satisfy some
analytics requirements against the lock-in and potential challenges of satisfying other
analytics requirements that require other databases or A&BI platforms.

Evaluating these modern analytics initiatives through a project life cycle lens that
incorporates return on investment can help organizations identify when the shiny new
tool is at risk of quickly becoming technical debt, and whether a more efficient approach
is available. Moreover, technical professionals should consider viewing the semantic layer
they build inside these A&BI platforms as a local semantic layer, not a long-term global
semantic layer.

Technical professionals should also look at query-focused A&BI tools, such as Google
(Looker), AnswerRocket and Sigma Computing, if they are making a significant
investment in data warehousing. Query-focused A&BI tools can be a great option for
adding a query and modeling layer that enables citizen data scientists and technical power
users to access data, especially data sitting across several cloud and on-premises data
stores, and to create new datasets and visualizations on top of it. Their strength is that
they handle everything via query connections, which can deliver significant flexibility to
skilled users.



Primary benefits: These platforms are good for analytics use cases where self-service is
prioritized over avoiding duplication of effort, because very few organizations over a
certain size will deploy an A&BI platform like this as a replacement for an LDW. They are
often well-suited to organizations that have a special requirement for analytics, such as
embedded analytics, integration with CRM or ERP vendor data, or optimizing search-based
analytics. These tools have a lot of potential to increase analytics adoption by analytics
consumers, by offering an intuitive interface for interacting with data and autogenerated
data insights. By building augmented analytics on top of a curated in-memory data store,
A&BI tools can increase the sophistication of their AI capabilities. This includes more
robust natural language search interfaces, autogeneration of visualizations and customer-
focused insights.

Primary issues: Proprietary data architectures do not integrate well with third-party A&BI
or LDW architectures. For example, natural language search capabilities usually require
data to be ingested and prepared inside the A&BI platform before they can be used. These
features are still maturing, and often lack enterprise-grade administration and deployment
features. Infrastructure complexity and data integration complexity are common
challenges. For more information, see Using Augmented Analytics to Boost A&BI.

Data Warehouses (With Support From Data Warehouse Automation Platforms)


Example vendors: (DW) Oracle, SAP, Amazon Web Services, Microsoft, Google, IBM,
Snowflake, Teradata, (DW Automation) WhereScape, Kalido, BI Builders, AnalyticsCreator,
Incorta, Qlik

Data management is a critical component in the modern digital enterprise, and data
warehousing continues to remain the most pragmatic way to process large and complex
datasets for timely and trusted insights. Thus, data warehousing continues to be an
important component of the logical data warehouse.

Increasingly, cloud data warehouses offer a wider variety of capabilities and deployment
options to support modern data storage and processing requirements. These platforms
often offer more competitive licensing along with the same support for low-latency data
access with high concurrency. Cloud offerings are being built with embedded AI/ML to
manage the warehouse more efficiently, improve ease of use, offer more on-demand pay-
per-usage pricing models, and deliver faster throughput, performance and scale to meet
the growing demand of your organization.

Why Technical Professionals Choose to Build Semantic Layers Using These Technologies



In the context of semantic layers, organizations that are investing in modern DW
capabilities are often looking for an efficient approach to semantic layer design. Building
data warehouse views, with the associated business logic and calculations embedded in
them (or in supporting languages like Python), provides a viable and useful option for a
semantic layer that the central A&BI team can maintain.
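As a minimal sketch of this pattern, consider a warehouse view that centralizes the business logic for a measure. The table and measure names (`sales`, `net_revenue`, `discount_rate`) are hypothetical, and SQLite stands in for a real warehouse engine:

```python
import sqlite3

# Hypothetical warehouse table; SQLite is only a stand-in engine here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, gross REAL, discount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("EMEA", 100.0, 10.0), ("AMER", 200.0, 40.0)])

# The view acts as the semantic layer: business-friendly names and a
# centrally maintained measure definition that every BI tool reuses.
conn.execute("""
    CREATE VIEW v_sales_semantic AS
    SELECT region,
           gross - discount           AS net_revenue,
           ROUND(discount / gross, 2) AS discount_rate
    FROM sales
""")

rows = conn.execute(
    "SELECT region, net_revenue, discount_rate "
    "FROM v_sales_semantic ORDER BY region"
).fetchall()
print(rows)  # [('AMER', 160.0, 0.2), ('EMEA', 90.0, 0.1)]
```

Because the calculation lives in the view rather than in each dashboard, changing the definition of net revenue requires a single edit that the central team controls.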

Why Technical Professionals Move Off of These Platforms

Although data warehouses are unlikely to be retired in an organization, the use of a data
warehouse as a semantic layer has some limitations that apply especially to older, legacy
warehouses that don’t use LDW, MPP technologies, or modern techniques such as
collaborative workspaces. Thus, these limitations are not true of all data warehouses, but
apply commonly enough that the point is worth making:

■ Building many, many views doesn’t scale, and can involve a lot of development work
by a relatively small team.

■ Data warehouses are not end-user-friendly platforms, and thus the end user does not
have the opportunity to be involved in the process unless the DW team brings them
in and helps them get involved.

■ Direct query against models that may sit in a data warehouse can be quite slow, if
the queries have not been optimized according to the requirements of both the A&BI
tool and the data source. In scenarios where the A&BI tool and the data warehouse
are in different clouds, laws of physics still apply, and only limited query
optimization is usually possible.

■ The calculations and business logic required for a semantic layer to work are not
always natively available in the data warehouse, necessitating some additional
development of this logic, often in a different programming language like Python.

What Technical Professionals Should Consider Doing to Optimize This Approach

Technical professionals should follow LDW best practices and apply DW strengths to use
cases that require low-latency, high-concurrency data access. Moreover, they should
consider DW and A&BI tool integrations before they make data warehouse and A&BI tool
decisions.



Given that poorly integrated offerings are difficult to fix after the fact, technical
professionals should rely on vendors to fix these integrations, rather than try to do it
themselves. Technical professionals should also be aware that automation and DevOps
practices, enabled by DW automation tools, can deliver much faster and more usable DW
changes, migration and implementation than more manual approaches.

Primary benefit: Building views on top of data warehouses, or virtual mart environments
inside data warehouses, is a great (and affordable) way to deliver trusted data for
compromise analytics models. Virtual marts can also be useful for candidate and
contender approaches. When the number of concurrent analytics users is relatively modest
and the number of marts is limited, this approach can serve as a great, central way of
managing a semantic layer.

Primary issues: Platform efficiency tends to be quite high, but these are not typically very
end-user-friendly platforms. As a result, environment development hinges on skilled data
engineer roles. They are not usually the target of initiatives to broaden self-service,
although these roles can significantly benefit from the capabilities of the DW automation
space.

Data Lakes and Data Lake Enablement Platforms


Example vendors: (Data lakes) Cloudera, Google, Microsoft, Amazon Web Services and
(data lake enablement platforms) AtScale, Cambridge Semantics, Dremio, Kyligence, Kyvos
Insights

Organizations know they want to collect diverse forms of data that reside in multiple
formats and locations across the enterprise, and then analyze this data for discovery
analytics use cases. They are realizing that traditional data warehousing technologies
can't meet all their new business needs, causing many to invest in a scale-out data lake
architectural pattern.

The point of a data lake is that its simplicity enables broad, flexible and unbiased data
exploration and discovery. This is important because advanced forms of analytics (such
as data mining, statistics and machine learning) usually involve data exploration and
discovery. Unbiased exploration and discovery is, in fact, impossible when the bias of
optimization has been placed on that data (as is done in the data warehouse).

Unlike a data warehouse, a data lake preserves the original details of source data for the
richest data exploration, discoveries and analytic correlations possible.



Why Technical Professionals Choose to Build Semantic Layers Using These Technologies

Building a semantic layer on top of a data lake addresses some of the classic problems of
lakes (understandability, performance and SQL access), and in doing so makes a data lake
more like a lakehouse that is ready to support any analytics more easily and reliably.

Technical professionals who are thoughtful about data lake design often make some key
observations. The first is that the data lake should complement, not replace the data
warehouse. For example, most data warehouses were designed for reporting and older
forms of analytics. Most data lakes are built to enable new and advanced forms of
analytics, which are essential to digitalization and innovation. Business reporting
demands deeply optimized data with a detailed audit trail, whereas advanced analytics
and data science demand massive volumes of detailed source data for extreme data
exploration and discovery analytics.

This is why data stored in a lake differs sharply from data stored in a warehouse,
although the data for the two can come from the same sources. Hence, an architecture
that integrates a data lake and a data warehouse is capable of supporting a broad range
of business use cases and data management strategies. However, advanced data science
users shouldn’t have to spend their time on data engineering in order to create advanced
analytics off of data in these sources. Moreover, citizen data science users also need
access to data that may be hosted in a data lake.

This makes data lake enablement investments essential to deliver SQL query support on
top of data lake data. Some technical professionals also adopt data lake technologies
because cloud vendors tell them to do so: the cloud vendor's strategy involves ingesting
data into its low-cost data lake, then provisioning data and analytic services on
top of that now cloud-native data.
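The core service a data lake enablement tier provides can be illustrated with a small sketch: raw landing-zone files gain a SQL interface so analysts stop hand-parsing them. The file content and table names are hypothetical, and stdlib SQLite stands in for an engine such as those the enablement vendors provide:

```python
import csv
import io
import sqlite3

# Hypothetical raw lake file (a CSV string standing in for an
# object-store object with no predefined warehouse schema).
raw_file = io.StringIO("event,user,amount\nclick,alice,0\npurchase,bob,42\n")

# The enablement tier registers the file behind a SQL-queryable table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event TEXT, user TEXT, amount REAL)")
reader = csv.DictReader(raw_file)
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(r["event"], r["user"], float(r["amount"])) for r in reader])

# Analysts now use plain SQL against lake data instead of file parsing.
total = conn.execute(
    "SELECT SUM(amount) FROM events WHERE event = 'purchase'"
).fetchone()[0]
print(total)  # 42.0
```

Real enablement products add caching, security and A&BI connectivity on top of this basic pattern, but the value proposition is the same: SQL over data that was previously opaque.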

Why Technical Professionals Move Off of These Platforms

Data lakes can be very challenging to implement. Often, the data lake platform has not
structured data sufficiently to make it understandable and useful. Incomplete lake
governance creates a data swamp, full of large amounts of raw or uncurated data. The
reality is that only a handful of staff are skilled enough to cope with such data, and they
are likely doing so already.



Another reason data lakes are difficult to use in this context is that it is difficult to define
cohesive governance and security policies across a series of very different datasets
residing on a single cluster of physical infrastructure. The same attempt was made with
data warehouse implementations, but it proves far less successful with data lakes
because the data they contain isn't modeled. Creating policies for data without context is
impossible. Creating a semantic layer on top of that data is also quite challenging, for the
same reason.

A third problem is the assumption that data lake implementation technologies perform far
better than they actually do, which leads to wild overestimations of their benefits.

The biggest drawback to data lakes is the lack of context, which the semantic layer and
data catalog try to provide. The challenge is that, although semantic layers are necessary
to provide context and consumability to data in the lake, they are rarely implemented to
cover every potential analytics use case. The semantic layer must first be built to make
the data lake usable. Because business friendliness, data governance and performance
are prerequisites to success in the semantic layer context, data lakes can be quite
challenging to deploy for this use case without significant investments in the platform.
Supporting a highly ambitious, broad semantic layer on top of data in the data lake is not
a common scenario among customers.

What Technical Professionals Should Consider Doing to Optimize This Approach

The key takeaway for technical professionals from this section is to recognize the role
that data lakes play in the data analytics architecture: a data store that is optimized for
novel and discovery analytics use cases, and one that is affordable. Technical
professionals should exercise caution before attempting to serve every analytics use
case with the combination of a data lake and a data lake enablement tool. Building Data
Lakes Successfully — Part 1 — Architecture, Ingestion, Storage and Processing, and
Building Data Lakes Successfully — Part 2 — Consumption, Governance and
Operationalization, should be part of your reading list.



Primary benefit as a semantic layer: Cloud vendors such as Amazon, Microsoft and
Google have invested in marketing the viability of the data lake as the place to host a
shared, reusable semantic layer. Storage is inexpensive, meaning that data lakes can be
adapted for many purposes; in this context, that purpose can be to serve as a storage
location for a large amount of organizational data, connected to a consistent semantic
tier. Data lake enablement products like AtScale, Cambridge Semantics and Dremio can
enable ease of query, access and data connectivity to A&BI tools. Data lakes are often
cited as a good fit for contender analytics, but even this form of analytics often requires
the data catalog and data governance to be optimized.

Primary issues: Data lakes require significant work to get them ready for production. This
includes governance, data profiling, data tagging, data security, metadata management,
data wrangling and data integration. As a result, their viability as a semantic layer is often
a long-term goal, not a short-term reality. Only some cloud vendors have taken the
approach of enabling a data lake as a semantic layer (in conjunction with other cloud
services) that is ready to interact with BI.

Data Virtualization
Example vendors: Denodo, Dremio, Intenda (Fraxes), Data Virtuality, IBM, Informatica,
Oracle, SAP and TIBCO Software

DV, either offered stand-alone or via a data integration platform, has been of interest to
many organizations because of its ability to provide a semantic layer tied to a layer of
access, federation and/or virtualization. With data virtualization, combining different
sources of data to execute federated queries is possible without replicating data into
persistent storage, and without creating a physical cube or star schemas. This leads to
increased agility.

Why Technical Professionals Choose to Build Semantic Layers Using These Technologies

Data virtualization offers the ability to create a virtual model of data that joins relational
and nonrelational data, from many sources, on-premises and/or in the cloud. It simplifies
data access for analytics, improves reuse, reduces change impact and helps to achieve a
consistent, reusable semantic layer.



Data virtualization technology is evolving to address a common set of technical
professionals’ data integration challenges. DV gives end users and data integration
technical professionals more options to connect to data, to access data more quickly, and
to prototype and experiment with novel data combinations and use cases. By contrast,
traditional bulk/batch data integration styles demand upfront requirements specification
and curation of data before the data product is actually delivered. Technical professionals
often would like to create a semantic layer for users that can abstract away the
complexity of the underlying data architecture. Data virtualization tools offer enticing
capabilities to deliver on that use case, as well as several others that emphasize agility.

Why Technical Professionals Move Off of These Platforms

Although data virtualization tools have been marketed as replacements for semantic
layers, in practice the investment in appliances and infrastructure required to allow stand-
alone DV tools such as AtScale, Denodo, Data Virtuality and TIBCO Data Virtualization to
serve as a semantic layer is significant and expensive in terms of both time and money.
Moreover, data integration teams often wonder whether DV is the best long-term solution,
or whether DV fits a niche for agility and use cases that do not allow for data movement,
and not much else.

The reality is although DV can support many use cases, it is not ideal for every use case.
Semantic layers typically require blazingly fast performance that can be difficult to deliver
without serious caching. Semantic layers also require significant target application
support that not every DV vendor has built.

What Technical Professionals Should Consider Doing to Optimize This Approach

DV has a lot of potential, especially for agile analytics prototyping and unifying disparate
data sources, but DV tools have not, for the most part, become platforms for the entire
organization to use. Some workflows inside the DV platform may be business-user-
friendly, like searching a catalog, but not all of them.

Thus, building a semantic layer on top of DV technology will still be a process that
technical professionals must manage, and they must adapt their approach to build virtual
views instead of ETL and physical consolidation. DV also provides benefits for
performance optimization via caching, which can make semantic layers more performant
than they may at first appear, as the cons of additional layers of abstraction are balanced
out by the pros of additional query optimization.
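The caching effect can be sketched in a few lines. The function name and source values are hypothetical; a real DV platform manages its cache in the virtualization tier rather than in application code, but the principle of avoiding repeated remote calls is the same:

```python
from functools import lru_cache

CALLS = {"n": 0}  # counts how often the "remote" sources are actually hit

@lru_cache(maxsize=128)
def federated_query(region: str) -> float:
    # Stand-in for an expensive federated query across remote sources.
    CALLS["n"] += 1
    source = {"EMEA": 90.0, "AMER": 160.0}  # hypothetical remote results
    return source[region]

federated_query("EMEA")
federated_query("EMEA")  # served from cache; no second remote round trip
print(CALLS["n"])  # 1
```

A second request for the same slice of the virtual model never touches the underlying systems, which is how DV tiers offset the latency cost of the extra abstraction layer.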



The cost of DV compared with the benefits derived for the semantic layer use case limits
the adoption of this platform for all semantic layers; the business value and use case
must be compelling to justify the investment in this technology alone. DV is likely to be
most useful when there is a large collection of analytical engines, such as those you might
have in the LDW, and you need a tool to build semantic layers and other connections on
and between them.

Primary benefits: DV unifies views of data, and consolidates business rules and logic. DV
provides one logical location to access all data that has been connected using DV
technology. This greatly simplifies data management for both IT and analytics platform
users, especially when this use case is extended to support a semantic layer that includes
business definitions, measures, metrics, rules and logic.

DV can federate queries among varied sources, and can help organizations handle
multicloud architectures without expensive copying and moving of data. This on-the-fly
ability does not require ETL planning or procurement of physical hardware for a data
mart. Rather, DV can federate a query even among disparate sources in terms of type,
location and size. All that is required is a common key — even if the keys are represented
as synonyms.
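A minimal sketch can make the synonym-key idea concrete. The source names, column names and rows are all hypothetical; a real DV engine plans and pushes down this join, but logically it standardizes the key names and merges rows without persisting anything:

```python
# Two "sources" share a key under different column names
# ("cust_id" vs. "customer"), so a synonym map standardizes the
# key before the virtual join. No data is copied to persistent storage.
crm = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]
billing = [{"customer": 1, "owed": 50.0}, {"customer": 2, "owed": 0.0}]
synonyms = {"cust_id": "key", "customer": "key"}

def standardize(rows):
    # Rename synonym columns to a single canonical key name.
    return [{synonyms.get(k, k): v for k, v in row.items()} for row in rows]

def virtual_join(left, right):
    # Index the right source by key, then merge row by row in memory.
    index = {r["key"]: r for r in standardize(right)}
    return [{**l, **index[l["key"]]} for l in standardize(left)]

result = virtual_join(crm, billing)
print(result[0])  # {'key': 1, 'name': 'Acme', 'owed': 50.0}
```

This is the "common key, even as synonyms" requirement in miniature: once the keys are reconciled, disparate sources behave like one logical dataset.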

Primary issues: For DV to work, it must use defined datasets. DV will suffer or fail if the
datasets connected are not properly defined and understood. The data must be
represented in a defined, standardized form or be able to be put into a compatible and
well-understood form. Combining such data cannot be accomplished without
standardizing the data first. Standardization may require data transformation.

Additionally, DV transformation capabilities are often limited. Moreover, data virtualization
can analyze access patterns and allow intelligent algorithms to optimize DV performance,
but some access patterns will not be suited to DV. Sometimes, the best answer is a
balance between physical consolidation and DV. Physical consolidation reduces the
number of platforms to be virtualized, thus simplifying access plans. Large
consolidations also provide powerful massively parallel processing (MPP) platforms that
the DV tool can use for pushdown processing.

Graph Data Store/Data Fabric


Example vendors: Cambridge Semantics, CluedIn, Denodo, Informatica, Semantic Web
Company-data.world, Stardog, Talend



Note that because this is a nascent technology, technical professionals have not adopted
this en masse; thus, the format of this section will be different. But there is an argument to
be made that the graph data store and data fabric are a strong potential future for the
semantic layer.

A data fabric maps data residing in disparate applications (within the underlying data
stores, regardless of the original deployment designs and locations) and makes them
ready for business exploration. Connected data enables dynamic experiences from
existing and newly available data points leading to timely insights and decisions. This is
very different from a static experience with reports/dashboards. There are multiple
technology components, but one of the most relevant to the semantic layer topic is the
idea of a knowledge graph. In theory, semantic layers are a great use case for a
knowledge graph and data fabric, as it fits several of the criteria of a graph use case.

Why Technical Professionals Choose to Build Semantic Layers Using These Technologies

The semantic layer is a place where users may need to define new relationships,
calculations, connections and constructs of data, and traditional relational models can
make this difficult. Data fabrics built on graph data stores have potential when there are
many relationships in the data and there is a need to understand the obvious, hidden and
latent relationships within it. Moreover, they excel when there is a need to rapidly traverse
a large number of relationships to unearth insights and patterns, and when complex rules
need to be computed quickly.

Semantic layers should be able to support analytics calculations and requirements even
as they become more sophisticated and demanding. Graph analytics can uniquely
support functions like link analysis, and can also provide insights into how semantic
models link to different types of users and use cases in a more flexible way than
traditionally relationally modeled data.

Why Technical Professionals Move Off of These Platforms



Put simply, data fabrics are more of a future architecture than a present, proven reality.
No single vendor provides a service that supports the entire data fabric, and many
organizations are still trying to connect different analytic engines together in the logical
data warehouse architecture. Thus, many technical professionals have limited ability to
successfully position and build a next-generation architecture while the current
architecture is still in progress. Moreover, the skill set requirements and technical
complexity of using graph databases are often beyond the reach of an organization's
current technical professional team, which must upskill and/or hire a graph expert to
succeed. Technical professionals skilled in graph analytics are rare. Thus, a novel
platform, with novel skills required, as well as a radical change from the existing
architecture, is often tough to get approved.

What Technical Professionals Should Consider Doing to Optimize This Approach

Deploy graph data stores and data fabrics for use cases with undeniable and
transformative business benefits, such as fraud analytics or life sciences data fabrics.
Piggyback on these proven use cases to also offer a graph-based semantic model, which
can provide an enriched user experience.

Adopt graph when the data has high variability that does not fit well in a two-dimensional
data model of rows and columns. Adopt graph when your current system does not scale
or perform because of slow joins in relational DBMSs, and when the data model is prone
to change. Finally, adopt graph when there is a need to integrate disparate heterogeneous
data sources, when there is a need to link data to metadata (versioning, provenance,
lineage, data validation, data discovery and data quality checks), and when there is a need
to identify anomalies.

Primary benefits: Imagine being able to visualize a cluster of every calculation and KPI
and see what the shared source tables are. Graph analytics can excel when requirements
call for visualizing clusters or groupings of data based on connections or patterns, and
when there is a need to quickly compare new data to existing data for possible merging.
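The lineage question behind that scenario ("which source tables feed this KPI?") can be sketched with a tiny knowledge graph. The node names (`kpi:margin`, `calc:net_revenue`, `table:sales`) are hypothetical, and an adjacency map stands in for a real graph store:

```python
# Hypothetical semantic-model graph: KPIs depend on calculations,
# which depend on source tables.
edges = {
    "kpi:margin":       ["calc:net_revenue", "calc:cost"],
    "calc:net_revenue": ["table:sales"],
    "calc:cost":        ["table:sales", "table:suppliers"],
}

def source_tables(node, graph):
    # Traverse all downstream relationships and collect table nodes.
    found = set()
    for child in graph.get(node, []):
        if child.startswith("table:"):
            found.add(child)
        found |= source_tables(child, graph)
    return found

print(sorted(source_tables("kpi:margin", edges)))
# ['table:sales', 'table:suppliers']
```

This kind of multi-hop traversal, awkward as a chain of relational joins, is a single natural operation in a graph model, which is why lineage and impact analysis are frequently cited graph use cases.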

Primary issues: The biggest issue with semantic layers built on graph and data fabric
stores is that it is really hard to find any reference customers doing this currently. So
organizations adopting this may be on their own. At present, graph technologies may
provide additional context, but to provide that they need significant upskilling, effort, and
integration. That is often a reach for already busy technical professionals.



Having completed the analysis, it should be clear that no analytical engine is a perfect
solution for semantic layers, but there are some clear pros and cons. Technical
professionals thus often ask for guidance on semantic layer selection.

Guidance
The New Semantic Layer Is Not Going to Be Universal, or Single Engine
The purpose of the original single-technology, single-server data warehouse coupled with
the single-technology semantic layer was clear: to enable usable analysis that spanned
all of an organization and across time. This was perhaps overly ambitious: No one
technology has come to surpass all the others in the area of analytics, and no one
technology is capable of handling all requirements. It is now clear that the optimal
analytical platform is an integration of multiple technologies and data stores.

As much as organizations want to create a unified layer of access to their environment,
the multiengine nature of the LDW will complicate this. As a result, organizations should
think about how to implement a semantic layer that will scale down for local use cases,
and scale up for global deployments. This semantic information may reside throughout
their A&BI, data integration, data lake, data warehouse and DV tiers.

The original enterprise data warehouse (EDW) and the traditional semantic layer have not
achieved their original goals because of the growth in diverse users, use cases, data types,
data velocity and data volume in the organization.

The original aims of data warehousing and an associated semantic layer remain valid.
These were to provide a broad, and historically deep, shared view of everything that is
going on within an organization. The semantic layer could take that view and deliver it to
the users in familiar terms. Data warehouses did this by integrating and keeping the data
generated by the organization’s business processes. Semantic layers did this by providing
the metadata, definitions and context so that data was easy to understand and didn’t need
much preparation or transformation.
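To make the idea of "metadata, definitions and context" concrete, here is a minimal, hypothetical Python sketch of a semantic model that maps business-friendly measure names to their plain-language definitions and underlying calculations. All class names, measure names and expressions are invented for illustration; they do not correspond to any specific product's API:

```python
from dataclasses import dataclass, field

@dataclass
class SemanticMeasure:
    """A business-facing measure: a friendly name plus the logic behind it."""
    name: str            # business-friendly label, e.g. "Net Revenue"
    definition: str      # plain-language meaning for end users
    expression: str      # calculation against the physical model
    source_table: str    # where the underlying data lives

@dataclass
class SemanticModel:
    """A minimal semantic layer: measures keyed by their business names."""
    measures: dict = field(default_factory=dict)

    def add(self, m: SemanticMeasure) -> None:
        self.measures[m.name] = m

    def describe(self, name: str) -> str:
        # Surface the definition and lineage in terms a business user can read.
        m = self.measures[name]
        return f"{m.name}: {m.definition} (= {m.expression} from {m.source_table})"

model = SemanticModel()
model.add(SemanticMeasure(
    name="Net Revenue",
    definition="Gross sales minus returns and discounts",
    expression="SUM(gross_sales) - SUM(returns) - SUM(discounts)",
    source_table="warehouse.fact_sales",
))
print(model.describe("Net Revenue"))
```

The point of the sketch is that a consumer never needs to know the physical table or SQL expression; the layer resolves "Net Revenue" to both.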

What changed was the variety of data available, types of analysis possible and the
requirements. A unified view, with unified semantic logic is still desirable, but a single
physical processing engine cannot provide it, and a single semantic layer does not yet
exist. The integration needs to be logical, not physical. Additionally, the semantic layer
must combine both global, static data definitions with local, changeable custom data
definitions. The semantic layer future is thus a multiengine, connected future, which
leverages the best qualities of the engines described in the previous section and links
them together.

Global Versus Local Semantic Layers


Just as different workloads lend themselves to different parts of the architecture, different
semantic layers emphasize more global data versus more local data. Organizations will
need to try to achieve two outcomes with their data.

The first is to attempt to integrate and organize a more centralized semantic layer to
satisfy broad, shared use cases. This is the essence of a global semantic layer.

The second outcome is that there must be flexible, local semantic layers to give data
innovators the space to create new datasets and analytics, and do so without forcing
them to conform to data management and quality best practices from the outset. This is
the essence of a local semantic layer.

However, local semantic layer content should be deployed globally as quickly as possible
after an analytics prototype is generated, promoted and certified as valuable. This
enables the organization to reuse the insight, and also reduces the complexity of
managing diverse data models and data hosted in various systems in the cloud and on-
premises.
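The local-to-global promotion workflow described above can be sketched in a few lines of Python. This is a simplified illustration under assumed names (MetricDefinition, a certification flag, draft and registry dictionaries), not a reference to any particular product:

```python
from dataclasses import dataclass

@dataclass
class MetricDefinition:
    name: str
    expression: str
    owner: str
    certified: bool = False   # set by a review/certification process

class LocalSemanticLayer:
    """Workspace where innovators draft metrics without upfront governance."""
    def __init__(self):
        self.drafts = {}

    def draft(self, metric: MetricDefinition) -> None:
        self.drafts[metric.name] = metric

class GlobalSemanticLayer:
    """Shared registry; only certified metrics are promoted into it."""
    def __init__(self):
        self.registry = {}

    def promote(self, metric: MetricDefinition) -> bool:
        if not metric.certified:
            return False          # uncertified drafts stay local
        self.registry[metric.name] = metric
        return True

local, global_layer = LocalSemanticLayer(), GlobalSemanticLayer()
m = MetricDefinition("Churn Rate", "lost_customers / total_customers", "analyst@example")
local.draft(m)
assert not global_layer.promote(m)   # rejected: not yet certified
m.certified = True                   # after review, the metric is certified
assert global_layer.promote(m)       # now reusable organization-wide
```

The design choice worth noting is the asymmetry: drafting is frictionless, while promotion is gated, which mirrors the "agility locally, control globally" balance the text describes.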

Base Semantic Layer Decisions on Analytics Styles


Technical professionals should evaluate technologies against the goals of their analytics
initiatives. Different semantic layers have affinities with different styles of analytics (see
Table 3). In many scenarios, you will need to combine multiple styles for success.


Table 3: Semantic Layers and Analytics Styles
(Enlarged table in Appendix)

Lower Your Own Expectations to Account for the Gap Between the Ideal Semantic Layer
and Reality
Recall that in self-service, organizations need to optimize against many goals. Good self-
service architectures are outcome-oriented, valuable, easy to learn, accessible, safe and
trusted. In line with this, the ideal semantic layer needs to deliver against many self-
service goals. However, the reality is often different from the ideal. You’ll be looking for the
right tool for the job, rather than a panacea, because there are many existing limitations.

The ideal semantic layer should have the following characteristics:


■ Ease of access and use: Semantic layers need to be easy for users to access and
use. Folder structures are a start, but good semantic layers should be searchable,
linked to rich metadata, visualize relationships in an intuitive way and offer
collaboration capabilities.

■ Why semantic layers don’t live up to this ideal: Achieving good integration
between various components of a semantic layer platform is mostly still a pipe
dream, outside of a single megavendor or cloud provider D&A stack. This
means that users will not have a seamless experience to access data.
Moreover, most A&BI semantic layer platforms provide folders and hierarchies,
but don’t provide much searchability. In contrast, data integration and
cataloging products provide searchability, but limited grouping.

■ Integrateability: Semantic layers should allow for the integration of different BI
tools, data stores and file formats.

■ Why semantic layers don’t live up to this ideal: For the most part, A&BI tools
have not built integrations to connect to other A&BI tools’ semantic layers,
although some exceptions exist. Increasingly, A&BI platforms are opening
access to their own semantic layers for use by other BI tools. For example,
Incorta and MicroStrategy have opened up access to their semantic layers so
that Microsoft Power BI and Tableau can connect to datasets inside those
platforms. Microsoft Power BI has done something similar, enabling tools that
offer a data connector to analysis services to access data hosted in Power BI.
However, this type of integration is typically developed to enhance product
stickiness, with vendors encouraging customers to use the vendor’s platform to
build the semantic layer. Moreover, different file formats continue to be difficult
to integrate seamlessly. Tools that specialize in this type of integration can
provide wide integration capabilities, but they often require opening a separate
tool with a separate workflow, reducing the ease of access and use described
above.


■ Development efficiency: Semantic layers must be efficient and easy enough to
develop for the technical professional. In particular, they should offer data modeling
capabilities that can work with a variety of sources. Today, data is stored in many
formats. If it is in a relational store, it will be stored differently than data in XML, in
JSON format or in an unstructured format. A semantic layer should be able to
connect to these various layers and allow you to build abstractions, definitions,
metrics and measures on top of them, to make data more accessible and
standardized.

■ Why semantic layers don’t live up to this ideal: Most semantic layers don’t
intelligently detect the format of data, unless they are data preparation,
wrangling and/or integration tools, and as a result, the technical professional
will often have to manually prepare and model data inside either a data
preparation platform, the virtualization tool, the data warehouse, the data mart
or the A&BI tool. Some platforms offer some capabilities to use AI to enhance
the data preparation and data modeling experience, but they are typically built
on highly proprietary data architectures.

■ Platform efficiency: Semantic layers must deliver sufficient platform efficiency to
enable the servicing of many concurrent users and applications. Moreover, the data
should be flexible and connectible. For example, a limitation of a traditional
relational database is that it is difficult to create new connections between tables.
Certainly, data vault approaches exist, which can ameliorate this somewhat, but in
general, platform efficiency still involves performance versus changeability trade-
offs.

■ Why semantic layers don’t live up to this ideal: Every investment in ease of use
as well as development efficiency typically pushes semantic layer products
more toward the end user, toward in-memory environments, with many
optimizations to make data accessible and usable. But to scale, semantic
layers need to be built to cluster, to perform and to support many concurrent
users. Data lake capabilities can help with scale, and data warehouse and data
mart capabilities can help with performance, but developing semantic layers on
top of data lakes and data warehouses can be highly labor-intensive and
involve many other tools. Graph databases potentially offer both easy-to-model
data as well as high performance, but they are still maturing, and have not
been deployed for this use case by many organizations outside of the data lake
space.


■ Data security efficiency: One of the big problems with semantic layers today, when
they are built outside of A&BI platforms, is that identity and access management,
defined at the source layer, does not always make it through into the BI tool. A good
semantic layer should be able to inherit authentication, authorization and access
controls so that users’ identities are tied to their data permissions inside the
semantic layer. Moreover, data-at-rest and in-motion security, as well as row-, role-
and column-based security are important to enable organizations with granular
security controls.

■ Why semantic layers don’t live up to this ideal: Many Gartner clients have
questions about why their preferred A&BI vendor has not built native single
sign-on (SSO) connectivity to their preferred data source. Often, the complexity
relates to the preferred application or database residing on a different cloud
platform or as a SaaS offering. In reality, building these connections takes
significant work, and vendors will only build the data security capabilities that
customers are consistently clamoring for in large numbers. Often,
megavendors also have a desire, not just to sell the A&BI platform, but to have
that A&BI platform drive adoption of their cloud platform. This will
disincentivize their efforts to develop deep integrations with data hosted with
different vendors, cloud providers or platforms that don’t serve their strategic
vendor goals.

■ Data governance: Data that flows into a semantic layer needs to be tracked, in
particular for changes. Lineage showing who is accessing data, when and how they
are using it, as well as lineage showing where data came from and whether
changes have taken place, is important. Data governance also applies to the
overall semantic layer platform: it should be controllable while giving users some
freedom to make changes, with requirements in some cases to go through a
promotion and certification process.

■ Why semantic layers don’t live up to this ideal: In general, the lack of integration
between semantic layer platforms and their target A&BI capabilities inhibits
sophisticated data governance across platforms. Moreover, in many cases,
functions are built, but because they require heavy-handed IT control, they don’t
necessarily engender trust by business users. Data governance capabilities
tend to conflict with ease of use and development-efficiency-focused
capabilities, leading to platforms that must uneasily balance these conflicting
goals.
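Two of the ideals above, inherited row- and role-based security and access lineage, can be made concrete with a small Python sketch. The dataset wrapper, the per-user policy function and the access log are hypothetical simplifications for illustration only, not a description of how any of the platforms discussed implement these controls:

```python
import datetime

class GovernedDataset:
    """Toy dataset wrapper: applies row-level security and records access lineage."""
    def __init__(self, rows, row_filter):
        self.rows = rows
        self.row_filter = row_filter      # maps a user to a row predicate
        self.access_log = []              # who queried, and when (lineage)

    def query(self, user):
        # Record the access before answering, so lineage survives even failed reads.
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        self.access_log.append((user["name"], stamp))
        predicate = self.row_filter(user)
        return [r for r in self.rows if predicate(r)]

# Hypothetical policy: sales reps see only their own region's rows.
rows = [{"region": "EMEA", "amount": 100}, {"region": "APAC", "amount": 200}]
policy = lambda user: (lambda row: row["region"] == user["region"])
dataset = GovernedDataset(rows, policy)

emea_rows = dataset.query({"name": "ana", "region": "EMEA"})
print(emea_rows)                 # only the EMEA row is visible to this user
print(len(dataset.access_log))   # one audited access recorded
```

The sketch shows why inheriting identity matters: the filter is derived from the user's attributes at query time, so permissions defined once at the semantic tier apply to every consuming tool, rather than being redefined in each BI platform.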


As a result, there are no perfect semantic layer solutions, only different sets of trade-offs,
and many of these trade-offs stem from the technologies used to host the semantic layer.
Technical professionals should use this comparison as a starting point for evaluating
their own semantic layer options.

Recommended by the Author


Some documents may not be available as part of your current Gartner subscription.

Solution Path for Modernizing Analytic Architectures


Assessing the Relevance of Data Virtualization in Modern Data Architectures

Demystifying the Data Fabric

Solution Comparison for 7 Data Fabric Offerings


Demystifying the Analytics and BI Space

Assessing the Capabilities of Data Warehouse Automation (DWA)


6 Things to Get Right for the Logical Data Warehouse

The Practical Logical Data Warehouse

© 2021 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of
Gartner, Inc. and its affiliates. This publication may not be reproduced or distributed in any form
without Gartner's prior written permission. It consists of the opinions of Gartner's research
organization, which should not be construed as statements of fact. While the information contained in
this publication has been obtained from sources believed to be reliable, Gartner disclaims all warranties
as to the accuracy, completeness or adequacy of such information. Although Gartner research may
address legal and financial issues, Gartner does not provide legal or investment advice and its research
should not be construed or used as such. Your access and use of this publication are governed by
Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and objectivity. Its
research is produced independently by its research organization without input or influence from any
third party. For further information, see "Guiding Principles on Independence and Objectivity."



Table 1: Comparing Semantic Layer Options

This table compares five hosting options: A&BI semantic layer platform, data
virtualization platform, data warehouse platform, data fabric/graph data store platform,
and data lake enablement/semantic layer platform.

Sample Vendors

■ A&BI semantic layer platform: Google (Looker), Microsoft, SAP, Sisense, IBM,
MicroStrategy, Oracle.

■ Data virtualization platform: AtScale, TIBCO Software, Data Virtuality, Denodo,
Zetaris, SAP, Microsoft, Amazon Web Services (AWS).

■ Data warehouse platform: IBM, Teradata, Amazon Web Services, Microsoft Azure,
Google, Oracle, Snowflake.

■ Data fabric/graph data store platform: Cambridge Semantics, CluedIn, Denodo,
Informatica, Semantic Web Company, data.world, Stardog, Talend.

■ Data lake enablement/semantic layer platform: AtScale, Kyligence, Kyvos Insights,
Dremio, Apache Druid, ClickHouse, Apache Pinot.

Connectivity, Source and Target Support

■ A&BI semantic layer platform: Strong source support; weak target support.

■ Data virtualization platform: Strong source support and middling target support, as
data virtualization platforms have traditionally been strong for data integration use
cases and are improving for semantic layer use cases.

■ Data warehouse platform: Medium source support; these platforms tend to serve as
a source, with external source capabilities provided via query
federation/virtualization but not many transformations supported, and weak to
medium source SLA support. Strongest target support, given the long popularity of
connecting tools to DW platforms.

■ Data fabric/graph data store platform: Medium to strong source support, depending
on whether data must be loaded into the platform. Weak target support, as few
target applications have built strong connectors with graph stores.

■ Data lake enablement/semantic layer platform: Source support is strong if the tool
uses virtualization, more limited if built to serve as a query acceleration layer or
data mart; target support is medium, given the small size of many of these vendors
and the lack of rich target support that has been built.

Model Development and Sharing

■ A&BI semantic layer platform: Legacy tools: sophisticated models, but a complex,
daunting and often sprawling interface. Modern tools: simpler interface, less
sophisticated models unless you are good at adding custom logic.

■ Data virtualization platform: Simple, but less sophisticated models. Easy model
sharing and collaborative development features.

■ Data warehouse platform: Robust, sophisticated model development features;
sharing features vary.

■ Data fabric/graph data store platform: Model development is in some ways simpler,
as network models in graph theory are “whiteboard ready”; however, developing
models can be complex because business-user capabilities are still maturing.

■ Data lake enablement/semantic layer platform: Modern tools: drag-and-drop and
sharing features. Legacy tools: a bit more dated model design.

Business Constructs, Calculations, and Function Support

■ A&BI semantic layer platform: Legacy: strong construct and calculation support and
potential once you master the tool. Modern: fewer constructs and calculations
natively supported.

■ Data virtualization platform: Medium support for constructs, functions and
calculations; not very friendly for business users.

■ Data warehouse platform: Strong support for constructs and calculations, but not
very friendly for business users.

■ Data fabric/graph data store platform: Weak native support for typical business
constructs and calculations, as a large amount of translation/adaptation into the
graph query language is required. However, graphs enable other unique calculations
such as link analysis.

■ Data lake enablement/semantic layer platform: Construct and calculation support
depends on how good a developer you are; often fewer of them are supported than
in competitor offerings.

Query Performance

■ A&BI semantic layer platform: Legacy: fewer query optimization tricks, but good
query support for data in the preferred BI server. Modern: the same.

■ Data virtualization platform: Lots of tools for optimizing query performance:
caching, pushdown processing, query rewrite and substitution, and MPP support.

■ Data warehouse platform: Natively powerful query performance; however, it is
sometimes challenging to take advantage of this performance when not using a
target that can make full use of the underlying platform power and optimizer.

■ Data fabric/graph data store platform: If optimized correctly, blazingly fast query
performance is possible for semantic layer use cases. However, correct optimization
takes a significant skill set.

■ Data lake enablement/semantic layer platform: Query optimization tends to focus
on the strengths of the tool: for a query acceleration platform, using the in-memory
layer; for a DV tool, similar to the data virtualization column.

User Persona Support

■ A&BI semantic layer platform: Development: BI developers (legacy); power users and
BI developers (modern). Production: lots of consumers, some LOB specialists like
finance.

■ Data virtualization platform: Development: data engineers, data integration
professionals, a few power users. Production: lots of consumers, some data
scientists.

■ Data warehouse platform: Development: DBAs, a few report developers. Production:
challenging to customize for end users; skilled technical professionals typically
required.

■ Data fabric/graph data store platform: Development: typically, the biggest challenge
is finding developers who are comfortable with graph languages. Production:
intuitive model creation; however, a big learning curve to understand how the models
work.

■ Data lake enablement/semantic layer platform: Development: some power users,
some report developers. Some tools are relatively easy to use for drag-and-drop
development of semantic layers; others are completely technically challenging.

Security and Governance

■ A&BI semantic layer platform: Typically an A&BI-tier-focused security model,
requiring great duplication of effort.

■ Data virtualization platform: Virtualized security and governance model; great
potential, but limited by the sources and targets that support security and
governance capabilities.

■ Data warehouse platform: A strong place to define security; governance features are
solid, but the platform is not business-user-friendly.

■ Data fabric/graph data store platform: In theory, a very powerful security capability
is embedded in graph analytics, as triples enable interesting attribute-level security
with little overhead. In practice, few developers have the skills to do this.

■ Data lake enablement/semantic layer platform: Security and governance features
are somewhat limited to inside the platform.

Source: Gartner (September 2021)


Table 2: User Persona Outcomes and Feature Expectations

Desired Outcome

■ Consumer: View analytic content periodically; use it to make data-driven decisions.

■ Explorer: Select from available fields in a semantic layer to seek out diagnostic
analytics; discover the answers to “why” questions.

■ Innovator: Mash up multiple certified data sources, query against large datasets,
create novel visualizations, and generate new insights on data that may have an
impact on the organization’s future.

■ Expert: Introduce data from completely new data sources, create data
transformation scripts, and use advanced analytics and ML to build
transformational analytics.

Augmented Analytics Features

■ Consumer: Natural language text and voice query, and autogenerated insights.

■ Explorer: Natural language processing (NLP), SQL generation, automatic
visualization generation, and jargon-free ML services that provide insights such as
key drivers analysis.

■ Innovator: Automatic data profiling and classification, join recommendations, visual
lineage and impact analysis for data changes, and menu-driven advanced analytics
functions.

■ Expert: Rich transformation and query language functions, plus deep capabilities
with R, Python, PMML and augmented ML to reduce time to production for advanced
analytics.

Usability Features

■ Consumer: Analytic visualizations should be searchable, prepopulated and
customized to the users’ needs. Definitions of metrics and measures are easily
accessible and linked to dashboard objects.

■ Explorer: In addition to consumer features, data should be organized and linked with
rich lineage and metadata, with the ability to open data in analytic tools that offer
drag-and-drop visualization and analytics capabilities.

■ Innovator: In addition to explorer features, sophisticated data preparation
capabilities with embedded forecasting, classification and clustering functions
should be available.

■ Expert: In addition to explorer and innovator features, advanced data source
ingestion and/or connectivity, configuration management and monitoring, and
interfaces to and from data science and machine learning (DSML) and augmented
ML.

Business Workflow Affinity

■ Consumer: Aligned to information portal capabilities and embedded in business
applications, not just inside the BI tool, with powerful mobile app-based data
access. Reporting capabilities and linkage to productivity applications like Excel are
important.

■ Explorer: Aligned to analytics workbench capabilities, including the ability to explore
and mash up data to deliver new insights; thus, integrated into A&BI tools.

■ Innovator: Aligned to data science hub capabilities, with the ability to create
features by enriching and joining external sources with semantic layer data. A
feedback loop exists to allow more sophisticated users to enrich the semantic layer
with more metadata or update fields.

■ Expert: Aligned to artificial intelligence hub capabilities, specifically by automating
and augmenting key portions of analytic processes. A&BI functions are
programmable, automatable, repeatable, reusable, and integrated into the experts’
preferred open-source or packaged toolchain for DSML and AI.

Security and Governance Features

■ Consumer: Guardrails have been set that ensure consumers can’t access data that
they shouldn’t.

■ Explorer: In addition to consumer capabilities, a data catalog offers a view of what
data exists, but actually accessing this data requires explorers to follow the request
and approval process.

■ Innovator: In addition to consumer and explorer capabilities, when creating
datasets, innovators can apply row-level security or take advantage of SSO to a
trusted, centrally managed data source.

■ Expert: In addition to innovator capabilities, experts are trained in data governance
so that they can enforce existing governance rules. Additionally, they have a very
granular ability to secure, monitor and measure data analytics capacity, usage and
performance.

Source: Gartner (September 2021)


Table 3: Semantic Layers and Analytics Styles

Semantic Layer Priority

■ Decentralized analytics: Ease of use, user enablement.

■ Centralized analytics: Control, governance, consistency.

■ Federated analytics: A balance of agility and control.

Semantic Layer Technology Approach

■ Decentralized analytics: Local semantic layer: modern A&BI platforms that offer
maximum usability and power to end users for semantic model creation. Global
semantic layer: data lake enablement platforms, which offer significant query and
semantic layer capabilities for a broad set of users.

■ Centralized analytics: Local semantic layer: query-focused A&BI platforms that offer
access to data warehouse data for model access, and temporary tables for creation
and customization. For global access to data: data warehouse and DW automation,
which can populate data marts that are centrally created and provisioned.

■ Federated analytics: Local semantic layer: depending on the use case, a
combination of approaches is employed, including import mode into an A&BI tool,
direct query to the data warehouse, data preparation workflows in a data
virtualization or data prep tool, or data made accessible via a lake enablement
platform. Global semantic layer: the LDW, i.e., a combination of data warehouse
views, data lake shared packages of data, and data virtualization for access to data
that isn’t in the first two categories; for mature use cases, a data fabric, which adds
metadata-driven recommendations and AI/ML insights.

Strengths

■ Decentralized analytics: Easiest way to enable access to new data sources cheaply.

■ Centralized analytics: Easiest way to govern data access; organizations have built
up a semantic virtual tier and minimized the uncontrolled proliferation of data
marts.

■ Federated analytics: Flexibility, as the advanced-level LDW is in a constant, dynamic
state of change; these changes occur in response to changes in business analytics
requirements, moving D&A workloads among the analytics engines in the stack. The
LDW contains every event, transaction, interaction, sensor reading, customer,
employee and supplier: any and every entity.

Weaknesses

■ Decentralized analytics: Data quality issues are likely to arise with siloed A&BI
adoption, and data lake initiatives often fail to reach maturity because of the
difficulty of data governance.

■ Centralized analytics: So far, only partial success, due to low adoption, dissatisfied
end users and high costs; development processes are rigid and time-consuming;
badly designed batch data movement limits agility and delivery efficiency; and new
types of data such as IoT, social media, weblogs and geospatial are not supported
in the existing architecture.

■ Federated analytics: The LDW model, if insufficiently supported by resources for
implementation and management, combines the complexity and chaos of
decentralization with the cost and slowness of centralization; the risk of failure due
to complex implementation increases with the ambitiousness of the project.

Source: Gartner (September 2021)
