K2View Data Fabric

Technical Whitepaper

TABLE OF CONTENTS

Introduction
At Our Core: The Digital Entity
Architecture
Data Model: K2View Data Fabric Schema
Data Management
Data Services
Consistency, Durability & Availability
Performance
Security
Administration
Total Cost of Ownership
Conclusion
INTRODUCTION

THE BIG DATA PROBLEM

Big Data. The term is sometimes overused as a catchall for anything that requires massive amounts of data processing or storage. The academic definition refers to a collection of data too massive to be handled efficiently by traditional database tools and methods.

Why would traditional database tools and methods not be able to support modern collections of data?

First, let's understand how fast data volumes are increasing. The graph below presents the evolution of global internet traffic, according to Cisco's analysis¹:

[Figure: evolution of global internet traffic (Cisco)]

This exponential growth of traffic directly relates to the amount of data that every application has to manage: the data management systems on which these applications reside must be designed to handle these volumes.

Oracle V2, the first commercial RDBMS (Relational DataBase Management System), was released in 1979. At that time, the common personal computer was the IBM 5120, a PC with 32K of RAM and no storage other than two 8-inch floppy disks (2.4 MB of disk space). Of course, RDBMS have evolved since then, but despite attempts to modernize their architecture (master-slave, active-passive, etc.), they still rely on scaling up servers (i.e. buying more powerful individual servers) and have a single point of failure (SPoF). Thus, the traditional RDBMS architecture makes RDBMS inherently expensive when dealing with massive amounts of data: they require very expensive hardware that is extremely hard to administer, and they are subject to downtime.

For these reasons, modern computer science designed new, fully distributed architectures. These architectures provide linear scalability (the system's capacity increases linearly with the number of nodes/computers in the system) and no SPoF.

With the amount of data to manage increasing exponentially and these new solutions available, the software industry is finally looking to replace its traditional RDBMS with these architectures.

As this new market opens, many solutions arise, providing fully distributed and scalable systems to manage big data. These solutions do answer the volume and administration problems but still present some caveats:

• While designed to support offline business intelligence and analytics, they are not suitable for operational use cases, where split-second performance is critical.
• They often require a lot of effort to integrate into an existing mature environment.
• They still use the same outdated data representation: the relational database model.

BIG DATA INTEGRATION ISSUES

A typical IT ecosystem contains hundreds of applications using hundreds of fragmented RDBMS. Integrating a fully distributed big data management system into a pre-existing environment does not come without hurdles, however. Indeed, these software suites often use completely new interfaces (NoSQL, JSON, etc.) that require heavy development on the application side before they can be used as the main data management layer. Moreover, even if the hundreds of applications are adapted to the new data management system, the data contained in their respective RDBMS must be migrated to the new system. A migration from a traditional RDBMS to a big data architecture is not only complex, costly and time-consuming; it comes with significant risks, including data loss and even potential revenue impacts. Thus, these fully distributed, highly scalable systems are used not to replace existing applications but as separate applications (e.g. data analytics). The result, from the company's perspective, is yet another expensive system on top of the hundreds of pre-existing systems, while the original data management issue remains unsolved.

¹ More information:
http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/VNI_Hyperconnectivity_WP.html
DISTRIBUTED RELATIONAL DATA MODEL, PERFORMANCE, AND COST

As discussed in the previous paragraph, big data management systems are extremely hard to integrate as the main data management system. But even when they are integrated, or used as fringe systems, they still use the same technical approach to represent data: data is stored by category in extremely large placeholders that get linked to one another.

Some big data systems use files or documents, but the most common and easiest way to store and manipulate data remains tables, just as in a traditional RDBMS.

Therefore, when an application queries the big data management system, it goes through the same type of lookup as any RDBMS: scanning through these massive data placeholders to retrieve one piece of information that links them to another placeholder, and so on and so forth until you reach the line of data that is relevant to the application.

These lookups are extremely resource-consuming because they require massive amounts of computation. Big data systems try to overcome their computation constraints by using massive parallel processing or by executing operations in memory, going as far as storing the entire database in memory. These methods do provide better performance than a traditional RDBMS, but they can be extremely costly (especially in the case of a fully in-memory database). One question arises from this analysis: why isn't the data stored in a way that is logical to the business application's needs?

WHAT IS K2VIEW DATA FABRIC?

At K2View, we answered that question, and our customers' Big Data Problem, with our flagship product: K2View Data Fabric. K2View Data Fabric provides all the benefits of a big data management system: a distributed, shared-nothing and linearly scalable architecture, massive parallel processing in memory for computation, and disk storage for a minimal total cost of ownership. But K2View Data Fabric also solves the two major flaws of the other modern big data management systems:

• It requires almost no integration effort in a mature environment, thanks to its embedded ETL layer, full SQL support, and standard connectors.
• It uses a revolutionary and patented means to represent and store data the way that the business needs it - the Digital Entity™ - making it uniquely suitable for operational use cases, where near real-time performance is mandatory.

K2View Data Fabric offers additional features for better security, flexible synchronization and easy administration. The object of this white paper is to detail the K2View Data Fabric architecture and key capabilities.

AT OUR CORE: THE DIGITAL ENTITY


DEFINITION

As explained above, most database management systems store data based on the type of data being stored (e.g. customer data, financial data, address data, device data); this model translates into very large tables that must be queried using complex joins every time one wants to access business-relevant data (e.g. how many payments has this customer made within the past three months?).

K2View Data Fabric looks at data a different way: storing and retrieving it based on business logic. This allows the business to organize data based on its needs, as opposed to trying to fit it into a pre-defined structure.

Indeed, in K2View Data Fabric, every business-related object (e.g. Customer, Merchant, Location, Credit Card) is represented by a data schema - the Digital Entity™. This schema defines all the relevant data fields, aggregated from all underlying systems, that are associated with the digital entity. Defining the data schema for the digital entity is either automated, using K2View Data Fabric's Auto-Discovery module, or performed manually, using the K2View graphical Studio. The result is a business-oriented structure containing tables and objects from as many systems as needed (e.g. for a Customer digital entity, 3 tables from the CRM system running on MySQL and 5 tables from the billing system residing on Oracle).

This schema is used every time data is accessed in K2View Data Fabric: using embedded ETL capabilities, the data is processed, stored, and distributed into a micro-DB™ - one micro-DB per digital entity instance.

Every micro-DB is compressed and individually encrypted with a unique key, enabling incredible performance, enhanced security, high availability and configurable data synchronization.
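The digital entity idea can be illustrated with a small sketch. All names, tables and the use of SQLite here are hypothetical stand-ins (the real schema is defined in the K2View Studio, not in code): a Customer entity aggregates tables from two source systems into one per-instance store.

```python
import sqlite3

# Hypothetical digital-entity schema for a "Customer": field groups
# aggregated from two source systems (names are illustrative only).
CUSTOMER_SCHEMA = {
    "crm.customer":    ["customer_id", "name", "segment"],
    "crm.address":     ["customer_id", "street", "city", "state"],
    "billing.invoice": ["customer_id", "invoice_id", "amount", "paid_at"],
}

def build_micro_db(customer_id, source_rows):
    """Materialize one micro-DB (here: an in-memory SQLite database)
    holding every table of a single customer instance."""
    db = sqlite3.connect(":memory:")
    for table, columns in CUSTOMER_SCHEMA.items():
        name = table.replace(".", "_")
        db.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        db.executemany(f"INSERT INTO {name} VALUES ({placeholders})",
                       source_rows.get(table, []))
    db.commit()
    return db

# Rows for customer 42, as an embedded ETL might fetch them.
rows = {
    "crm.customer": [(42, "Ada", "gold")],
    "billing.invoice": [(42, "INV-1", 99.0, "2020-01-05"),
                        (42, "INV-2", 49.0, "2020-02-07")],
}
mdb = build_micro_db(42, rows)
# A business question now touches only this one micro-DB:
total = mdb.execute("SELECT SUM(amount) FROM billing_invoice").fetchone()[0]
print(total)  # 148.0
```

The point of the sketch is the scoping: the "how many payments?" question never scans a shared, system-wide table - it runs against the one store belonging to that customer instance.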
WHY IS REPRESENTING DATA AS DIGITAL ENTITIES SO IMPORTANT?

The digital entity and its associated micro-DB concept are a bridge between scattered, hard-to-maintain data and highly available, business-oriented data.

By its very nature, it enables split-second performance. Indeed, since the data is organized according to business needs, around 95% of all data access occurs within a single micro-DB: this means that inherently, every K2View Data Fabric query executes against only one micro-DB (for a single business-related object), instead of having to scan through massive tables like other databases.

And since the data is stored in micro-DBs, this also enables K2View Data Fabric to restrict access and encrypt data at the micro-DB level. Moreover, it allows for fully flexible data retrieval without interrupting the underlying application systems: data can be retrieved in real time, in memory, when queried, instead of having to retrieve all the data from all the massive tables before being able to access it. Therefore, when describing K2View Data Fabric's capabilities, we will rely heavily on this concept.

The diagram below illustrates the difference between the traditional data representation and the K2View Data Fabric digital entity concept.

TRADITIONAL RDBMS
• Data is scattered across multiple systems (e.g. CRM, Billing, etc.)
• Each system is independent
• Very hard to access all data corresponding to one business-related object (e.g. customer)

BIG DATA ARCHITECTURE
• Data is stored and distributed in a big data system (e.g. Hadoop)
• No business logic in storage
• Data access for one entity requires a lookup through a massive amount of data

K2VIEW DATA FABRIC
• Data is stored and distributed in K2View Data Fabric
• Data is represented by a digital entity and stored in a micro-DB, one micro-DB per business-related object
• Data access for one entity is a core feature
ARCHITECTURE

The diagram below illustrates an overview of the K2View Fabric’s architecture:

INSIDE K2VIEW FABRIC

• CONFIGURATION: This layer contains the versioned configuration of every digital entity. It contains all its parameters: schema, interfaces, optimization parameters, synchronization policies, transformation rules, masking rules, user functions, web services, etc. This layer is accessed through our administration tools (K2 Admin Manager, K2View Studio and Web Admin) and is described in detail in the DATA MODEL: K2VIEW DATA FABRIC SCHEMA section.
• MICROSERVICES: This layer is used to communicate with user applications, either via direct queries (database services) or via web services. It is described in detail in the DATA SERVICES section.
• AUTHENTICATION ENGINE: This layer manages user access control and restrictions. It is detailed in the SECURITY section.
• MASKING LAYER: This optional layer allows real-time masking of sensitive data. It is described in detail in the DATA MANAGEMENT section.
• PROCESSING ENGINE: This layer is where every data computation is managed. It uses the principles of massive parallel processing and map-reduce to execute operations. It is described in detail in the PERFORMANCE section.
• SMART DATA CONTROLLER: This layer drives the real-time synchronization of data to K2View Fabric. It is described in detail in the DATA MANAGEMENT section.
• ETL LAYER: This layer is K2View Fabric's embedded migration layer, allowing for automated ETL on retrieval. It is described in detail in the DATA MANAGEMENT section.
• ENCRYPTION ENGINE: This layer manages the granular encryption of each data set. It is detailed in the SECURITY section.
• LU STORAGE MANAGER: This layer compresses data and sends it to the distributed database for storage. K2View Data Fabric leverages Cassandra as the distributed storage. More details about this layer and the communication between K2View Data Fabric and Cassandra are given later in this section.
OUTSIDE K2VIEW FABRIC

• USER APPLICATIONS: These are the clients and the different applications using the data. They can be any type of application (Java-based, web-based, etc.) or a simple client or script querying K2View Data Fabric.
• SOURCE SYSTEMS: These are the current data management systems (traditional RDBMS or big data management systems). K2View Data Fabric integrates data from, and pushes updates to, these source systems, but it also allows the gradual retirement of deprecated legacy systems, becoming the new operational data system that replaces the source system.
• DISTRIBUTED DATABASE: This layer manages the distribution and storage of data. K2View Data Fabric leverages Cassandra as distributed storage, but it can be adapted to any other type of distributed data storage.

DISTRIBUTED DATABASE FEATURES

K2View Data Fabric utilizes Cassandra as a base storage layer to handle the mundane data storage and access functionality. This section details the features enabling this communication:

• KEY-VALUE STORE: Every digital entity's data is stored in a micro-DB, each with its unique key, using the distributed database's native functionalities. For K2View Data Fabric, the key is the Logical Unit Instance ID, as defined in its schema (see the following section, DATA MODEL: K2VIEW DATA FABRIC SCHEMA); the value is a compressed and encrypted database file containing the micro-DB data. This gives K2View Data Fabric a very simple, structured and efficient way to access distributed data.
• COMMUNICATION PROTOCOLS: K2View Data Fabric relies on the distributed database's native communication between server and clients. This means that K2View Data Fabric supports Cassandra's SDK out of the box (see the DATA SERVICES section for more details).
• DISTRIBUTED ARCHITECTURE: Cassandra brings together the distributed systems technologies from Dynamo (Amazon's Highly Available Key-value Store) and the data model from Google's Bigtable. This makes K2View Data Fabric one of the world's most efficiently distributed, shared-nothing, no-Single-Point-of-Failure (no SPoF) backend architectures.
• LINEAR SCALABILITY: The linear scalability of Cassandra has been thoroughly demonstrated, most famously by Netflix, which studied and demonstrated its capacities on stage. K2View Data Fabric is therefore a linearly scalable product running on commodity hardware to ensure the lowest Total Cost of Ownership (see the TOTAL COST OF OWNERSHIP section).

COMMUNICATION WITH DISTRIBUTED DATABASE

Internally, the communication between K2View Data Fabric and the distributed database is very simple. It is driven by three components of the K2View Data Fabric architecture in three distinct cases:

• CASE A: The Smart Data Controller needs to retrieve data for transactions on a single micro-DB.
• CASE B: The Smart Data Controller needs to retrieve data for analytics transactions across multiple micro-DBs.
• CASE C: The LU Storage Manager needs to push data onto disk after compression.
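The key-value pattern described above can be sketched as follows. This is a toy stand-in, not the product's implementation: a Python dict plays the distributed store, zlib the compression, and XOR the per-micro-DB encryption (the real system uses Cassandra and real ciphers).

```python
import json, zlib

store = {}  # stand-in for the distributed key-value store (e.g. Cassandra)

def xor(data: bytes, key: bytes) -> bytes:
    # Toy per-micro-DB "encryption", for illustration only.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def put_micro_db(instance_id: str, micro_db: dict, key: bytes) -> None:
    """CASE C: serialize the micro-DB file, encrypt it, then compress it,
    and push it to the store under its Logical Unit Instance ID."""
    blob = json.dumps(micro_db).encode()
    store[instance_id] = zlib.compress(xor(blob, key))

def get_micro_db(instance_id: str, key: bytes) -> dict:
    """CASE A: retrieve the value for one key and reverse the pipeline."""
    blob = xor(zlib.decompress(store[instance_id]), key)
    return json.loads(blob)

k = b"secret-42"
put_micro_db("customer:42", {"invoices": [99.0, 49.0]}, k)
print(get_micro_db("customer:42", k))  # {'invoices': [99.0, 49.0]}
```

The design point is that the store itself only ever sees opaque, per-entity blobs: one lookup by Logical Unit Instance ID replaces any table scan.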

CASE A
In case A, the Smart Data Controller checks whether
synchronization with systems other than K2View Data
Fabric is needed; if not, it triggers the retrieval of data
from the distributed database storage for a transaction
on one single micro-DB (for details when synchronization
is needed, see the DATA MANAGEMENT section). For this
type of transaction, K2View Data Fabric simply retrieves
the value associated with the corresponding key (Logical
Unit Instance ID). This retrieval is executed using the
distributed database native capabilities.
CASE B
Case B is the case of large queries over several micro-
DBs. In this case, the Processing Engine triggers K2View
Data Fabric to send workers distributed across the
distributed database nodes. For this type of transaction,
each worker in K2View Data Fabric triggers the Smart
Data Controller to retrieve the most up to date data. If
synchronization with other systems is not needed, the
Smart Data Controller retrieves the most up-to-date
data from the distributed database using its native
communication protocols and sends it to the worker that
then executes its individual computation on the node to
which it has been dispatched. For more information about
this process, refer to the PERFORMANCE section.
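Case B amounts to a scatter-gather: one worker per micro-DB computes a partial result where that entity's data lives, and the partials are then combined. A minimal single-process sketch (a thread pool stands in for workers dispatched to Cassandra nodes; the sync check is elided):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy micro-DBs keyed by Logical Unit Instance ID.
micro_dbs = {
    "customer:1": {"invoices": [10.0, 20.0]},
    "customer:2": {"invoices": [5.0]},
    "customer:3": {"invoices": [7.5, 2.5]},
}

def worker(instance_id: str) -> float:
    # Each worker reads only its own micro-DB and computes
    # a partial aggregate locally.
    return sum(micro_dbs[instance_id]["invoices"])

with ThreadPoolExecutor() as pool:   # scatter: one task per micro-DB
    partials = list(pool.map(worker, micro_dbs))

total = sum(partials)                # gather: reduce the partials
print(total)  # 45.0
```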

CASE C
Case C is the last interaction between K2View Data
Fabric and the distributed storage: when data is pushed
from memory to disk. Every time data is accessed in
memory for computation, it is pushed post-computation
onto the distributed database key-value storage. As
explained previously, the key is the Logical Unit Instance
ID and the value is a compressed database file containing
the micro-DB data. The LU Storage Manager is responsible for
compressing the database file after encryption and sending
it for storage on the distributed database using its native
communication protocols. More details about a data
access flow can be found in the DATA MANAGEMENT
section.

As detailed above, the only interactions between K2View Data Fabric and Cassandra are:

• Pulling data from nodes using the distributed database's native protocols
• Pushing data to nodes using the distributed database's native protocols

By relying on Cassandra only as a distributed storage layer, K2View Data Fabric is very flexible and may be adapted to other distributed databases.
DATA MANAGEMENT
OVERVIEW

Using the digital entity schema definition, K2View Data Fabric features a set of embedded data management functions to populate, store and present data:

• Embedded ETL (Extract-Transform-Load)
• Embedded data masking
• Flexible data synchronization

These data management features are key differentiators between K2View Data Fabric and other data management systems, even modern distributed systems. They allow data to be retrieved, validated and enriched automatically - without using any transformation scripts or third-party tools. In traditional data management platforms, data management is cumbersome, risky, and expensive. In K2View Data Fabric, data management is part of the database, as this section will describe.

EMBEDDED ETL

Traditional ETL solutions used for data movement to either traditional RDBMS or even new big data stores are not designed to be distributed. That is why K2View's ETL capabilities have been embedded into K2View Data Fabric. The principles of the ETL are based on the digital entity data representation: once the schema has been defined, K2View Fabric workers launch the retrieval and transformation of data through the supported connectors.

When the K2View Data Fabric ETL layer is activated, it triggers a 'mini-migration' from the source interfaces defined in the schema onto K2View Data Fabric for the micro-DB in scope. This ETL is executed in memory and parallelized using K2View Data Fabric's processing engine. Once retrieved, the data can be used in memory for any operation. After this data is used, it is pushed onto the distributed database layer.

There is no configuration needed for the ETL layer, as any enrichment or validation is simply defined while populating the K2View Data Fabric schema via the K2View Studio.

As a result, the K2View Data Fabric embedded ETL offers the following features:

• In-memory digital entity-based migration
• No-interruption, on-the-fly migration
• Full data consistency according to schema
• Low-risk, flexible phased or full system migration
• Concurrent configuration and versioning
• A full library of transformation functions for complex transformations and enrichments
• Offline debug capabilities

The embedded ETL layer can be used for data migration. Furthermore, it can also be triggered by K2View Data Fabric's Smart Data Controller on data retrieval, allowing legacy systems to either retire into K2View Data Fabric or remain operational until their desired retirement. This no-configuration, flexible ETL makes integrating K2View Data Fabric into an existing IT ecosystem a risk-free alternative.

EMBEDDED DATA MASKING

While integrating K2View Data Fabric into an IT ecosystem, it can serve as an interface to different types of applications: staging applications dedicated to internal business testing, or production applications for end-user consumption. These applications often contain highly sensitive data (e.g. SSN, credit card, etc.). This data must only be accessed by the relevant users and only be stored where absolutely necessary: on the production system, where it is the most secure. Other systems should store masked data.

To solve this issue, K2View Data Fabric provides a fully embedded suite of data masking functions. As in the embedded ETL layer, all masking is based on the digital entity data representation.

K2View Data Fabric embedded masking features:

• In-memory digital entity-based masking
• No-interruption, on-the-fly masking
• User-role-dependent masking
• Full data consistency according to schema
• Conservation of every application rule while masking, for complete data usability
• A full suite of pre-configured masking libraries for the most commonly used fields
• Offline debug capabilities

Of course, data masking is an optional component of K2View Data Fabric and does not need to be executed for production use.

While these embedded functionalities are essential to K2View Data Fabric's data management features, they are driven by K2View Data Fabric's Smart Data Controller.

SMART DATA CONTROLLER

Any time data is accessed in K2View Data Fabric, the Smart Data Controller compares the current state of the data in K2View Data Fabric against the synchronization parameters - and updates the data if needed (e.g. a difference in version, or other synchronization mode triggers).
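The Smart Data Controller's decision can be sketched as follows. The data structures and field names here are hypothetical (the real controller also handles CDC events and other triggers): compare the stored copy's schema version and timestamp against the configured policy, and resynchronize only when stale.

```python
import time

def needs_sync(meta, policy, now=None):
    """Decide whether a micro-DB must be refreshed before use.
    meta:   {'synced_at': epoch seconds, 'schema_version': int}
    policy: {'timer': seconds (AlwaySync), 'schema_version': int}
    (Illustrative structures only.)"""
    now = time.time() if now is None else now
    if meta["schema_version"] != policy["schema_version"]:
        return True  # schema changed: trigger the embedded ETL
    # AlwaySync: refresh only if the stored copy is older than the timer.
    return now - meta["synced_at"] > policy["timer"]

policy = {"timer": 300, "schema_version": 7}          # refresh every 5 minutes
fresh  = {"synced_at": 1_000_000, "schema_version": 7}
stale  = {"synced_at": 1_000_000 - 600, "schema_version": 7}

print(needs_sync(fresh, policy, now=1_000_100))  # False (copy is 100 s old)
print(needs_sync(stale, policy, now=1_000_100))  # True  (copy is 700 s old)
```

Note that nothing happens on a timer by itself: the check runs only when the data is accessed, which is what keeps synchronization cost proportional to actual usage.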
The most common case is when a change in the version of the K2View Data Fabric schema occurs (e.g. a new table added, a field enriched): in this case, when the Smart Data Controller is called to access data, it triggers K2View Data Fabric's embedded ETL capabilities to connect to the relevant source systems and access that data.

As such, K2View Data Fabric gives immediate access to data - without going through a complex, costly and time-consuming migration process - simply by defining its schema. Using versioning also allows you to enter a "time machine" and view the micro-DB data as it evolves throughout the different schema versions.

The same principles of synchronization apply for every synchronization mode trigger below.

ON-DEMAND SYNC

K2View Data Fabric allows data synchronization to be triggered by on-demand calls. These calls are driven by the web and database layer. They can either be a web service triggering the Smart Data Controller, or batch scripts using K2View Data Fabric's database drivers or directly querying K2View Data Fabric (administrative mode).

EVENT-BASED SYNC

Alternatively, K2View Data Fabric synchronization can be triggered using the principles of Change Data Capture (CDC). Using this mode, K2View Data Fabric automatically captures changes in the source systems that are part of its schema. The granularity of this synchronization mode is at the table level: when a change occurs on a table in a source system, K2View Data Fabric triggers the Smart Data Controller to change the data only in the corresponding table of the digital entity schema, therefore minimizing irrelevant updates for performance optimization.

ALWAYSYNC

K2View Data Fabric features an intelligent and flexible way to synchronize data: AlwaySync. This mode allows complete granularity over the data that needs to be synchronized with source systems.

Using AlwaySync, K2View Data Fabric allows you to configure what data needs to be refreshed automatically, and how frequently. For each element of the digital entity schema, an AlwaySync timer that drives the K2View Data Fabric synchronization is set (e.g., if the usage information from the Customer table needs to be updated every 5 minutes, a timer of 5 minutes is set).

Once configured, K2View Data Fabric synchronizes data with source systems only upon data access - to optimize performance. As such, on retrieval, if the timestamp on the micro-DB data is older than the timer set for this data, K2View Data Fabric triggers synchronization.

The figure below illustrates this synchronization mode, in the case of data access without masking.

1. The Web/Database Services layer relays the data access request from the user application.
2. The user is authenticated and authorized to proceed with data access.
3. The Smart Data Controller retrieves data from the distributed database storage into memory (if it is not already there).
4. The Smart Data Controller checks the micro-DB data timestamp against the AlwaySync timer; the timestamp is older than the AlwaySync timer.
5. The Smart Data Controller triggers the ETL layer for data synchronization.
6. The ETL layer synchronizes data with the source systems and sends it to the processing engine.
7. The processing engine uses the data to perform the necessary operations.
8. The Web/Database Services layer sends the processed data to the user application.
9. In parallel to this process, the processing engine sends data for encryption.
10. Data is compressed and sent to the distributed database for storage.
DATA SERVICES
OVERVIEW

As described in the previous section, K2View Data Fabric uses innovative engines to manage and synchronize data. K2View Data Fabric also provides easy access to this data via its web and database services. The goal of these services is to make the integration of user applications seamless.

This is why K2View Data Fabric provides:

• A query engine supporting full SQL.
• An easy-to-configure web service layer.

This section will describe these services in detail.

DATABASE SERVICES

The K2View Data Fabric Processing Engine uses two query methods, depending on the type of data on which the query is executed:

• Query on a single micro-DB (around 95% of overall queries): a simple ANSI SQL query.
• Query across micro-DBs for analytics: leveraging the integrated Elasticsearch engine.

These two methods are described in detail in the PERFORMANCE section. However, both of them support the full capabilities of ANSI SQL.

On top of this SQL language support, K2View Data Fabric is packaged with embedded database drivers:

• Every driver supported by the Cassandra SDK².
• Full JDBC support.

Finally, on top of the standard indexing functionalities provided by its full SQL support, K2View Data Fabric provides a proprietary way to define and utilize indexes in order to optimize queries and enable user access control.

Indexes can be defined for any field of the K2View Data Fabric schema. By defining a field as an index, this field can automatically be used for analytical queries (indexes are not used for single micro-DB queries because they are not needed there).

Indexes are stored as reference data and can be used to optimize queries and to define access permissions. For instance, K2View Data Fabric enables a DBA to define the state field of an address table (contained in the schema for a particular micro-DB) as an index, thus enhancing the performance of queries selecting all micro-DBs of a particular state.

WEB SERVICES

One of K2View Data Fabric's core and unique capabilities is its embedded Web Services layer. Indeed, in traditional database management systems (distributed or not), creating a layer of Web Services entails advanced and intricate software development: you need to define communication protocols with the database management system, expose these access methods, define users and security protocols, define the distribution of the Web Services layer, etc. This development is expensive and time-consuming, and it requires constant maintenance to cater to database changes or new functional requests.

In K2View Data Fabric, the web services layer is embedded: it offers an out-of-the-box graphical configuration interface to define Web Services. Any function (which can be as simple as a query) can be created and registered as a Web Service.

These functions can then be re-used and combined by other functions, essentially allowing any Web Service to be easily re-used by other Web Services.

Once a Web Service function is defined, K2View Data Fabric automatically takes care of user access, distribution, updates due to schema changes, etc. The gain in time and effort is tremendous.

Each Web Service can be restricted per micro-DB and per user. Moreover, as hinted above, indexes can be used to restrict access to a Web Service, making any field of the K2View digital entity schema a potential restriction field.

To reuse the above example, the query selecting all micro-DB data for a particular state can simply be registered as a function and thus a Web Service. This Web Service can then be accessed by any application, and the state index can be used to restrict access for one particular state to one particular set of users.

Therefore, and unlike traditional solutions, by combining embedded Web Services, ETL, and flexible sync capabilities, K2View Data Fabric does not require any custom upstream or downstream development.

² More information: http://planetcassandra.org/client-drivers-tools/
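The state-index example above can be sketched with illustrative structures (the names, the dict-based index and the grants table are all hypothetical): the index maps each state to the set of micro-DB IDs whose address table contains it, so an "all customers in CA" query never opens a micro-DB, and the same index doubles as a per-user access filter for a Web Service.

```python
from collections import defaultdict

# Index stored as reference data: state -> micro-DB (instance) IDs.
state_index = defaultdict(set)

def index_address(instance_id, state):
    state_index[state].add(instance_id)

index_address("customer:1", "CA")
index_address("customer:2", "NY")
index_address("customer:3", "CA")

# Hypothetical per-user restriction: which states a user may query.
grants = {"alice": {"CA"}, "bob": {"NY"}}

def customers_in_state(user, state):
    """Web-Service-style function: answered from the index alone,
    and refused unless the user has been granted that state."""
    if state not in grants.get(user, set()):
        raise PermissionError(f"{user} may not query state {state}")
    return sorted(state_index[state])

print(customers_in_state("alice", "CA"))  # ['customer:1', 'customer:3']
```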


CONSISTENCY, DURABILITY & AVAILABILITY

OVERVIEW
K2View Data Fabric uniquely ensures full consistency, guaranteed durability, and high availability of the data it manages. Before detailing how K2View Data Fabric supports these capabilities, let's look at their Wikipedia definitions:
• CONSISTENCY: Consistency in database systems refers to the requirement that any given database transaction must only change affected data in allowed ways. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.
• DURABILITY: In database systems, durability is the ACID property which guarantees that transactions that have committed will survive permanently.
• HIGH AVAILABILITY: High availability is a characteristic of a system. It is measured as a percentage using the following equation: Ao = (total time - down time) / total time. It relies on three principles: elimination of single points of failure (SPoF), reliable crossover, and detection of failures as they occur.

In K2View Data Fabric, durability and high availability are inherent features of the distributed database layer (Cassandra). Cassandra has a flexible consistency mechanism that K2View Data Fabric leverages to be fully consistent during network partitions, as expected from an ACID-compliant database.

CONSISTENCY
Consistency is ensured by the processing engine of K2View Data Fabric. Every time a write on a certain micro-DB occurs, K2View Data Fabric checks against a transaction table stored in the distributed database (thus distributed and highly available), determines whether a concurrent transaction is occurring, and whether the write should be put on hold. For instance, if two or more concurrent transactions are committed to the same micro-DB, its transaction log is used as a conflict detection and resolution mechanism. This process occurs only in the case of a write transaction for a particular key (micro-DB ID), thereby maintaining fast performance and high availability.

DURABILITY
As mentioned above, durability is inherent to the distributed database layer. K2View provides durability by appending writes to a commit log first: before performing any write and using it in memory, the value is appended to the commit log. This not only ensures full durability – because data is written to disk first – but, since it is a small append to a file, it is also quasi-instantaneous.

HIGH AVAILABILITY
Similarly to durability, high availability is driven by the distributed database (Cassandra) architecture. Cassandra's architecture eliminates single points of failure by design. In a Cassandra cluster all nodes are equal: there are no masters or coordinators at the cluster level. Moreover, Cassandra provides reliable crossover and detection of failures using an internode communication protocol called Gossip. Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about.
Therefore, Cassandra, and thus K2View Data Fabric, ensures the three principles of high availability. The next section will present how this high availability capability translates into high-end performance.
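The availability equation quoted above can be checked with a quick worked example; the figures used here are illustrative only:

```python
def availability(total_hours, down_hours):
    """Ao = (total time - down time) / total time."""
    return (total_hours - down_hours) / total_hours

# Illustrative: 8,760 hours in a year with 8.76 hours of downtime
# corresponds to "three nines" of availability.
ao = availability(8760, 8.76)
print(round(ao * 100, 3))  # 99.9
```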

PERFORMANCE

OVERVIEW
K2View Data Fabric's high performance is rooted in its inherent digital entity and micro-DB architecture, which runs every query on a small amount of data. This makes K2View Data Fabric the fastest high-scale database management system available. On top of this inherent design, K2View Data Fabric optimizes performance using the two following major principles:
• Every query is executed in memory.
• For analytics queries running across several micro-DBs, K2View Data Fabric leverages an integrated Elasticsearch engine to scan and search data in multiple micro-DBs.
The following section describes how both principles ensure high-end performance. The purpose of the section is not to establish performance benchmarks; K2View Data Fabric's actual performance is highly hardware dependent, but since K2View Data Fabric is linearly scalable, it can easily be adjusted to cater to any desired speed.

IN-MEMORY PROCESSING
This performance enabler is straightforward: whenever an operation is executed, the computation is performed in memory and not on disk, for faster performance. While certain queries need to be distributed (in the case of analytic queries), the design of K2View Data Fabric does not require complex parallelization for almost any of its operations. Indeed, given a proper digital entity schema design, most queries are executed against one micro-DB, on a limited amount of data.
Therefore, the amount of data to be retrieved and processed in memory is small enough to provide extremely fast performance without having to implement complex distribution across nodes, the absence of which also contributes to faster performance.
SECURITY

OVERVIEW
While the previous sections discuss the performance, reliability, and high availability of data in K2View Data Fabric, one major hurdle for any data management system is ensuring that the data is securely stored. The goal is to eliminate mass data breaches. Since K2View Data Fabric stores data in micro-DBs, it provides two revolutionary protocols to secure data:
• Advanced data encryption via the patented K2View Hierarchical Encryption-Key Schema (HEKS)
• Complete user access control using HEKS, data services authorization parameters, and index definitions.

As presented in the ARCHITECTURE section, these protocols are used in the Authentication Engine and Encryption Engine. The Authentication Engine ensures user access control upstream, while the Encryption Engine encrypts the data downstream before storage. Together they deliver K2View Data Fabric's patented security features: Hierarchical Encryption-Key Schema (HEKS) and user access control.

This section addresses the Encryption Engine before the Authentication Engine, as parts of the Authentication Engine rely on the principle of HEKS defined in the Encryption Engine.

K2View Data Fabric's proprietary algorithm, Hierarchical Encryption-Key Schema (HEKS), relies on the resource keys contained in one user's wallet and provides three levels of resource keys:
• Master Key: Generated during K2View Data Fabric's installation, this is the main key allowing access to every resource of K2View Data Fabric: each user that has access to that key can generate all other key types, and thus encrypt or decrypt any resource within the hierarchy.
• Type Keys: These keys restrict access at the digital entity type level and are a hash of the Master Key. As such, each user that has access to a type key can encrypt and decrypt data belonging to that digital entity type.
• Instance Keys: These keys restrict access at the Logical Unit Instance level and are a hash of their corresponding type key. As such, a user having access to one Logical Unit Instance will not be able to decrypt data from another instance, even though both instances are part of the same schema.

ENCRYPTION ENGINE
Each user in K2View Data Fabric is created with a set of security attributes that are used to enable K2View Data Fabric's security features:
• Password – salted before storage; never stored in clear.
• Private Key (decryption) – unique, encrypted using the password.
• Public Key (encryption) – unique, public to every user.
• Wallet – collection of resource keys used for HEKS.

K2View Data Fabric uses an industry-standard public-key cryptography algorithm to encrypt and decrypt data. The user's public key is used to encrypt their wallet, which stores resource keys. This data is then only readable by decrypting it with the user's private key. The private key itself has been previously encrypted using the user's password. This password is salted before being stored in K2View Data Fabric, and is never stored in clear.
Using this cryptography, a user has access to a set of resource keys stored in their wallet, enabling exclusive access to the resources those keys protect.

[Figure: HEKS key hierarchy for two digital entity types – a Master key at the root, a key per Digital Entity Type, and a key per Micro-DB]

In the figure above, you can see how HEKS is implemented for two digital entity types. Indeed, you can see the following keys:
• 1 Master Key allowing full access
• 2 Type Keys restricting access to 2 different digital entity types
• 6 Micro-DB Keys, 3 for each digital entity type, restricting access at the micro-DB level

Using this hierarchical encryption, K2View Data Fabric enables complete control over the stored data and significantly reduces the risk of data leaks: even if one micro-DB key were to be hacked, only the data of that one micro-DB would be breached; all other micro-DBs would remain safely encrypted. Therefore, this design makes K2View Data Fabric an "ultra-secure database", essentially rendering massive data breaches impossible.
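As a rough illustration of the hierarchy described above, the three key levels can be modeled as a chain of one-way hashes: a master key at the root, a type key derived per digital entity type, and an instance key derived per micro-DB. This is only a conceptual sketch using SHA-256 in Python – K2View's actual (patented) HEKS algorithm is not published in this paper, and the derivation labels are invented:

```python
import hashlib

def derive(parent_key: bytes, label: str) -> bytes:
    """Derive a child key as a one-way hash of its parent plus a label."""
    return hashlib.sha256(parent_key + label.encode()).digest()

master_key = b"master-secret"                        # generated at installation
customer_type_key = derive(master_key, "Customer")   # one per digital entity type
micro_db_key = derive(customer_type_key, "cust-42")  # one per micro-DB instance

# Holding the type key lets you re-derive every micro-DB key of that type...
assert derive(customer_type_key, "cust-42") == micro_db_key
# ...but one instance key reveals nothing about its siblings,
# which is what limits the blast radius of a single leaked key:
assert derive(customer_type_key, "cust-43") != micro_db_key
```

The one-way property of the hash is what gives the containment the text describes: a compromised micro-DB key cannot be walked back up to its type key or to the master key.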
HEKS USER ACCESS CONTROL
Once defined, HEKS resource keys are used for user access control by associating roles with resource keys. The example below details how a user is added to a role, giving access to one micro-DB. In this example, user A is assigned two roles, giving them control over two digital entity types, and thus possesses two type keys in their wallet; user B has no key in their wallet.
Therefore, the grant flow is as follows:
1. User A uses their private key to decrypt the Instance Key needed for the grant.
2. Using this Instance Key, K2View Data Fabric allows the user to create a new role, giving access to one micro-DB of that digital entity type.
3. Using user B's public key, user A encrypts the generated micro-DB key (a hash of the digital entity type key); this resource key is then added to user B's wallet.

USER ACCESS CONTROL BEYOND HEKS
On top of granting access by associating the appropriate resource key with the right role, K2View Data Fabric offers full flexibility over role definitions, allowing it to restrict the access one user has over a digital entity type or micro-DB:
• At the digital entity type or micro-DB level, enabling read or write over its structure.
• At any other level, defining the method (e.g. a Web Service function) allowed to access the data. Thus, not only can a user's access be restricted at the micro-DB level, it can also be restricted to one method only (e.g. one Web Service reading method).

Moreover, K2View Data Fabric allows administrators to restrict access based on indexes defined for each element of the data within the K2View digital entity schema.
Defining indexes not only serves as indexing for cross-micro-DB queries, it also allows complete granularity in user access control. For example, K2View Data Fabric can define a country index associated with every micro-DB. Using this index, a new role can be defined to give access to every micro-DB from a specific country to a subset of users.
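The country-index role described above amounts to a membership check at query time. The sketch below is a plain-Python simulation; the role definition format and micro-DB identifiers are invented for illustration:

```python
# Hypothetical sketch: a role grants access to every micro-DB whose
# "country" index value matches the role definition.

MICRO_DB_INDEX = {      # micro-DB ID -> value of its country index
    "cust-1": "FR",
    "cust-2": "FR",
    "cust-3": "DE",
}

ROLES = {"fr_support": {"index": "country", "values": {"FR"}}}

def accessible_micro_dbs(role_name):
    """Return the micro-DBs a role may touch, per its index restriction."""
    role = ROLES[role_name]
    return sorted(mdb for mdb, country in MICRO_DB_INDEX.items()
                  if country in role["values"])

print(accessible_micro_dbs("fr_support"))  # ['cust-1', 'cust-2']
```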
DATA ORCHESTRATION

OVERVIEW
K2View Data Fabric features an easy way to orchestrate data in and out of its scope, while applying the digital entity and micro-DB organization and processing capabilities described above. The goal of K2View Data Orchestration is to provide the ability to design data flows as a succession of Stages. Data processing is executed across Stages by pre-defined data operators that act as the building blocks of the flows and that can be assembled and combined to implement any required data processing.
K2View Data Orchestration provides:
• A flow management staging front-end, built from Stages, which are executed from left to right. A flow can be split into different execution paths based on conditions and loops, according to the configurable logic path that matches your business process.
• A comprehensive and growing list of over one hundred pre-built operators (functions) for processing data. New operators can be developed in JavaScript for custom functionality.
• In debug mode, data can be visualized and traced along the different data orchestration flows.

EXTERNAL DATABASE OPERATORS
K2View Data Orchestration provides a range of database functions to interact with external interfaces, such as:
• JDBC URLs
• References to predefined interfaces, schemas, tables, or fields in coordination with SQL commands
Data Orchestration also supports loading data into databases, with regular commands such as insert, update/upsert, and delete.

DATA ORCHESTRATION OPERATORS
Micro-DBs corresponding to digital entities stored in K2View Data Fabric can be accessed, used, transformed, and exposed from the Data Orchestration pipeline manager. Once a specific operator is invoked, it sets the scope of the entire pipeline to a specific entity. This is fully configurable, so the same pipeline can be invoked and executed across multiple entities using batch operators.

LOGIC OPERATORS
K2View Data Orchestration boasts an exhaustive list of built-in logic operators. These perform logic operations on operator outputs and return Boolean values depending on the outcome. Simple mathematical comparisons for all types of data can be executed and can influence the execution of subsequent stages.
Moreover, mathematical operators can be used, such as power, root, min, max, or mod.
RegEx and string manipulations are also supported out-of-the-box to prepare and format data according to business requirements. Many more mathematical and logic functions can be added by creating new custom operators that can be stored and reused across pipelines and projects.

MASKING OPERATORS
These operators are used to mask sensitive information, such as SSNs, credit card numbers, emails, sequences, zip codes, etc. For example:
• SSN masking will mask the original SSN by replacing it with a valid, yet fake, SSN.
• Credit card masking will generate a fake (but valid) credit card number.
• Digital entity function masking will mask the input value of the function with the output resulting from the digital entity function execution. If the masked value is found in the masked values storage, the function will not be called.

PARSING & STREAMING OPERATORS
Various stream manipulation functions, such as compression, file reading, or HTTPS streaming, can be performed.
Data from files stored in a defined interface can be read and processed line by line and parsed according to the scheme corresponding to its format: JSON, XML, CSV, or plain text.
HTTPS streaming operators allow connections to external web servers, thus providing the ability to enrich Data Orchestration flows with data originating from any external data source, such as social media, weather, and finance APIs.

MISCELLANEOUS
Many more operators for data maps or array creation, systems monitoring, and statistics are available, and can be used in tandem with all the operators defined above.
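To make the stage-and-operator model concrete, here is a minimal sketch of a flow as a left-to-right succession of stages, including a credit-card masking operator that produces a fake but Luhn-valid number. The operator names, the flow structure, and the Luhn-based masking are illustrative assumptions, not K2View's actual implementations:

```python
import random

def luhn_checksum(digits):
    """Luhn sum over a digit list; a valid number yields 0."""
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # every second digit from the right is doubled
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10

def mask_credit_card(_original, rng=random.Random(0)):
    """Masking operator: replace the input with a fake, Luhn-valid number."""
    body = [rng.randrange(10) for _ in range(15)]
    check = (10 - luhn_checksum(body + [0])) % 10  # check digit fixes the sum
    return "".join(map(str, body + [check]))

def uppercase(value):
    """A trivial string-manipulation operator."""
    return value.upper()

def run_flow(stages, record):
    """Execute stages left to right; each stage maps a field to an operator."""
    for stage in stages:
        for field, op in stage.items():
            record[field] = op(record[field])
    return record

flow = [{"card": mask_credit_card}, {"name": uppercase}]
out = run_flow(flow, {"name": "jane", "card": "4111111111111111"})
assert luhn_checksum([int(c) for c in out["card"]]) == 0  # fake number is valid
print(out["name"])  # JANE
```

Chaining operators this way mirrors the front-end model: each stage transforms the record and hands it to the next, and a logic operator could branch the flow between stages.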
TEST DATA MANAGEMENT

OVERVIEW
K2View Test Data Management (K2View TDM) offers an automated means for provisioning realistic data subsets for digital entities into a test environment. Such data sets are generated from your production systems and provide high-quality data to testing teams.
K2View TDM relies on the K2View Data Fabric, which acts as (1) a test data warehouse for the provisioned test data, and (2) an ETL layer for extracting data from production sources and loading it to the target environment.

TDM ARCHITECTURE
One of the main challenges of provisioning test data is that data is often fragmented between different data sources. For example, a Customer's data may be stored in CRM, Billing, Ordering, Ticketing, Customer Feedback, and Collection systems. To run functional tests on a Customer in an integrative testing environment, their data must be extracted from all relevant source systems while maintaining referential integrity.
The K2View Data Fabric's patented micro-DB, a data lake for each digital entity instance, ensures smooth data provisioning based on the company's business needs, rather than extracting a complete copy of each data source.

KEY FEATURES
• Self-service web application where testers can request data to be provisioned on demand.
• Test data warehouse of provisioned test data.
• Ability to transfer data into live testing environments.
• Data subset requests, re-deployment of data, and data appending.
• Provisioning user-defined lists of business entity data from a selected source environment to a selected target environment. All data related to the selected entities is extracted and copied to the relevant data systems. This enables the provisioning of a subset of entities based on predefined parameters; for example, copying 10 customers in NY using small business packages.
• Synthetic data generation, by cloning a given production entity into the target environment, while avoiding sequence duplication and ensuring referential integrity in the test environment.
• Automatic data security and masking on an entity-by-entity basis.
• Updating schemas from selected entities.

K2View TDM features a "Data Flux" that provides the ability to roll back test data to a specific version:
• Testers can save specific versions of a selected list of entities or a selected list of metadata (reference) tables.
• Testers can load a selected version of entities or metadata tables to the selected target environment.
• Provisioning of data on demand, or automatic provisioning based on scheduling parameters; for example, provisioning the data automatically every week.

TEST DATA MANAGEMENT GUI ADMIN
The K2View TDM web application offers the testing manager the ability to perform the following activities:
• Defining business entities, environments, roles, and permissions.
• Creating and executing K2View TDM tasks that provide a selected subset of entities or reference tables to the selected environment.

K2VIEW TDM DATABASE
K2View TDM settings and tasks are managed in the TDM PostgreSQL DB. Both the K2View TDM GUI and K2View Data Fabric connect to the K2View TDM DB to get or update settings or tasks.
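The extraction challenge described above – pulling one customer's rows from several systems while keeping references intact – can be sketched as a per-entity subset. The source systems and schemas below are invented for illustration; the point is that subsetting by digital entity keeps every foreign reference consistent by construction:

```python
# Hypothetical source systems, each holding rows keyed by customer_id.
SOURCES = {
    "crm":     [{"customer_id": 7, "name": "Acme"},
                {"customer_id": 8, "name": "Beta"}],
    "billing": [{"customer_id": 7, "invoice": 101},
                {"customer_id": 7, "invoice": 102}],
    "tickets": [{"customer_id": 8, "ticket": "T-9"}],
}

def extract_entity(customer_id):
    """Collect every row for one customer across all sources.

    Because the subset is taken per digital entity, all rows that
    reference the entity travel together, preserving referential integrity.
    """
    return {system: [r for r in rows if r["customer_id"] == customer_id]
            for system, rows in SOURCES.items()}

subset = extract_entity(7)
print(len(subset["billing"]))  # 2
```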
DATA PRIVACY MANAGEMENT

OVERVIEW
K2View Data Privacy Management (K2View DPM) provides the tools needed to configure, manage, and audit Data Subject Access Requests (DSARs) associated with data privacy regulations such as GDPR, CCPA, LGPD, and others.
K2View DPM is highly configurable to support regulation rules, workflows, and data access requirements associated with privacy compliance.
K2View DPM is composed of two main components that ensure the end-to-end lifecycle of any data privacy-related request, from request submission to fulfilment, including SLA management, dashboards, and data audits:

DATA PRIVACY CONFIGURATION
Data privacy policies and rules are managed via the K2View DPM Admin module, used to configure the system. Through this module, the administrator can define all data privacy management aspects:
• Supported regulations
• Types of requests that can be made for each regulation
• Task flows required for the fulfilment of each request, consent configuration, and more.

DSAR & CONSENT MANAGEMENT
This is where role-based schemes that serve different users are defined:
• Call Center Representatives, who handle data requests
• Data Stewards, who execute the requests
• Case Owners, responsible for the successful completion of the requests under their responsibility
• Supervisors, who distribute requests to Case Owners.

K2View DPM covers the end-to-end lifecycle of Consent management, including:
• Consent configuration
• Customer consent preferences management
• Central consent repository
• Third-party integration, and more

Consent topics are fully configurable by means of a user-friendly web-based user interface.
Customers can review, accept, or withdraw consents using a self-service web-based application. All changes are recorded for monitoring and auditing purposes on a per-customer basis – within a micro-DB, with a full historical view of customer consent actions.
The data can be exposed to authorized users or applications via APIs, files, or publish/subscribe technologies such as Kafka, or stored for regulation compliance and evidence management.

DPM ENTITIES
K2View DPM requires users to define the following objects in order for the system to learn the rules, roles, and policies at play in your data compliance scheme:
• Regulations – used to capture the rules pertaining to a specific data privacy policy
• Activities – used to represent all the actions customers are allowed to take with regard to their data, as defined in the legislation (right to be forgotten, right to access or modify data)
• Flows – responsible for executing the request that corresponds to one of the activities defined above
• Stages – the building blocks of a flow
• Tasks – the lowest granularity for one of the many actions performed in an activity, such as:
  • Send email to customer
  • Gather customer data
  • Check request validity

ROLES
K2View DPM is a role-based application. Each user is associated with one or more roles. The role determines the activities the user can perform in the system.
The roles are structured in two layers:
• DPM Application Roles – each application role includes a set of DPM functionality a user can perform.
• Corporate Roles – configurable roles defined by the corporation to represent the corporate organizational structure for DPM users.

RICH UI AND CONTROLS
K2View DPM provides different views and dashboards to display the relevant information and statistics to the role assigned to the user.
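A request flow built from the entities above (an activity executed by a flow made of stages, each holding tasks) might be wired together like this. The task names mirror the examples in the text, but the executor itself and the request fields are an invented sketch:

```python
def check_request_validity(request):
    """Task: a request is valid only if it identifies a customer."""
    request["valid"] = bool(request.get("customer_id"))
    return request

def gather_customer_data(request):
    """Task: collect the customer's data (stubbed here)."""
    if request["valid"]:
        request["data"] = {"customer_id": request["customer_id"]}
    return request

def send_email_to_customer(request):
    """Task: notify the customer of the outcome."""
    request["notified"] = request["valid"]
    return request

# A Flow executes the request for one Activity as a succession of Stages,
# each Stage holding one or more Tasks.
FLOWS = {
    "right_to_access": [
        [check_request_validity],
        [gather_customer_data, send_email_to_customer],
    ],
}

def execute_flow(activity, request):
    for stage in FLOWS[activity]:
        for task in stage:
            request = task(request)
    return request

result = execute_flow("right_to_access", {"customer_id": 42})
print(result["notified"])  # True
```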
ADMINISTRATION

OVERVIEW
Configuring, monitoring, and administrating K2View Data Fabric is performed through two sets of tools:
• The K2View tools: K2View Admin Manager, K2View Fabric Studio, and K2View Web Admin.
• The distributed database's native administration capabilities.
This section presents the functionalities provided by the K2View Fabric administration and configuration tools.

K2VIEW ADMIN MANAGER
• Version control administration: define repositories and developer access to the K2View Fabric configuration. Developer access is restricted via password, and can be granular to any level of the configuration.
• K2View Data Fabric services control: start/stop of K2View Data Fabric's services.

K2VIEW FABRIC STUDIO
• Digital entity type definition
• Digital entity schema configuration
• Sync policy configuration
• Data enrichment/ETL rules
• Data masking rules
• Index configuration
• Deployment to a K2View Data Fabric environment
• Commit/update to and from the version control repository

K2VIEW WEB ADMIN
• Query execution
• Index definition
• User management
• Role and permission management
• Node administration
TOTAL COST OF OWNERSHIP

OVERVIEW
K2View Data Fabric does not require storage of all data in memory, or expensive hardware for scaling up performance. K2View Data Fabric's low total cost of ownership (TCO) relies on three simple cornerstones:
• In-memory performance on commodity hardware
• Complete linear scalability
• Risk-free integration
This section reviews the minimum hardware configuration; it is intended to provide a benchmark of performance per hardware type.

MINIMUM REQUIREMENTS
As mentioned, K2View Data Fabric can run on commodity hardware; therefore, the minimum requirements to install K2View Data Fabric are easily met.
K2View Data Fabric is installed on two servers:
• One Linux server to manage the server node.
• One Windows server to run the administration and configuration tools described in the previous section.

The minimum requirements for the Linux cluster are:
• CentOS 6.5 or Red Hat 6.5 with latest patches
• Modern Xeon processor
• 4 nodes x 4 cores
• 32GB RAM
• HDD, select one of the following two options:
  • 2 physical HDDs, 500GB each, in RAID0 configuration
  • 2 x 500GB SSDs

It is very important to note that K2View Data Fabric stores data on disk and not in memory (only current operations are done in memory). Using regular disk storage is a contributing factor to K2View Data Fabric's low TCO, as opposed to other distributed high-performance databases that store all data in memory and thus require massive amounts of RAM.

The minimum requirements for the Windows server are:
• Windows version – any one of the following:
  • Windows Server 2008 R2 64-bit
  • Windows 7 64-bit or Windows 8 64-bit
• 1 CPU
• 4GB RAM
• 100GB available disk space

LINEAR SCALABILITY
K2View Data Fabric's linear scalability is ensured by the proven linear scalability of its underlying distributed database (Cassandra).

RISK-FREE INTEGRATION
As explained at the beginning of this paper, on top of the actual costs associated with a new system purchase (hardware, licenses, etc.), one major cost component of any data management system is its integration into an existing IT eco-system. Indeed, integrating a new system is not only very costly, it can also present an elevated risk for organizations provisioning applications to millions of customers.
K2View Fabric inherently drives integration costs to a minimum:
• Data migration is fully automated by the embedded ETL layer, without impact on source systems
• Flexible synchronization allows progressive legacy system retirement
• SQL support does not require any learning curve for database users
• Embedded Web Services allow integration without changes to database applications
These features make integrating K2View Fabric into any IT environment a risk-free operation.

CONCLUSION
This paper has detailed the key components of the K2View Data Fabric, and how they contribute to making it a next-generation distributed data management system – an operational data fabric.
Indeed, while K2View Data Fabric solves the big data problem established in the introduction, it also delivers risk-free integration within an existing IT eco-system and uses a revolutionary business-oriented way to present data: the digital entity. K2View's patented digital entity and micro-DB technologies underpin the data fabric's high-scale, high-performance, high-availability, and fully secure architecture.
Moreover, K2View Data Fabric offers unprecedented features: ease of configuration, full SQL support, embedded data services, flexible synchronization, a performance-oriented processing engine, and complete security granularity.
To learn more about K2View Data Fabric, refer to K2View's website: www.k2view.com

ABOUT K2VIEW
K2View provides an operational data fabric dedicated to making every customer experience personalized and profitable.
The K2View platform continually ingests all customer data from all systems, enriches it with real-time insights, and transforms it into a patented Micro-DB™ – one for every customer. To maximize performance, scale, and security, every micro-DB is compressed and individually encrypted. It is then delivered in milliseconds to fuel quick, effective, and pleasing customer interactions.
Global 2000 companies – including AT&T, Vodafone, Sky, and Hertz – deploy K2View in weeks to deliver outstanding multi-channel customer service, minimize churn, achieve hyper-segmentation, and assure data compliance.
CONTACT INFORMATION
• www.k2view.com
• [email protected]
• +1-844-438-2443
CONFIDENTIALITY
This document contains copyrighted work and proprietary information belonging to K2View.
This document and the information contained herein are delivered to you as is, and K2View makes no warranty whatsoever as to its accuracy, completeness, fitness for a particular purpose, or use. Any use of the documentation and/or the information contained herein is at the user's risk, and K2View is not responsible for any direct, indirect, special, incidental, or consequential damages arising out of such use of the documentation. Technical or other inaccuracies, as well as typographical errors, may occur in this Guide.
This document and the information contained herein and any part thereof are confidential and proprietary to K2View. All intellectual property rights (including, without limitation, copyrights, trade secrets, trademarks, etc.) evidenced by or embodied in and/or attached, connected, or related to this Guide, as well as any information contained herein, are and shall be owned solely by K2View. K2View does not convey to you an interest in or to this Guide, to information contained herein, or to its intellectual property rights, but only a personal, limited, fully revocable right to use the Guide solely for reviewing purposes. Unless explicitly set forth otherwise, you may not reproduce by any means any document and/or copyright contained herein.
Information in this Guide is subject to change without notice. Corporate and individual names and data used in examples herein are fictitious unless otherwise noted.
Copyright © 2015 K2View Ltd./K2VIEW LLC. All rights reserved. The following are trademarks of K2View: the K2View logo and K2View's platform. K2View reserves the right to update this list from time to time.
Other company and brand products and service names in this document are trademarks or registered trademarks of their respective holders.