
Intelligent Data and Analytics Fabric

The document summarizes Google Cloud's new Dataplex product, which is an intelligent data fabric that unifies distributed data to automate data management and power analytics at scale. Key capabilities include logical data organization, centralized security and governance policies, end-to-end lifecycle management tools, automatic data discovery and classification, a unified metadata store, and an integrated analytics experience bringing together Google and open-source tools. Dataplex aims to help enterprises unlock freedom of choice, consistent controls, and data intelligence across their distributed data ecosystems.


Data Cloud Summit: Solving for the future.
05/26/21
Irina Farooq, Director, Product Management, Google Cloud
Kumar Menon, SVP, Data Fabric and Decision Sciences Tech, Equifax
CFO

The promise of data analytics is massive
$15.4 trillion: the impact analytics will have on the global market.

Source: McKinsey & Company, "Notes from the AI frontier: Insights from hundreds of use cases"

But so are the challenges
● 30% of enterprises report NOT having a well-thought-out data strategy
● 16% of managers trust their data
● 66% of a company's data goes unused for analytics

Sources:
HBR, "Why is it so hard to become data driven?"
MIT, "Seizing Opportunity in Data Quality"
Accenture, "Closing the Data-Value Gap: How to become data-driven and pivot to the new," 2019

Distributed data creates tension in IT
[Diagram: BI, ETL/ELT tools, machine learning, and consumer apps reach data silos (data warehouses, data lakes, data marts, databases) through connecting tools; each silo carries its own metadata, security, and governance. Data sources include unstructured files, LoB-specific data, real-time data, app transactions, and logs. Pain points called out: security, standardization, and financial governance.]

Dataplex (Preview)
Intelligent data fabric that unifies your distributed data to help automate data management and power analytics at scale.
[Diagram: an Integrated Analytics Experience layer (Curate | Integrate | Analyze) sits above a Unified Data Management layer (Metadata | Intelligence | Lifecycle | Governance | Security) that spans data warehouses, data lakes, data marts, and databases.]

What is Dataplex?

Integrated Analytics Experience
Rapidly curate, secure, integrate, and analyze any type of data, at any scale, using the best of Google-native and open-source tools.

Intelligent Data Management
Organize data without moving it, with automatic data discovery, metadata harvesting, lifecycle management, and data quality backed by built-in AI-driven intelligence.

Centralized Security & Governance
Central policy management, monitoring, and auditing for data authorization, retention, and classification.

Built for Distributed Data
No data movement or duplication.

[Diagram: a layered stack. Integrated Workspaces (Analytics Experience, AI Platform, Data Studio) run on BigQuery, Spark, and Dataflow, either serverless or BYOI. Beneath them sit the Unified Metastore, Data Intelligence, Data Lifecycle Mgmt (ingest, discover, prep, monitor, serve, archive), Fine-grained Security and Governance, and Logical data organization, over storage for structured, semi-structured, unstructured, and streaming data across GCP, multi-cloud, and on-premises.]

Logical data organization
Unified management for distributed data

● Organize data in Lakes and Zones based on business use cases and line-of-business (LOB) needs
● Combine data from different data stores and/or projects within the same Zone without data movement
● Use logical constructs as the foundation for:
○ Data accessibility
○ Security controls
○ Financial governance

[Diagram: a lake divided into a Landing Zone, a Structured Zone, and a Refined Zone, each holding assets.]
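The lake/zone/asset hierarchy can be pictured as a small data model. The sketch below is a hypothetical illustration, not the Dataplex API: the `Lake`, `Zone`, and `Asset` classes, their methods, and the location strings are all made up. It shows the key idea from the slide: a zone records references to data living in different stores and projects, so nothing is copied.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """A reference to data that stays in its original store (nothing is copied)."""
    name: str
    location: str  # illustrative string only, e.g. a bucket path or table name
    project: str


@dataclass
class Zone:
    name: str
    assets: list[Asset] = field(default_factory=list)

    def attach(self, asset: Asset) -> None:
        # Only the reference is recorded; the underlying data is not moved.
        self.assets.append(asset)

    def projects(self) -> set[str]:
        # A single zone may span assets from several projects.
        return {a.project for a in self.assets}


@dataclass
class Lake:
    name: str
    zones: dict[str, Zone] = field(default_factory=dict)

    def zone(self, name: str) -> Zone:
        # Create a zone on first use so data can be organized incrementally.
        return self.zones.setdefault(name, Zone(name))


lake = Lake("sales")
lake.zone("landing").attach(Asset("raw_orders", "gs://example-raw/orders/", "proj-a"))
lake.zone("refined").attach(Asset("orders", "proj-b.sales.orders", "proj-b"))
lake.zone("refined").attach(Asset("customers", "proj-c.crm.customers", "proj-c"))

print(sorted(lake.zone("refined").projects()))  # ['proj-b', 'proj-c']
```

Because zones hold only references, access and cost policies can then be attached at the lake or zone level rather than per storage system.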
Centralized security & governance
Global control with distributed ownership

Centrally define, manage, and audit data access


● Define security policies across data silos including granular
access control
● Attribute-based policies allowing definition of data classes
& allowed user groups

Distributed data ownership with global visibility


● Local data owners can grant permissions
● Data stewards can globally monitor permissions and
access
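One way to read "global control with distributed ownership" is as two checks that must both pass: a central attribute-based rule (which user groups may see a data class) and a local grant from the asset's owner, with every decision logged centrally for stewards. The sketch below assumes that model; the `AccessControl` class, the data-class names, and the default of treating unclassified assets as `"public"` are hypothetical choices, not Dataplex behavior.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControl:
    # Central policy: data class -> user groups allowed to see that class.
    class_policy: dict[str, set[str]]
    # Asset -> data class, e.g. as assigned by sensitive-data detection.
    asset_class: dict[str, str]
    # Local grants made by each asset's owner: asset -> users.
    owner_grants: dict[str, set[str]] = field(default_factory=dict)
    # Central audit trail a data steward can review across all assets.
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def grant(self, asset: str, user: str) -> None:
        """Called by a local data owner."""
        self.owner_grants.setdefault(asset, set()).add(user)

    def can_read(self, user: str, groups: set[str], asset: str) -> bool:
        # Assumption: assets without a classification are treated as "public".
        data_class = self.asset_class.get(asset, "public")
        # Central ceiling: one of the user's groups must be allowed for the class.
        allowed_by_policy = bool(groups & self.class_policy.get(data_class, set()))
        # Local decision: the owner must also have granted access.
        granted_locally = user in self.owner_grants.get(asset, set())
        decision = allowed_by_policy and granted_locally
        self.audit_log.append((user, asset, decision))
        return decision


acl = AccessControl(
    class_policy={"pii": {"privacy-cleared"}, "public": {"analysts", "privacy-cleared"}},
    asset_class={"customers": "pii", "web_logs": "public"},
)
acl.grant("customers", "alice")
acl.grant("customers", "bob")

print(acl.can_read("alice", {"privacy-cleared"}, "customers"))  # True
print(acl.can_read("bob", {"analysts"}, "customers"))  # False: central policy blocks pii
print(acl.can_read("carol", {"analysts"}, "web_logs"))  # False: no local grant
```

Requiring both checks means a local owner can never widen access beyond what the central classification rules permit.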

Unified governance policies


● Automatic detection of sensitive data
● Centrally manage data retention
● Unified governance of data and related artifacts like
ML models
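Central retention management can be sketched as a single table of rules keyed by data class and applied uniformly to every object, wherever it is stored. The rule values, the object names, and the choice to keep objects with no matching rule are hypothetical assumptions for illustration only.

```python
from datetime import date, timedelta

# Hypothetical central retention rules: data class -> days to keep.
RETENTION_DAYS = {"logs": 30, "transactions": 365 * 7}


def is_expired(created: date, data_class: str, today: date) -> bool:
    days = RETENTION_DAYS.get(data_class)
    if days is None:
        return False  # assumption: classes without a rule are kept indefinitely
    return today - created > timedelta(days=days)


objects = [
    ("app-2021-04-01.log", date(2021, 4, 1), "logs"),
    ("app-2021-05-20.log", date(2021, 5, 20), "logs"),
    ("ledger-2015.parquet", date(2015, 1, 1), "transactions"),
    ("notes.txt", date(2010, 1, 1), "misc"),
]
today = date(2021, 5, 26)
to_delete = [name for name, created, cls in objects if is_expired(created, cls, today)]
print(to_delete)  # ['app-2021-04-01.log']
```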
End-to-end lifecycle management
Task-driven single pane of glass to ingest, organize, curate, secure, and archive your data

One-click templates
● Data movement, tiering, and refinement
● No infrastructure to manage
● Integrated with Dataflow and Data Fusion

Unified monitoring
● Monitor your pipelines across templates, discovery
jobs, scheduled notebooks, and other ETL jobs.

Extensible platform

● Bring your custom jobs to build and monitor custom transformations
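Unified monitoring over an extensible platform suggests a common job interface: built-in jobs and user-supplied custom jobs implement the same contract, and one monitor records every outcome. The sketch below assumes that design; `Job`, `DiscoveryJob`, `DedupeJob`, and `Monitor` are hypothetical names, not Dataplex, Dataflow, or Data Fusion APIs.

```python
from abc import ABC, abstractmethod
from collections import Counter


class Job(ABC):
    """Hypothetical common interface shared by built-in and custom jobs."""

    kind = "custom"

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def run(self) -> None:
        ...


class DiscoveryJob(Job):
    kind = "discovery"

    def run(self) -> None:
        pass  # placeholder for scanning storage and registering metadata


class DedupeJob(Job):
    """A user-supplied custom transformation."""

    def __init__(self, name: str, rows: list[int]) -> None:
        super().__init__(name)
        self.rows = rows
        self.output: list[int] = []

    def run(self) -> None:
        if not self.rows:
            raise ValueError("no input rows")
        self.output = sorted(set(self.rows))


class Monitor:
    """Records the outcome of every job in one place, whatever its kind."""

    def __init__(self) -> None:
        self.results: dict[str, tuple[str, str]] = {}

    def execute(self, job: Job) -> None:
        try:
            job.run()
            status = "SUCCEEDED"
        except Exception as exc:  # keep monitoring even if one job fails
            status = f"FAILED: {exc}"
        self.results[job.name] = (job.kind, status)

    def summary(self) -> Counter:
        # Count outcomes across all job kinds, ignoring the failure message.
        return Counter(status.split(":")[0] for _, status in self.results.values())


monitor = Monitor()
monitor.execute(DiscoveryJob("discover_landing"))
monitor.execute(DedupeJob("dedupe_orders", [3, 1, 3]))
monitor.execute(DedupeJob("dedupe_returns", []))
print(monitor.results["dedupe_returns"])  # ('custom', 'FAILED: no input rows')
```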
Data intelligence
Access to higher-quality data for better insights

Automatic data discovery and classification
● Dynamic schema detection & type mapping
● Metadata harvesting for unstructured data (docs, PDFs, images)
● Sensitive data classification (integration with DLP)
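Dynamic schema detection with type mapping can be illustrated by inferring a column type from raw CSV values and then mapping it to the type names of two engines that share the metadata. The BigQuery (`INT64`, `FLOAT64`, `BOOL`, `STRING`) and Hive (`BIGINT`, `DOUBLE`, `BOOLEAN`, `STRING`) type names are real; the inference order, the empty-cell-as-NULL rule, and the function names are simplifying assumptions, not how Dataplex discovery actually works.

```python
import csv
import io

# Real type names in each system; the mapping itself is a simplification.
BIGQUERY_TYPES = {"bool": "BOOL", "int": "INT64", "float": "FLOAT64", "string": "STRING"}
HIVE_TYPES = {"bool": "BOOLEAN", "int": "BIGINT", "float": "DOUBLE", "string": "STRING"}


def _all_parse(values: list[str], parse) -> bool:
    try:
        for v in values:
            parse(v)
    except ValueError:
        return False
    return True


def infer_type(values: list[str]) -> str:
    present = [v for v in values if v != ""]  # treat empty cells as NULL
    if not present:
        return "string"
    if all(v.lower() in ("true", "false") for v in present):
        return "bool"
    if _all_parse(present, int):
        return "int"
    if _all_parse(present, float):
        return "float"
    return "string"


def detect_schema(csv_text: str) -> dict[str, tuple[str, str]]:
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    schema = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for missing columns; normalize to "".
        kind = infer_type([row[column] or "" for row in rows])
        schema[column] = (BIGQUERY_TYPES[kind], HIVE_TYPES[kind])
    return schema


sample = "id,price,active,city\n1,9.99,true,Oslo\n2,12,false,Lima\n"
for column, (bq_type, hive_type) in detect_schema(sample).items():
    print(column, bq_type, hive_type)
```

Inferring one neutral type first and mapping it per engine is what keeps the metadata consistent between a SQL engine and open-source tools reading the same files.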

Unified metastore and open data access


● Interoperability of open-source and GCP native tools with
consistent metadata
● Automatic metadata registration in Dataproc Metastore and
BigQuery

Built-in data quality
● Built-in metadata and data checks with user-defined rules
● Anomalies raised as actions for human review
● Global data quality metrics across the data estate
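The three data-quality ideas (user-defined rules, anomalies surfaced for a person to act on, and one metric across the estate) fit in a short sketch. `Rule`, `evaluate`, the sample tables, and the choice of "share of all rule checks that passed" as the estate-wide metric are hypothetical; Dataplex may define its checks and metrics differently.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    name: str
    column: str
    check: Callable[[Any], bool]


def evaluate(table: str, rows: list[dict], rules: list[Rule]):
    """Return (passed, total, anomalies) for one table."""
    passed, total, anomalies = 0, 0, []
    for rule in rules:
        for i, row in enumerate(rows):
            total += 1
            if rule.check(row.get(rule.column)):
                passed += 1
            else:
                # Each failure becomes an item for a person to review.
                anomalies.append(f"{table}: row {i} failed {rule.name}")
    return passed, total, anomalies


not_null_id = Rule("not_null_id", "id", lambda v: v is not None)
non_negative = Rule("non_negative_amount", "amount", lambda v: v is not None and v >= 0)

results = {
    "orders": evaluate(
        "orders",
        [{"id": 1, "amount": 10.0}, {"id": None, "amount": 5.0}, {"id": 3, "amount": -2.0}],
        [not_null_id, non_negative],
    ),
    "customers": evaluate("customers", [{"id": 7}], [not_null_id]),
}

# Estate-wide metric: share of all rule checks that passed, across every table.
estate_score = sum(p for p, _, _ in results.values()) / sum(t for _, t, _ in results.values())
for _, _, anomalies in results.values():
    for anomaly in anomalies:
        print(anomaly)
print(f"estate score: {estate_score:.2f}")  # estate score: 0.71
```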
Integrated analytics experience
Combine the best of GCP-native and open-source analytics

For data admins
● Configure fully managed analytics environments
● Flexibility in consumption
● Usage attribution and governance

For data scientists and data analysts
● One-click access to notebooks and a SQL workbench
● Run analysis using BigQuery, Apache Spark, Spark SQL, and more
● Easy collaboration, with the ability to save, share, and search notebooks and SQL scripts
● Run recurring analysis by scheduling notebooks or SQL scripts
Dataplex helps enterprises unlock
● Freedom of Choice
● Consistent Controls
● Data Intelligence
"We are building Dataplex for planet-scale analytics with the help of industry-leading enterprises."
Kumar Menon, SVP, Data Fabric & Decision Science Technology
Partnering with industry leaders
Help us shape the future of Dataplex
and sign up for the preview today!

cloud.google.com/dataplex
Thank you.
