Session 2 - Data Ingestion
Academy
Get Certified
Session 2 of 8
Data Ingestion
Poll Time
1. Did you attend the last session (S01 - Data Cloud Overview)?
First, some logistics
Questions, answers and videos
sfdc.co/DCAcademyGuide
September 8, 2023
Today’s Agenda
Ingestion Methods
Next Steps
Q&A
Your Salesforce Team
Partner Opportunities:
Use-case-driven approach to understand customer challenges
Solution overview:
Trust and Data Ethics, Right to Be Forgotten through the Consent API,
What Data Cloud does and does not do, examples of industry use cases
Recap - Setup and Administration
Data Cloud Provisioning
● Home org and existing org
● Data Cloud Permission Sets (Data Cloud Admin and Data Cloud Marketing Admin)
● Object Permissions
● Creating Profiles & adding users
✅ 1. Verify access to the links below:
● Data Cloud Consultant Certification: http://sfdc.co/DCCert
● Data Cloud Consultant Exam Guide: http://sfdc.co/DCCertGuide
✅ 2. Start with Prepare for your Salesforce Data Cloud Consultant Credential:
● Salesforce Data Cloud Consultant Credential Trailmix
✅ 3. Complete:
● Data Cloud: Solution Overview
● Data Cloud: Setup and Administration
Bookmark -> Program Guide: sfdc.co/DCAcademyGuide
Recap of Homework - Partner Learning Camp
✅ ➔ Verify access to:
● Partner Community
● PLC (Partner Learning Camp)
✅ ➔ Enroll for:
● Data Cloud: Practical Experience Course: http://sfdc.co/DCPractical
✅ ➔ Slack Channel: Join the #help-dc-academy-april2024 Slack channel
✅ ➔ Register for a Free S3 Trial Account
✅ ➔ Complete:
● Activity: Partner Pocket Guide Data Cloud
● Activity: Join Collaboration Channels
● Activity: Request a Data Cloud Trial Org
● Activity: Review the Data Cloud Practical Experience FAQ
● Activity: Set Up Your Instance
✅ ➔ Request a Data Cloud Trial org*: sfdc.co/dctrialorgroe
*Trial orgs are only available for 30 days.
Bookmark -> Program Guide: sfdc.co/DCAcademyGuide
NOTE: If you are not a Salesforce Partner, you can sign up for a free, 5-day Developer Edition org with Data Cloud.
Frequently Asked Questions
I can’t access the PLC! What do I do?
● Review & follow troubleshooting post in Slack
● Contact [email protected] if you are still blocked
How do I request a Trial org?
● Log in to Partner Learning Camp, browse to the demo org section, and request a DCO org
● Wait about 24 hours for the org to be provisioned
I have a specific technical question!
● Ask in the Q&A Quip or the #help-dc-academy-april2024 Slack channel
Can I review the recordings or deck?
● Recordings will be posted in the quip linked below
sfdc.co/DCAcademyGuide
Data Ingestion
The Big Picture: Implementation Themes
Related to the components of Data Cloud
● Data Ingestion: Set up data streams bringing data into Data Cloud from various supported sources and applying necessary transformations.
● Data Preparation
● Segmentation: Turn mapped data into useful audiences or segments, to understand, target, or analyze customers at the unified level.
● Data Consumption: Integrations to source/target systems, business intelligence tools, etc.
Let’s walk through how this works
A “day in the life” of customer data
[Diagram: data sources flowing into Customer 360]
Data Sources
● Customer 360
● Cloud Storage: Amazon S3, Google Cloud, Microsoft Azure
● Zero-Copy Federation: Snowflake, Google BigQuery
● Legacy Systems
● Analytics
● Third Party
● What data would you load?
● What connectors do you need?
● How much data would you load?
Hyperscale Data Store
Data Lakehouse combines the best of the Data Lake and Data Warehouse worlds (Data Lake + Data Warehouse = Data Lakehouse)
Purpose
● Data Lake: Applicable for machine learning and artificial intelligence tasks
● Data Warehouse: Best for data analytics and BI, but limited to particular problem-solving
● Data Lakehouse: Flexible storage; can be used for research, data analytics, and ML
ACID compliance
● Data Lake: Non-ACID compliant: data integrity issues
● Data Warehouse: ACID-compliant: ensures the integrity of data
● Data Lakehouse: ACID-compliant: ensures consistency of data read and written by multiple sources
Cost of storage
● Data Lake: Cost-effective, fast, flexible
● Data Warehouse: Expensive, time-consuming
● Data Lakehouse: Cost-effective, easy, allows for a lot of flexibility, reduced data duplication
Data Ingestion Flow Overview
● Ingestion: Multi-format (JSON, CSV, Parquet, ORC); multi-sourced (Cloud Storage, MuleSoft, Kafka); schema preserving; Salesforce data comes directly into Data Lake Objects
● Data Lake Objects (DLOs): Schema enforced; Parquet-formatted Iceberg tables; hydrated by transformations; typed (Profile vs. Engagement); materialized tables; virtual BYOL tables
● Data Model Objects (DMOs): Semantic mapping establishes DLO to DMO; can be optionally materialized; Insights and Unified Profiles are DMOs; simplified, curated data that powers business applications
Data Spaces - Once your data has been ingested, it is assigned to a Data Space that acts as a partition, allowing you greater control over how your data is organized.
Data Model Objects - These are either materialized or views on top of the Data Lake Objects. These can be Customer 360 DMOs or materialized ones such as Unified Individual, Computed Insights, transformations, etc.
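Semantic mapping from a DLO to a DMO can be pictured as a field-by-field projection. The sketch below is only an illustration: the field names (ContactId, individual_id, and so on) and the map_dlo_to_dmo helper are hypothetical, and in Data Cloud the mapping is typically configured in the mapping screen rather than written as code.

```python
from typing import Dict

# Hypothetical mapping from DLO field names to DMO field names.
FIELD_MAP: Dict[str, str] = {
    "ContactId": "individual_id",
    "FirstName": "first_name",
    "LastName": "last_name",
}


def map_dlo_to_dmo(dlo_row: Dict[str, str], field_map: Dict[str, str]) -> Dict[str, str]:
    """Project one DLO row onto DMO fields.

    Source fields without a mapping are ignored, and mapped fields that are
    absent from the row are simply left out of the result.
    """
    return {
        dmo_field: dlo_row[dlo_field]
        for dlo_field, dmo_field in field_map.items()
        if dlo_field in dlo_row
    }


if __name__ == "__main__":
    row = {"ContactId": "003A", "FirstName": "Ana", "LastName": "Silva", "Fax": "n/a"}
    print(map_dlo_to_dmo(row, FIELD_MAP))
    # {'individual_id': '003A', 'first_name': 'Ana', 'last_name': 'Silva'}
```

Because this mapping only renames and selects fields, the DMO side could stay a view over the DLO; materializing it would mean storing the projected rows separately.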
Data Streams & Sources
Ingestion - Marketing Cloud
What is it?
• Native integration with Marketing Cloud to bring any Marketing Cloud (MC) data into Data Cloud
• Ingest data from any data extension in MC in a few clicks, along with channel-related data such as opens, clicks, and bounces
Connectors (lookback window; data delivery; latency; refresh mode)
● Marketing Cloud: 90 days; Batch; Hourly to 24 hours; Upsert or Full Refresh
● Salesforce CRM: No limit; Batch; Hourly (Upsert) and bi-weekly (Full Refresh)
● Cloud File Storage (S3, GCS, Azure): None; Batch; Hourly; Upsert or Full Refresh
● B2C Commerce: 30 days; Batch; Sales Order and Sales Order Customer hourly, others daily; Sales Order Upsert, all others Full Refresh
● Marketing Cloud Personalization: 0 days; Near Real Time; Profile 15 minutes, Events/Engagement 2 minutes; Users Upsert, all others Insert
● Ingestion API (Batch and Streaming): Near Real Time; 15 minutes; Upsert
● Web and Mobile SDK: Near Real Time; User Profiles hourly, Engagement 15 minutes
● MuleSoft (using Ingestion API): Near Real Time; 15 minutes
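The refresh modes above distinguish Upsert from Full Refresh. As a mental model only (not how Data Cloud stores data internally), the sketch below treats a data lake object as a Python dict keyed by primary key; upsert and full_refresh are hypothetical helper names.

```python
from typing import Dict, List

Record = Dict[str, str]


def upsert(target: Dict[str, Record], batch: List[Record], key: str) -> Dict[str, Record]:
    """Insert new rows and overwrite rows whose primary key already exists.

    Rows that are missing from the incoming batch are kept unchanged.
    The original target dict is not modified.
    """
    merged = dict(target)
    for row in batch:
        merged[row[key]] = row
    return merged


def full_refresh(batch: List[Record], key: str) -> Dict[str, Record]:
    """Discard the current contents and rebuild the object from the batch alone."""
    return {row[key]: row for row in batch}


if __name__ == "__main__":
    existing = {
        "C1": {"id": "C1", "city": "Austin"},
        "C2": {"id": "C2", "city": "Boston"},
    }
    incoming = [
        {"id": "C2", "city": "Denver"},   # changed row
        {"id": "C3", "city": "Chicago"},  # new row
    ]
    print(sorted(upsert(existing, incoming, "id")))  # ['C1', 'C2', 'C3']
    print(sorted(full_refresh(incoming, "id")))      # ['C2', 'C3']
```

Upsert only rewrites rows that are new or changed and keeps everything else, while Full Refresh rebuilds the object from the latest extract, which is simpler but reprocesses every row.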
Data Object Type Categories
EXAM TIP
Important: You cannot change the category after saving the data stream.
Data Field Types
EXAM TIP
● Consider functions like CONCAT()
● Consider functions like IF(), AND(), or NOT()
● Consider functions like PROPER() or REPLACE()
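The function hints above refer to formula fields that cleanse or derive data during ingestion. The Python sketch below only imitates the intent of those functions on a single row; concat, proper, and derive_fields are hypothetical helpers, the field names are made up, and the exact formula syntax and semantics in Data Cloud may differ.

```python
from typing import Dict


def concat(*parts: str) -> str:
    """Imitates CONCAT(): join the parts with no separator."""
    return "".join(parts)


def proper(text: str) -> str:
    """Imitates PROPER(): capitalize each word and lowercase the rest.

    Note: split() also collapses repeated whitespace.
    """
    return " ".join(word.capitalize() for word in text.split())


def derive_fields(row: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the row with three derived fields added."""
    full_name = concat(proper(row["first_name"]), " ", proper(row["last_name"]))
    # REPLACE()-style cleanup: strip dashes from the phone number.
    phone_clean = row["phone"].replace("-", "")
    # IF(AND(opted_in, NOT(phone is blank)), "Reachable", "Do Not Contact")
    reachable = row["opt_in"] == "true" and not (phone_clean == "")
    contact_flag = "Reachable" if reachable else "Do Not Contact"
    return {**row, "full_name": full_name, "phone_clean": phone_clean, "contact_flag": contact_flag}


if __name__ == "__main__":
    sample = {"first_name": "ana", "last_name": "SILVA", "phone": "555-0100", "opt_in": "true"}
    print(derive_fields(sample)["full_name"])  # Ana Silva
```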
A streaming data transform reads one record from a source data lake object, reshapes the record data, and writes one or more records to a target data lake object.
The source and target objects must be different objects.
A streaming data transform runs continuously as a streaming process, picking up new or changed data.
Use Case: Normalize Data with UNION
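To illustrate the one-record-in, one-or-more-records-out behavior, here is a Python analogy of the "Normalize Data with UNION" use case: a contact row with two phone columns becomes one row per non-empty phone. In Data Cloud itself these transforms are defined with SQL, where the UNION would combine one SELECT per phone column; the column names, DLO names, and functions here are hypothetical.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, str]

PHONE_COLUMNS = ("home_phone", "mobile_phone")  # hypothetical source columns


def normalize_phones(row: Row) -> Iterator[Row]:
    """Reshape one source row into one output row per non-empty phone column."""
    for column in PHONE_COLUMNS:
        number = row.get(column, "")
        if number:
            yield {"contact_id": row["contact_id"], "phone_type": column, "phone": number}


def run_streaming_transform(source_dlo: str, target_dlo: str, rows: Iterable[Row]) -> Iterator[Row]:
    """Apply the per-row transform lazily to each incoming row as it arrives."""
    # Mirrors the rule that the source and target objects must be different.
    if source_dlo == target_dlo:
        raise ValueError("source and target data lake objects must be different")
    return (out for row in rows for out in normalize_phones(row))


if __name__ == "__main__":
    incoming = [
        {"contact_id": "1", "home_phone": "111", "mobile_phone": "222"},
        {"contact_id": "2", "home_phone": "", "mobile_phone": "333"},
    ]
    for out in run_streaming_transform("Contact_DLO", "Contact_Phone_DLO", incoming):
        print(out)
```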
Batch Data Transforms
Use a batch data transform to create a repeatable series of operations to transform your data and
load it into a target for further usage.
● Does a full refresh of data
● Can use multiple source objects
● Can be used with DLOs or DMOs
Batch Transforms
● Does a full refresh
● Repeatable process, can be scheduled or triggered manually
● Works with DLOs or DMOs as source objects
● Does not replace Calculated Insights
Streaming Transforms
● Acts on one row of data at a time
● Transforms data as it’s ingested
● Works only with DLOs
● Does not replace Calculated Insights
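The batch side of this comparison can be sketched the same way: a batch transform reads more than one source object, joins them, and replaces the target's contents on every run (full refresh). This Python sketch assumes hypothetical customer and loyalty objects and helper names; it is not Data Cloud's batch transform engine.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def batch_transform(customers: List[Row], loyalty: List[Row]) -> List[Row]:
    """Left-join customers to loyalty tiers; customers without a tier get 'None'."""
    tier_by_customer = {r["customer_id"]: r["tier"] for r in loyalty}
    return [{**c, "tier": tier_by_customer.get(c["customer_id"], "None")} for c in customers]


def run_batch(target: List[Row], customers: List[Row], loyalty: List[Row]) -> None:
    """Full refresh: the target's previous rows are replaced, not merged."""
    target[:] = batch_transform(customers, loyalty)


if __name__ == "__main__":
    target_dlo: List[Row] = [{"customer_id": "STALE", "tier": "Gold"}]
    customers = [{"customer_id": "C1", "name": "Ana"}, {"customer_id": "C2", "name": "Raj"}]
    loyalty = [{"customer_id": "C1", "tier": "Silver"}]
    run_batch(target_dlo, customers, loyalty)  # could be called on a schedule or manually
    print(target_dlo)
```

Because each run recomputes the whole target, rerunning it is safe and repeatable, which is what makes scheduling or manual triggering straightforward.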
Recap: Ingestion
Steps for Configuring Data Streams
1. Select Data Source: Choose a previously connected bundle or authenticate a new data source (cloud storage)
2. Select Data Source Object (Dataset): Choose starting or select object or specify filename
3. Define Data Stream Properties: Name the source; set the label, developer name, and data category
4. Confirm Data Source Object Schema: Verify fields and data types; set the primary key
5. Apply Transforms & Data Space: Optionally add formula fields to cleanse your source data or derive new fields, and assign to a data space
6. Configure Updates to Data Source Object: Configure the refresh mode and set the schedule
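The six steps above can be summarized as a checklist object. This is a hypothetical Python sketch: none of these field names come from a Salesforce API, the "default" data space value is an assumption, and the validation only encodes rules stated in this session (Profile, Engagement, or Other categories; a primary key drawn from the confirmed fields; Upsert or Full Refresh; no category change after saving).

```python
from dataclasses import dataclass
from typing import List

VALID_CATEGORIES = {"Profile", "Engagement", "Other"}
VALID_REFRESH_MODES = {"Upsert", "Full Refresh"}


@dataclass
class DataStreamConfig:
    """Hypothetical checklist for configuring one data stream."""
    source: str                     # 1. data source (bundle or authenticated connection)
    source_object: str              # 2. object or file name
    label: str                      # 3. stream properties
    developer_name: str
    category: str
    fields: List[str]               # 4. confirmed schema
    primary_key: str
    data_space: str = "default"     # 5. data space assignment (assumed name)
    refresh_mode: str = "Upsert"    # 6. update settings
    schedule: str = "Hourly"
    saved: bool = False

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config looks complete."""
        problems = []
        if self.category not in VALID_CATEGORIES:
            problems.append(f"unknown category: {self.category}")
        if self.primary_key not in self.fields:
            problems.append(f"primary key {self.primary_key!r} is not one of the fields")
        if self.refresh_mode not in VALID_REFRESH_MODES:
            problems.append(f"unknown refresh mode: {self.refresh_mode}")
        return problems

    def save(self) -> None:
        problems = self.validate()
        if problems:
            raise ValueError("; ".join(problems))
        self.saved = True

    def change_category(self, new_category: str) -> None:
        # Mirrors the rule that the category cannot change after the stream is saved.
        if self.saved:
            raise ValueError("category cannot be changed after saving the data stream")
        self.category = new_category
```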
2. True or False: The starter data bundle can be ingested multiple times.
a. True
b. False
Knowledge Check - Answer
2. True or False: The starter data bundle can be ingested multiple times.
a. True
b. False ✅
Knowledge Check
3. Which type of data extraction setting is best when the data changes frequently, but the majority of the data stays the same?
4. Which data object type category would you use to ingest purchase order details?
a. Profile
b. Sales Order
c. Engagement
d. Other
Knowledge Check - Answer
4. Which data object type category would you use to ingest purchase order details?
a. Profile
b. Sales Order
c. Engagement ✅
d. Other
Your Next Activity
Tip: Do not rush through the hands-on activity without completing the trails.
Step 1: Trailhead Modules
EXPLORE SALESFORCE CRM DATA INGESTION
● Starter Bundle
● Direct Object Ingestion
● Data Kits for CRM Data Streams
Important Note: Follow along with these exercises, but wait until the hands-on exercise instructions before configuring your Trial Org account!
Step 3: Partner Learning Camp
Scenario
RAV Group is a company that has several lines of business operating under one main brand. They use the following products:
● Salesforce Marketing Cloud
● Salesforce CRM: contains data from their vehicle rental business
● eCommerce platform: contains transactions from their retail brand selling sports equipment
Requirements
The RAV Group wants to combine data from two systems: Salesforce CRM, which contains data from their vehicle rental business, and an eCommerce platform with transactions from their retail brand selling sports equipment. These brands operate as independent entities from the customer experience perspective. After the data is brought into the Data Cloud platform, the RAV Group wants to identify and merge records of the people that exist in both systems.
The combined audience should enable users from either brand to create segments of customers with certain characteristics sourced from either system. These segments can then be consumed for further actions, such as analytics, retargeting on the eCommerce site, activating segments on Marketing Cloud, and creating a segment that can be used in a campaign in Marketing Cloud. Create other segments and calculated insights for analysis.
Solution
● Use an independent Data Cloud org that will ingest data from multiple data sources and, after segmentation, activate audiences into relevant target systems.
● Ingest data from all data sources.
● Extend the standard Data Model.
● Perform harmonization with unification of the data.
● Create required insights and segments from unified profiles.
● Activate segments to be consumed by a target system.
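The requirement to identify and merge records of the people that exist in both systems is what Data Cloud identity resolution handles with configurable match and reconciliation rules. The sketch below is a deliberately simplified stand-in that applies a single rule, exact match on a normalized email; the record shapes and the unify helper are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]
Profile = Dict[str, List[str]]


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different emails still match."""
    return email.strip().lower()


def unify(crm: List[Record], ecommerce: List[Record]) -> Dict[str, Profile]:
    """Group records from both systems that share a normalized email.

    Each unified profile lists the source record IDs it was built from.
    """
    profiles: Dict[str, Profile] = {}
    for system, records in (("crm", crm), ("ecommerce", ecommerce)):
        for record in records:
            key = normalize_email(record["email"])
            profile = profiles.setdefault(key, {"crm": [], "ecommerce": []})
            profile[system].append(record["id"])
    return profiles


if __name__ == "__main__":
    crm_contacts = [{"id": "C-1", "email": " Ana@Example.com"}]
    shop_customers = [
        {"id": "E-9", "email": "ana@example.com"},
        {"id": "E-10", "email": "raj@example.com"},
    ]
    for email, profile in unify(crm_contacts, shop_customers).items():
        print(email, profile)
```

A real implementation would also need fuzzy or multi-field matching and rules for choosing which source value wins; this single exact rule only shows the grouping idea.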
Let’s walk through how this works
A “day in the life” of RAV Group
[Diagram: RAV Group data flowing into Customer 360]
● Salesforce CRM: contains data from vehicle rental business
● Amazon S3: transactions from their retail brand selling sports equipment
● Third Party
● Out-of-the-Box Connectors, Data Bundles, Batch Data Ingestion
● Data Spaces, Data Models, Data Mapping, Identity Resolution, Calculated Insights, Segmentation
● Automations, Analytics
● Install a data bundle to assist with ingesting data from a Salesforce org.
Found in Setup
Go to User Interface -> User Interface
Explore Salesforce CRM Data Ingestion
Bundles: Quickly integrate common data sets from multiple sources
● Native Connectors
● Pre-configured Unified Data Model
● Data Sources
● Data Model
● Mappings
Activity: Configure Data Ingestion
CRM Starter Data Bundle
Expect a roadblock!
We have designed this exercise to show you what happens when
the object you want to ingest has insufficient permissions with
the integration user.
Activity: Prepare Your Data
Amazon S3 Data Sources
Common Mistake
Many consultants skip the guide where we configure the Amazon S3 bucket, its access policy, and the user we will use to connect. Without completing this step, you won't be able to connect to your S3 bucket!
S3 Guide here: https://salesforce.quip.com/Ge0zAXFPcYLE
Activity: Configure Data Ingestion
Applying Data Transformations
Common Mistakes
Assign a Data Space: With the addition of Data Spaces, make sure to assign this new DLO to your default data space!
Reference your Org ID: Replace the placeholder ORGID fields with your actual Data Cloud Org ID.
Activity: Configure Data Ingestion
Ingest Data from S3 Bucket
Common Mistakes
Exam Outline
Test takers are strongly advised to complete the Data Cloud Partner Learning Camp curriculum before attempting the exam.
Salesforce Certified Data Cloud Consultant
● Total questions: 60
● Allotted time: 105 min
● Passing score: 62%
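Assuming all 60 questions are scored and the 62% passing score applies to that total (an assumption; the official exam guide is the authority on scoring), the minimum number of correct answers can be computed with integer ceiling division:

```python
def min_correct(total_questions: int, passing_percent: int) -> int:
    """Smallest count of correct answers whose share is at least passing_percent.

    Uses integer ceiling division to avoid floating-point rounding surprises.
    """
    return -(-total_questions * passing_percent // 100)


if __name__ == "__main__":
    print(min_correct(60, 62))  # 38, since 60 * 62% = 37.2, which rounds up
```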
Goal (Homework) after this call
2. Salesforce Data Cloud Consultant Credential Trailmix
3. Complete:
● Data Cloud: Ingestion
● Data Cloud: CRM Data Ingestion
Bookmark -> Program Guide: sfdc.co/DCAcademyGuide
Partner Learning Camp
➔ Complete:
● Activity: Set Up Your Instance
● Activity: Prepare Your Data
● Activity: Configure Data Ingestion
● Activity: Configure Batch Transforms
➔ Extra Credit
Watch the last Marketing Cloud Moments featuring Data Cloud & Bundles: https://mcmoments.hubs.vidyard.com/
Data Cloud
Pocket Guide
sfdc.co/datacloudpocketguide
Vouchers
sfdc.co/DCAcademyGuide
Q&A
We will try to answer most queries in this sheet: http://sfdc.co/DCAcademyQnA
sfdc.co/DCAcademyGuide
Thank you
Please provide your feedback after closing this Zoom session; it is very valuable to us.