We will present our O365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DataStax Enterprise on Azure.
Tsinghua University: Two Exemplary Applications in China (DataStax Academy)
In this talk, we will share our experience applying Cassandra with two real customers in China. In the first use case, we deployed Cassandra at Sany Group, a leading machinery manufacturer, to manage the sensor data generated by construction machinery. By designing a specific schema and optimizing the write process, we successfully managed over 1.5 billion historical data records and achieved an online write throughput of 10,000 operations per second with 5 servers. MapReduce is also used on Cassandra for value-added services, e.g. operations management, machine failure prediction, and abnormal behavior mining. In the second use case, Cassandra is deployed at the China Meteorological Administration to manage meteorological data. We designed a hybrid schema to support both slice queries and time-window-based queries efficiently. We also explored optimized compaction and deletion strategies for meteorological data in this case.
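As a rough sketch of the kind of time-series model such a workload calls for (the keyspace, table, bucket granularity, and Python driver usage below are illustrative assumptions, not the schema used at Sany or the CMA), a partition per machine per day keeps partitions bounded while serving both slice and time-window queries from a single partition:

```python
# Minimal bucketed time-series sketch for sensor data, using the DataStax Python driver.
# Table/column names and the daily bucket are assumptions for illustration.
from datetime import date, datetime
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # contact point is a placeholder
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS telemetry
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS telemetry.sensor_readings (
        machine_id text,
        day        date,          -- daily bucket keeps partitions bounded
        ts         timestamp,
        metric     text,
        value      double,
        PRIMARY KEY ((machine_id, day), ts, metric)
    ) WITH CLUSTERING ORDER BY (ts DESC, metric ASC)
""")

# Writes go through a prepared statement so the hot path avoids re-parsing CQL.
insert = session.prepare(
    "INSERT INTO telemetry.sensor_readings (machine_id, day, ts, metric, value) "
    "VALUES (?, ?, ?, ?, ?)"
)

# A time-window query touches exactly one partition per machine per day.
window = session.prepare(
    "SELECT ts, metric, value FROM telemetry.sensor_readings "
    "WHERE machine_id = ? AND day = ? AND ts >= ? AND ts < ?"
)
rows = session.execute(window, ("SY-215-0042", date(2015, 9, 1),
                                datetime(2015, 9, 1, 8), datetime(2015, 9, 1, 9)))
```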
The Last Pickle: Distributed Tracing from Application to Database (DataStax Academy)
Monitoring provides information on system performance, but tracing is necessary to understand individual request performance. Detailed query tracing has been available in Cassandra since version 1.2 and is invaluable when diagnosing problems. However, knowing which queries to trace, and why the application makes them, still requires deep technical knowledge. By merging application tracing via Zipkin with Cassandra query tracing, we automate the process and make it easier to identify and resolve problems. In this talk Mick Semb Wever, Team Member at The Last Pickle, will introduce Cassandra query tracing and Zipkin. He will then propose an extension that allows clients to pass a trace identifier through to Cassandra, and a way to integrate Zipkin tracing into Cassandra. Driving all this is the desire to create one tracing view across the entire system.
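For context, here is a minimal sketch of the Cassandra-side tracing the talk builds on, using the Python driver (the keyspace, table, and printed fields are illustrative); a Zipkin bridge would turn these trace events into child spans of the application's trace:

```python
# Minimal query-tracing sketch with the DataStax Python driver.
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")   # keyspace name is a placeholder

stmt = SimpleStatement("SELECT * FROM events WHERE user_id = %s LIMIT 50")
result = session.execute(stmt, ("user-123",), trace=True)

trace = result.get_query_trace()           # pulled from system_traces by the driver
print("coordinator:", trace.coordinator, "duration:", trace.duration)
for event in trace.events:
    # Each event is one step the coordinator/replicas took for this query;
    # a Zipkin integration would emit these as spans under the application trace.
    print(event.source_elapsed, event.source, event.description)
```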
This presentation recounts the story of Macys.com and Bloomingdales.com's migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
This session will cover:
1) The process that led to our decision to use Cassandra
2) The approach we used for migrating from DB2 & Coherence to Cassandra without disrupting the production environment
3) The various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks, as well as how these performance results figured into our final schema designs.
4) Our lessons learned and next steps
Managing Cassandra Databases with OpenStack Trove (Tesora)
This document summarizes OpenStack Trove, an OpenStack service for provisioning and managing databases in OpenStack clouds. It discusses what OpenStack and Trove are, how Trove integrates with other OpenStack services, and Trove's capabilities like provisioning, backup/restore, replication, clustering, and resizing for both SQL and NoSQL databases like Cassandra, MongoDB, and PostgreSQL. It also introduces Tesora as a major contributor to Trove that provides an enterprise-grade Trove platform with additional support and customization options.
Capital One: Using Cassandra In Building A Reporting Platform (DataStax Academy)
As a leader in the financial industry, Capital One applications generate huge amounts of data that require fast and accurate handling, storage and analysis. We are transforming how we report operational data to our internal users so that they can make quick and precise business decisions to serve our customers. As part of this transformation, we are building a new Go-based data processing framework that will enable us to transfer data from multiple data stores (RDBMS, files, etc.) to a single NoSQL database - Cassandra. This new NoSQL store will act as a reporting database that will receive data on a near real-time basis and serve the data through scorecards and reports. We would like to share our experience in defining this fast data platform and the methodologies used to model financial data in Cassandra.
Data Pipelines with Spark & DataStax Enterprise (DataStax)
This document discusses building data pipelines for both static and streaming data using Apache Spark and DataStax Enterprise (DSE). For static data, it recommends using optimized data storage formats, distributed and scalable technologies like Spark, interactive analysis tools like notebooks, and DSE for persistent storage. For streaming data, it recommends using scalable distributed technologies, Kafka to decouple producers and consumers, and DSE for real-time analytics and persistent storage across datacenters.
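As a sketch of the batch leg of such a pipeline (the contact point, keyspace, and table names are placeholders, and the spark-cassandra-connector is assumed to be on the Spark classpath), PySpark can read a Cassandra table as a DataFrame, aggregate it, and persist the result back to DSE:

```python
# Batch pipeline sketch: Cassandra -> Spark aggregation -> Cassandra.
# Names are placeholders; the spark-cassandra-connector package is assumed available.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("dse-batch-pipeline")
    .config("spark.cassandra.connection.host", "127.0.0.1")   # placeholder host
    .getOrCreate()
)

raw = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="telemetry", table="sensor_readings")    # names are assumptions
    .load()
)

daily = (
    raw.groupBy("machine_id", "day", "metric")
       .agg(F.avg("value").alias("avg_value"), F.max("value").alias("max_value"))
)

(
    daily.write.format("org.apache.spark.sql.cassandra")
    .options(keyspace="telemetry", table="daily_rollups")      # target table assumed to exist
    .mode("append")
    .save()
)
```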
Why you need benchmarks
Finding the right database solution for your use case can be an arduous journey. The database deployment touches aspects of throughput performance, latency control, high availability and data resilience.
You will need to decide on the infrastructure to use: Cloud, on-premise or a hybrid solution.
Data models also have an impact on finding the right fit for the use case. Once you establish a requirements set, the next step is to test your use case against the databases of choice.
In this workshop, we will discuss the different data points you need to collect in order to get the most realistic testing environment.
We will cover:
Data model impact on performance and latency
Client behavior related to database capabilities
Failover and high availability testing
Hardware selection and cluster configuration impact
We will show two benchmarking tools you can use to test and benchmark your clusters and identify the optimal deployment scenario for your use case; a small illustrative latency probe is sketched after the list below.
Attend this virtual workshop if you are:
Looking to minimize the cost of your database deployment
Making a database decision based on performance and scale data
Planning to emulate your workload on a pre-production system where you can test, fail fast and learn.
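The sketch below is not one of the workshop's benchmarking tools; it is only a minimal illustration, with an invented schema and workload, of the client-observed latency data you would want to collect:

```python
# Tiny latency probe: measures client-observed write latency for a toy workload.
import statistics
import time
import uuid
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("bench")            # keyspace/table are placeholders
insert = session.prepare(
    "INSERT INTO events (id, payload, ts) VALUES (?, ?, toTimestamp(now()))"
)

latencies_ms = []
for _ in range(10_000):
    start = time.perf_counter()
    session.execute(insert, (uuid.uuid4(), "x" * 256))
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
print("p50 %.2f ms  p99 %.2f ms  mean %.2f ms" % (
    latencies_ms[len(latencies_ms) // 2],
    latencies_ms[int(len(latencies_ms) * 0.99)],
    statistics.mean(latencies_ms),
))
```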
Proofpoint: Fraud Detection and Security on Social Media (DataStax Academy)
Social media has become the new frontier for cyber-attackers. The explosive growth of this new communications platform, combined with the potential to reach millions of people through a single post, has provided a low barrier for exploitation. In this talk, we will focus on how Cassandra is used to enable our fight against bad actors on social media. In particular, we will discuss how we use Cassandra for anomaly detection, social mob alerting, trending topics, and fraudulent classification. We will also speak about our Cassandra data models, integration with Spark Streaming, and how we use KairosDB for our time series data. Watch us don our superhero-Cassandra capes as we fight against the bad guys!
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ... (DataStax Academy)
The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, that live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. But there are serious advantages to many of the new tools, and this presentation will give an analysis of the current state–including pros and cons as well as what’s needed to bootstrap and operate the various options.
About Robbie Strickland, Software Development Manager at The Weather Channel
Robbie works for The Weather Channel’s digital division as part of the team that builds backend services for weather.com and the TWC mobile apps. He has been involved in the Cassandra project since 2010 and has contributed in a variety of ways over the years; this includes work on drivers for Scala and C#, the Hadoop integration, heading up the Atlanta Cassandra Users Group, and answering lots of Stack Overflow questions.
Cassandra is a better alternative to an RDBMS for a scalable solution that requires a distributed database, but it is more commonly found in clustered solutions targeted at a single installation. The key reason is maintainability and life-cycle management.
Ericsson has re-engineered its voucher management solution for prepaid billing by replacing the RDBMS with Cassandra. Cassandra facilitates clusters with large sets of nodes that can easily scale up and down, so one does not have to deal with multiple clusters. However, skills for its administration are sparse, unlike RDBMS administration. Activities like nodetool repair, compaction, and scaling up or down become challenging. Moreover, new Cassandra releases come frequently, and rolling them out to several deployments is challenging.
Key technical challenges were consistency of denormalized data, performance of full-table scans, and porting the product from Thrift to CQL. Challenges with large-scale global deployments lie with anti-entropy repair and size-tiered compaction.
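One common way to address consistency of denormalized data, shown here purely as an illustrative sketch (the voucher tables and columns are invented, not Ericsson's model), is to apply the duplicate writes in a logged batch so they either all succeed or are replayed from the batch log:

```python
# Logged-batch sketch for keeping two denormalized views of the same voucher in sync.
# Table and column names are invented for illustration.
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType
from cassandra import ConsistencyLevel

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("billing")

by_serial = session.prepare(
    "INSERT INTO vouchers_by_serial (serial, batch_id, state) VALUES (?, ?, ?)"
)
by_batch = session.prepare(
    "INSERT INTO vouchers_by_batch (batch_id, serial, state) VALUES (?, ?, ?)"
)

batch = BatchStatement(batch_type=BatchType.LOGGED,
                       consistency_level=ConsistencyLevel.LOCAL_QUORUM)
batch.add(by_serial, ("SN-0001", "B-42", "ACTIVE"))
batch.add(by_batch, ("B-42", "SN-0001", "ACTIVE"))
session.execute(batch)   # both writes succeed, or the batch log replays them
```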
About the Speaker
Brij Bhushan Ravat Chief Architect, Ericsson
Brij is Chief Architect for the prepaid billing product at Ericsson. The product uses Cassandra in business support systems for telecom service providers. He has also led the Centre of Excellence for Network Applications, which tracks emerging trends in application development in the telecom area, including telecom services and OSS, and leveraging big data technologies for innovative new-age solutions. His focus is on the application of big data in telecom, including analytics using Spark and NoSQL.
Migration Best Practices: From RDBMS to Cassandra without a Hitch (DataStax Academy)
Presenter: Duy Hai Doan, Technical Advocate at DataStax
Libon is a messaging service designed to improve mobile communications through free calls, chat, and voicemail services regardless of operator or Internet access provider. As a mobile communications application, Libon processes billions of messages and calls while backing up billions of contact records. Join this webinar to learn best practices and pitfalls to avoid when tackling a migration project from a relational database (RDBMS) to Cassandra, and how Libon is now able to ingest massive volumes of high-velocity data with read and write latency below 10 milliseconds.
Cassandra Summit 2014: Apache Cassandra Best Practices at eBay (DataStax Academy)
Presenter: Feng Qu, Principal DBA at eBay
Cassandra has been adopted widely at eBay in recent years and is used by many end-user-facing applications. I will introduce the best practices we have built over time around system design, capacity planning, deployment automation, monitoring integration, performance analysis, and troubleshooting. I will also share our experience working with DataStax support to provide a highly available, highly scalable data store that fits into eBay's infrastructure.
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments (DataStax Academy)
The SimianViz microservices simulator contains a model of Cassandra that allows large scale global deployments to be created and exercised by simulating failure modes and connecting the simulation to real monitoring tools to visualize the effects. The simulator is open source Go code at github.com/adrianco/spigo and is developing rapidly.
DataStax recently announced the general availability of DataStax Enterprise 4.7 (DSE 4.7), the leading database platform purpose-built for the performance and availability demands of web, mobile, and IoT applications. In this product launch webinar, Robin Schumacher, VP of Products, explores the wide range of enhancements in DSE 4.7, including enterprise-class search, analytics, and in-memory.
Webinar: How to Shrink Your Datacenter Footprint by 50% (ScyllaDB)
Eliran Sinvani presented on how to shrink a datacenter footprint by 50% using workload prioritization. He discussed how OLTP and OLAP workloads have different needs, and how existing solutions such as multi-datacenter deployments and time-division sharing waste resources. Workload prioritization uses CPU scheduling to divide resources dynamically based on workload priorities, allowing workloads to be combined without degrading performance or wasting hardware.
Webinar: Diagnosing Apache Cassandra Problems in Production (DataStax Academy)
This document provides guidance on diagnosing problems in Cassandra production systems. It recommends first using OpsCenter to identify issues, then monitoring servers, applications, and logs. Common problems discussed include incorrect timestamps, tombstones slowing queries, not using a snitch, version mismatches, and disk space not being reclaimed. Diagnostic tools like htop, iostat, and nodetool are presented. The document also covers JVM garbage collection profiling to identify issues like early object promotion and long minor GCs slowing the system.
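One lightweight way to surface the slow queries such a diagnosis targets, sketched here with an assumed sampling rate and threshold, is to enable probabilistic tracing (for example `nodetool settraceprobability 0.001`) and then pull the slowest sessions out of the system_traces keyspace:

```python
# Pull the slowest sampled queries from system_traces after enabling sampling,
# e.g. with `nodetool settraceprobability 0.001`. The threshold is an arbitrary example.
from cassandra.cluster import Cluster

SLOW_MICROS = 500_000   # 500 ms, an assumed threshold

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("system_traces")

rows = session.execute("SELECT session_id, request, coordinator, duration FROM sessions")
slow = [r for r in rows if r.duration and r.duration > SLOW_MICROS]
for r in sorted(slow, key=lambda r: r.duration, reverse=True)[:20]:
    # duration is recorded in microseconds by Cassandra's tracing
    print(f"{r.duration / 1000:.1f} ms  {r.coordinator}  {r.request}  ({r.session_id})")
```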
Netflix stores 98 percent of the data related to its streaming services: from bookmarks and viewing history to billing and payment information. These services and applications require a highly available and scalable persistence solution to keep running efficiently, in both normal operation and disaster scenarios. How does Netflix plan capacity for its new as well as existing services?
In this talk, Arun Agrawal, Senior Software Engineer, and Ajay Upadhyay, Cloud Data Architect at Netflix, will talk about capacity planning and capacity forecasting in the Cassandra world.
We will take you through the science behind forecasting short- and long-term usage and auto-scaling adequate capacity well before C* clusters reach their limit. This guarantees a highly scalable and available persistence solution that meets our SLAs at Netflix.
About the Speakers
Ajay Upadhyay Senior Database Engineer, Netflix
Responsible for the persistence layer at Netflix as part of the CDE (Cloud Database Engineering) team. Works with application teams, suggesting and guiding them on best practices for the various persistence layers provided by the CDE team.
Arun Agrawal Senior Software Engineer, Netflix
Arun Agrawal is part of Cloud Database Engineering, where they provide CaaS (Cassandra as a Service), ensuring smooth operation of the service and finding innovative ways to reduce the management overhead of running CaaS.
mParticle's Journey to Scylla from Cassandra (ScyllaDB)
mParticle processes 50 billion monthly messages and needed a data store that provides full availability and performance. They previously used Cassandra but faced issues with high latency, complicated tuning, and backlogs of up to 20 hours. They tested Scylla and found it provided significantly lower latency and compaction backlogs with minimal tuning needed. Scylla also offered knowledgeable support. mParticle migrated their data from Cassandra to Scylla, which immediately kept up with their data loads with little to no backlog.
Cassandra Cluster Management by Japan Cassandra Community (Hiromitsu Komatsu)
This document discusses best practices for managing Cassandra clusters based on Instaclustr's experience managing over 500 nodes and 3 million node-hours. It covers choosing the right Cassandra version, hardware configuration, cost estimation, load testing, data modeling practices, common issues like modeling errors and overload, and important monitoring techniques like logs, metrics, cfstats and histograms. Maintaining a well-designed cluster and proactively monitoring performance are keys to avoiding issues with Cassandra.
Making Every Drop Count: How i2O Addresses the Water Crisis with the IoT and ... (DataStax)
Depleting water supplies coupled with increasing global demand is an environmental challenge with lasting impact on societies across the world. Join this webinar to learn how i2O Water, a pioneer in smart water management technologies, is leading the charge against a global crisis with an Internet of Things (IoT) solution built on Apache Cassandra™.
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known (DataStax)
A brief intro to how Barracuda Networks uses Cassandra and the ways in which they are replacing their MySQL infrastructure with Cassandra. This presentation includes the lessons they've learned along the way during this migration.
Speaker: Michael Kjellman, Software Engineer at Barracuda Networks
Michael Kjellman is a Software Engineer, from San Francisco, working at Barracuda Networks. Michael works across multiple products, technologies, and languages. He primarily works on Barracuda's spam infrastructure and web filter classification data.
Most Cassandra usages take advantage of its exceptional performance and ability to handle massive data sets. At PagerDuty, we use Cassandra for entirely different reasons: to reliably manage mutable application states and to maintain durability requirements even in the face of full data center outages. We achieve this by deploying Cassandra clusters with hosts in multiple WAN-separated data centers, configured with per-data center replica placement requirements, and with significant application-level support to use Cassandra as a consistent datastore. Accumulating several years of experience with this approach, we've learned to accommodate the impact of WAN network latency on Cassandra queries, how to horizontally scale while maintaining our placement invariants, why asymmetric load is experienced by nodes in different data centers, and more. This talk will go over our workload and design goals, detail the resultant Cassandra system design, and explain a number of our unintuitive operational learnings about this novel Cassandra usage paradigm.
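A hedged sketch of the two building blocks this approach relies on (datacenter names, replication factors, and the keyspace are placeholders): per-datacenter replica placement via NetworkTopologyStrategy, and a driver profile that pins requests to the local datacenter at quorum consistency:

```python
# Per-DC replica placement plus a DC-pinned, quorum-consistency driver profile.
# DC names, replication factors, and the keyspace are placeholders.
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy
from cassandra import ConsistencyLevel

profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="us-east")),
    # LOCAL_QUORUM keeps latency local; EACH_QUORUM is available for writes that
    # must be acknowledged by a quorum in every datacenter.
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
cluster = Cluster(["10.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()

# Three replicas in each WAN-separated datacenter.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS incidents
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'us-east': 3, 'us-west': 3, 'eu-central': 3}
""")
```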
DataStax C*ollege Credit: What and Why NoSQL? (DataStax)
In the first of our bi-weekly C*ollege Credit series, Aaron Morton, DataStax MVP for Apache Cassandra and Apache Cassandra committer, and Robin Schumacher, VP of product management at DataStax, will take a look back at the history of NoSQL databases and provide a foundation of knowledge for people looking to get started with NoSQL, or just wanting to learn more about this growing trend. You will learn how to know whether NoSQL is right for your application, and how to pick a NoSQL database. This webinar is C* 101 level.
Muvr is a real-time personal trainer system. It must be highly available, resilient, and responsive, and so it relies heavily on Spark, Mesos, Akka, Cassandra, and Kafka—the quintet also known as the SMACK stack. In this talk, we are going to explore the architecture of the entire muvr system, examining in particular the challenges of ingesting very large volumes of data, applying trained models to the data to provide real-time advice to our users, and training and evaluating new models using the collected data. We will specifically emphasize how we have used Cassandra to consume large volumes of fast incoming biometric data from devices and sensors, and how to securely access these big data sets from Cassandra in Spark to compute the models.
We will finish by showing the mechanics of deploying such a distributed application. You will get a clear understanding of how Mesos and Marathon, in conjunction with Docker, are used to build an immutable infrastructure that allows us to provide reliable service to our users and a great environment for our engineers.
Cisco: Cassandra adoption on Cisco UCS & OpenStack (DataStax Academy)
In this talk we will address how we developed our Cassandra environments utilizing the Cisco UCS OpenStack Platform with DataStax Enterprise software. In addition, we are utilizing open source Ceph storage in our infrastructure to optimize performance and reduce costs.
Dyn delivers exceptional Internet performance. Enabling high-quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters, enabling sub-50 ms query responses for hundreds of billions of data points. From granular DNS traffic data to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks, and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements that led them to choose DSE as their go-to big data solution, the path that led to Spark, and the lessons they've learned in the process.
Building Event Streaming Architectures on Scylla and Kafka (ScyllaDB)
This document discusses building event streaming architectures using Scylla and Confluent Kafka. It provides an overview of Scylla and how it can be used with Kafka at Numberly. It then discusses change data capture (CDC) in Scylla and how to stream data from Scylla to Kafka using Kafka Connect and the Scylla source connector. The Kafka Connect framework and connectors allow capturing changes from Scylla tables in Kafka topics to power downstream applications and tasks.
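Downstream of the connector, the captured changes are ordinary Kafka records. As a small sketch (the broker address is a placeholder and the topic name merely assumes a typical <prefix>.<keyspace>.<table> layout), a consumer can pick them up and hand them to whatever task needs them:

```python
# Minimal consumer for a CDC topic produced by the Scylla source connector.
# Broker address and topic name are assumptions for illustration.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "cdc-readers",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["scylla.app.user_events"])   # assumed <prefix>.<keyspace>.<table>

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        change = json.loads(msg.value())          # one row-level change event
        print(change)                             # hand off to the downstream task here
finally:
    consumer.close()
```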
Many NoSQL DBaaS vendors limit which cloud platform you can run on and the size of the data you can manage, and require you to over-provision cloud infrastructure resources while failing to deliver performance and low latency at scale.
In this session, we will compare the performance and Total Cost of Ownership (TCO) of competing NoSQL DBaaS offerings. We will also review how to migrate to Scylla Cloud, our fully managed database service.
You will learn:
- The true cost of ownership for selected NoSQL DBaaS offerings
- The 8 essentials for selecting a NoSQL DBaaS
- Migration options from Apache Cassandra, DynamoDB and other databases
We run multiple DataStax Enterprise clusters in Azure, each holding 300 TB+ of data, to deeply understand Office 365 users. In this talk, we will dive deep into some of the key challenges we faced, and the takeaways from running these clusters reliably for over a year. To name a few: process crashes, ephemeral SSDs contributing to data loss, slow streaming between nodes, mutation drops, compaction strategy choices, schema updates when nodes are down, and backup/restore. We will briefly talk about our contributions back to Cassandra, and our path forward using network-attached disks offered via Azure premium storage.
About the Speaker
Anubhav Kale Sr. Software Engineer, Microsoft
Anubhav is a senior software engineer at Microsoft. His team is responsible for building a big data platform using Cassandra, Spark, and Azure to generate per-user insights for Office 365 users.
The document discusses building fault tolerant Java applications using Apache Cassandra. It provides an overview of fault tolerance, Cassandra's architecture, failure scenarios, and using the Cassandra Java driver. Examples are given of modeling data in Cassandra and performing common operations like getting all events for a customer or events within a time slice.
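The document centers on the Java driver; purely as a sketch of the same ideas with the Python driver (the policy choices, timeouts, and table below are illustrative, not the document's recommendations), much of the client-side fault tolerance lives in the execution profile the application runs with:

```python
# Client-side fault-tolerance knobs: local-DC routing, quorum consistency, and
# speculative execution for idempotent reads. Values are illustrative only.
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import (DCAwareRoundRobinPolicy, TokenAwarePolicy,
                                ConstantSpeculativeExecutionPolicy)
from cassandra import ConsistencyLevel

profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="dc1")),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,   # tolerates one replica down at RF=3
    request_timeout=2.0,                               # seconds
    speculative_execution_policy=ConstantSpeculativeExecutionPolicy(
        delay=0.2, max_attempts=2),                    # hedge against slow replicas
)
cluster = Cluster(["127.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("store")                     # keyspace is a placeholder

# Speculative execution only fires for statements marked idempotent, such as
# the "events within a time slice" read mentioned above.
events_in_slice = session.prepare(
    "SELECT * FROM events WHERE customer_id = ? AND ts >= ? AND ts < ?"
)
events_in_slice.is_idempotent = True
```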
Cassandra - A Decentralized Structured Storage System (Varad Meru)
Slides created as a part of CS 295's week 4 on NoSQL Basics.
CS 295 (Cloud Computing and BigData) at UCI - https://sites.google.com/site/cs295cloudcomputing/
DataStax: How to Roll Cassandra into Production Without Losing your Health, M... (DataStax Academy)
This document provides guidance on how to successfully implement Apache Cassandra in a production environment without issues. It recommends starting with a small, well-defined project like monitoring website events or users, rather than trying to build a large, multi-year platform. The document outlines choosing a specific pain point to address, implementing a simple proof of concept using Cassandra for tasks like event tracking, and iterating from there. It cautions against copying relational data models into Cassandra and emphasizes understanding how Cassandra works differently from SQL databases. The goal is to start small and grow capability over time rather than taking on too much at once.
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch (DataStax Academy)
Do you love some Cassandra, but that relational brain is still on? You aren't alone. Let's take that OLAP data model and get it OLTP. This will be an updated talk with some of the new features brought to you by Cassandra 3.0. Real techniques to translate application patterns into effective models. Common pitfalls that can slow you down and send you running back to RDBMS land. Don't do it! Finally, if you didn't get it right the first time, I'll show you how to fix that data model without any downtime. Turn a hot cup of fail into a tall glass of awesome!
Cassandra is a distributed database that provides high availability and scalability. It uses a ring topology to replicate and distribute data across multiple nodes. Cassandra sacrifices consistency in favor of availability and partition tolerance. Data is modeled using tables containing partitions and clustered rows accessed by partition and clustering keys. Writes are replicated across the ring and stored in memory and on disk for fault tolerance.
This document summarizes new features in Cassandra 3.0, including user defined functions, improved garbage collection, hints management, materialized views, and a new storage engine. User defined functions allow running custom Java or JavaScript functions on Cassandra data. The G1 garbage collector replaces older collectors for better performance and predictability. Hints are now written to files instead of using Cassandra as a queue. Materialized views automatically create and maintain secondary indexes. The new storage engine reduces data duplication and wasted space.
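As a small illustration of the materialized-view feature mentioned here (table and view names are invented), the view below has the server maintain a by-email lookup that would otherwise be a hand-managed duplicate table:

```python
# Materialized view sketch (Cassandra 3.0+). Names are invented for illustration.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("app")

session.execute("""
    CREATE TABLE IF NOT EXISTS users (
        user_id uuid PRIMARY KEY,
        email   text,
        name    text
    )
""")

# Cassandra keeps this view in sync with the base table on every write.
session.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS users_by_email AS
        SELECT email, user_id, name FROM users
        WHERE email IS NOT NULL AND user_id IS NOT NULL
        PRIMARY KEY (email, user_id)
""")

row = session.execute("SELECT user_id, name FROM users_by_email WHERE email = %s",
                      ("ada@example.com",)).one()
```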
An introduction to core concepts in Apache Cassandra. We cover the evolution of database architecture as you try to scale a relational database to solve big data problems, and explain how Cassandra handles these problems efficiently.
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr... (DataStax Academy)
In this in-depth workshop you will gain hands-on experience using Spark and Cassandra inside the DataStax Enterprise platform. The focus of the workshop will be working through data analytics exercises to understand the major developer considerations. You will also gain an understanding of the internals behind the integration that allow for large-scale data loading and analysis. It will also review some of the major machine learning libraries in Spark as an example of data analysis.
The workshop will start with a review of the basics of how Spark and Cassandra are integrated. Then we will work through a series of exercises that show how to perform large-scale data analytics with Spark and Cassandra. A major part of the workshop will be understanding effective data modeling techniques in Cassandra that allow for fast parallel loading of the data into Spark to perform large-scale analytics on that data. The exercises will also look at how to use the open source Spark Notebook to run interactive data analytics with the DataStax Enterprise platform.
The internal battle has been fought, and Cassandra is your group's NoSQL platform of choice! Hooray! But now what? Wouldn't it be great to know what NOT to do? Come to this talk to hear about some of the common Ops mistakes that new users make and what the better decision will be.
DataStax: Making Cassandra Fail (for effective testing) (DataStax Academy)
This document discusses testing Cassandra applications by making Cassandra fail deterministically. It introduces Stubbed Cassandra, a tool that allows priming Cassandra to respond to queries and prepared statements with different failures like timeouts, unavailability, and coordinator issues. Tests can verify application behavior and retries under failures. Stubbed Cassandra runs as a separate process and exposes REST APIs to prime failures and verify query activity, allowing integration into testing frameworks. It aims to help test edge cases and fault tolerance more effectively than existing Cassandra testing tools.
Security is often an afterthought, configured and applied at the last minute before rolling out a new system. Instaclustr has deployed Cassandra for customers with many different requirements.
From deployments in Heroku requiring total public access through to private data centres, we will walk you through securing Cassandra the right way.
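As a client-side sketch of the basics (credentials, certificate path, and contact points are placeholders, and a Python driver recent enough to accept ssl_context is assumed), once authentication and client-to-node encryption are enabled on the cluster, applications connect with an auth provider and a TLS context rather than an open socket:

```python
# Connecting to a cluster with password authentication and client-to-node TLS.
# Credentials, CA path, and contact points are placeholders.
import ssl
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

ssl_context = ssl.create_default_context(cafile="/etc/cassandra/certs/ca.pem")
ssl_context.check_hostname = False         # depends on how node certificates were issued

cluster = Cluster(
    ["10.0.1.10", "10.0.1.11"],
    auth_provider=PlainTextAuthProvider(username="app_user", password="app_secret"),
    ssl_context=ssl_context,
)
session = cluster.connect()
```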
Diagnosing Problems in Production (Nov 2015) (Jon Haddad)
Diagnosing Problems in Production involves first preparing monitoring tools like OpsCenter, server monitoring, application metrics, and log aggregation. Common issues include incorrect server times causing data inconsistencies, tombstone overhead slowing queries, not using the proper snitch, and version mismatches breaking functionality. Diagnostic tools like htop, iostat, vmstat, dstat, strace, jstack, nodetool, histograms, and query tracing help narrow down performance problems which could be due to compaction, garbage collection, or other bottlenecks.
Diagnosing Problems in Production - Cassandra (Jon Haddad)
1) The document discusses various tools for diagnosing problems in Cassandra production environments, including OpsCenter for monitoring, application metrics collection with Statsd/Graphite, and log aggregation with Splunk or Logstash.
2) Some common issues covered are incorrect server times causing data inconsistencies, tombstone overhead slowing queries, not using the proper snitch, and disk space not being reclaimed on new nodes.
3) Diagnostic tools described are htop, iostat, vmstat, dstat, strace, tcpdump, and nodetool for investigating process activity, disk usage, memory, networking, and Cassandra-specific statistics. GC profiling and query tracing are also recommended.
These are the slides from my talk at Hulu in March 2015 discussing Apache Spark & Cassandra. I cover the evolution of data from a single machine to RDBMS (MySQL is the primary example) to big data systems.
On the Spark side, I covered batch jobs, streaming, Apache Kafka, an introduction to machine learning, clustering, logistic regression and recommendations systems (collaborative filtering).
The talk was recorded and is available on YouTube: https://www.youtube.com/watch?v=_gFgU3phogQ
DataStax: Enabling Search in your Cassandra Application with DataStax Enterprise (DataStax Academy)
This document provides an overview of how to enable search capabilities in Cassandra applications using Datastax Enterprise (DSE). It discusses how DSE allows indexing and searching of Cassandra data by integrating the Solr/Lucene search engine. Specifically, it explains that with DSE, data remains stored in Cassandra while indexes are maintained in Solr/Lucene. This provides search capabilities without requiring ETL processes to migrate data out of Cassandra. The document includes code examples of how to define a table and secondary index in Cassandra to support full-text search on tags columns using DSE.
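The document's own examples are not reproduced here; as a rough sketch of the idea using DSE 5.1+ syntax (keyspace, table, and the query string are assumptions), a search index over a tags column is created once and then queried through the solr_query pseudo-column:

```python
# DSE Search sketch: index a tags column and run a full-text query through CQL.
# Assumes DSE 5.1+ `CREATE SEARCH INDEX` syntax; names are illustrative.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("catalog")

session.execute("""
    CREATE TABLE IF NOT EXISTS articles (
        id    uuid PRIMARY KEY,
        title text,
        tags  set<text>
    )
""")
session.execute("CREATE SEARCH INDEX IF NOT EXISTS ON catalog.articles")

# Lucene-style query against the indexed tags, served by the co-located search core.
rows = session.execute(
    "SELECT id, title FROM catalog.articles WHERE solr_query = 'tags:cassandra'"
)
for row in rows:
    print(row.id, row.title)
```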
Cake Solutions: Cassandra as event sourced journal for big data analytics (DataStax Academy)
The document discusses using event sourcing, CQRS, and related technologies like Spark, Mesos, Akka, Cassandra, and Kafka to handle large amounts of data and enable analytics. It provides an overview of these techniques and technologies, and uses an exercise domain as an example to discuss preprocessing data, extracting features, training and testing models, and performing both batch and streaming analytics. The goal is to enable insights from data and create value.
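As a hedged sketch of what "Cassandra as the event-sourced journal" typically looks like (the table below is a generic append-only journal, not the schema from the talk), events are appended under an entity's id and replayed in sequence order to rebuild state or feed analytics:

```python
# Generic append-only event journal sketch; not the schema from the talk.
import json
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("journal")

session.execute("""
    CREATE TABLE IF NOT EXISTS events (
        persistence_id text,        -- the entity/aggregate the event belongs to
        sequence_nr    bigint,      -- strictly increasing per entity
        event_type     text,
        payload        text,        -- serialized event (JSON here for simplicity)
        PRIMARY KEY (persistence_id, sequence_nr)
    )
""")

append = session.prepare(
    "INSERT INTO events (persistence_id, sequence_nr, event_type, payload) "
    "VALUES (?, ?, ?, ?) IF NOT EXISTS"          # reject duplicate sequence numbers
)
session.execute(append, ("user-42", 1, "ExerciseStarted", json.dumps({"set": "squats"})))

# Replay: read the entity's events in order and fold them into current state.
for row in session.execute(
        "SELECT sequence_nr, event_type, payload FROM events "
        "WHERE persistence_id = %s", ("user-42",)):
    print(row.sequence_nr, row.event_type, json.loads(row.payload))
```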
Cassandra meetup slides - Oct 15 Santa Monica Coloft (Jon Haddad)
This document summarizes Shift.com's migration from MongoDB to Cassandra. Shift is a platform that enables marketers to communicate across organizations. The initial database stack included MongoDB, but it was replaced with Cassandra for better operational benefits like easier node management, better control of data storage, and improved long-term scalability. The migration goals were zero downtime and no loss of performance. The strategy involved carefully structuring the Cassandra data model and schema to match MongoDB's performance. Benefits of Cassandra included its familiar CQL query language and improved support for features like time series data storage.
Cassandra Core Concepts - Cassandra Day Toronto (Jon Haddad)
- Traditional relational databases do not scale well for large datasets due to limitations in replication, sharding, and consistency.
- Lessons from using relational databases for big data problems include that consistency is impractical, manual sharding is difficult, and additional components increase complexity.
- Apache Cassandra addresses these issues with a distributed architecture that sacrifices consistency for availability and scalability, automates replication and sharding, and uses a simplified design.
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store (DataStax Academy)
We will present our Office 365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DSE on Azure.
The presentation will feature demos on how you too can build similar applications.
Webinar: DataStax Enterprise 5.0 What’s New and How It’ll Make Your Life Easier (DataStax)
Want help building applications with real-time value at epic scale? How about solving your database performance and availability issues? Then, you want to hear more about DataStax Enterprise 5.0. Join this webinar to learn what’s new in DSE 5.0 ‒ the largest software release to date at DataStax. DSE 5.0 introduces multi-model support including Graph and JSON data models along with a ton of new and enhanced enterprise database capabilities.
View webinar recording here: https://youtu.be/3pfm4ntASJ0
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks (Databricks)
The cloud has become one of the most attractive ways for enterprises to purchase software, but it requires building products in a very different way from traditional software.
Navigating the turbulence on take-off: Setting up SharePoint on Azure IaaS th... (Jason Himmelstein)
The document discusses setting up SharePoint on Azure IaaS. It begins with an introduction of the speaker and their background. It then provides an overview of key Azure IaaS concepts like virtual machines, disks, availability sets, and virtual networks. The document discusses why SharePoint may be deployed on IaaS and provides examples use cases like development/testing environments and disaster recovery. It then outlines the "Jumpstart Method" for automating SharePoint deployments on Azure and provides recommendations for SharePoint, SQL Server, storage, and Active Directory configurations.
The document summarizes VisiQuate's journey migrating a client's data architecture to Azure. It describes initial architectures using Azure services like SQL Database and HDInsight that required improvements. The architecture evolved through versions 2 and 3 using Spark and Hive on HDInsight and Azure Synapse for analytics. Key lessons included performance issues, undocumented features, and differences between Spark and Hive metadata. The summary recommends considering multiple migration options and being prepared to iterate on rebuilding architectures in the cloud.
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users? (TechWell)
When you’re building the next killer mobile app, how can you ensure that your app is both stable and capable of near-instant data updates? The answer: Build a backend! Siva Katir says that there’s much more to building a backend than standing up a SQL server in your datacenter and calling it a day. Since different types of apps demand different backend services, how do you know what sort of backend you need? And, more importantly, how can you ensure that your backend scales so you can survive an explosion of users when you are featured in the app store? Siva discusses the common scenarios facing mobile app developers looking to expand beyond just the device. He’ll share best practices learned while building the PlayFab and other companies’ backends. Join Siva to learn how you can ensure that your app can scale safely and affordably into the millions of concurrent users and across multiple platforms.
Big data journey to the cloud - 5.30.18 - Asher Bartch (Cloudera, Inc.)
We hope this session was valuable in teaching you more about Cloudera Enterprise on AWS, and how fast and easy it is to deploy a modern data management platform—in your cloud and on your terms.
Big Data Advanced Analytics on Microsoft Azure (Mark Tabladillo)
This presentation provides a survey of the advanced analytics strengths of Microsoft Azure from an enterprise perspective (with these organizations being the bulk of big data users) based on the Team Data Science Process. The talk also covers the range of analytics and advanced analytics solutions available for developers using data science and artificial intelligence from Microsoft Azure.
Horses for Courses: Database Roundtable (Eric Kavanagh)
The blessing and curse of today's database market? So many choices! While relational databases still dominate the day-to-day business, a host of alternatives has evolved around very specific use cases: graph, document, NoSQL, hybrid (HTAP), column store, the list goes on. And the database tools market is teeming with activity as well. Register for this special Research Webcast to hear Dr. Robin Bloor share his early findings about the evolving database market. He'll be joined by Steve Sarsfield of HPE Vertica, and Robert Reeves of Datical in a roundtable discussion with Bloor Group CEO Eric Kavanagh. Send any questions to [email protected], or tweet with #DBSurvival.
SQL Server 2016 introduces several new features for In-Memory OLTP including support for up to 2 TB of user data in memory, system-versioned tables, row-level security, and Transparent Data Encryption. The in-memory processing has also been updated to support more T-SQL functionality such as foreign keys, LOB data types, outer joins, and subqueries. The garbage collection process for removing unused memory has also been improved.
How to grow to a modern workplace in 16 steps with Microsoft 365 (Tim Hermie ☁️)
In this session we will give actual insights on how we move customers to Microsoft 365 in a 15+ step approach. From identity to Endpoint Manager, security mechanisms, and migration of data, we'll cover the whole stack.
Sandeep Grandhi has over 6 years of experience in data warehousing and ETL development. He currently works as a Technology Analyst at Infosys where he has led several projects involving extracting data from various sources such as Salesforce, Oracle, and flat files and loading it into data warehouses. Some of the key projects he has worked on include migrating a CRM platform from STARS to Salesforce and building a compliance data repository for a bank.
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...) (DataStax)
Element Fleet has the largest benchmark database in our industry and we needed a robust and linearly scalable platform to turn this data into actionable insights for our customers. The platform needed to support advanced analytics, streaming data sets, and traditional business intelligence use cases.
In this presentation, we will discuss how we built a single, unified platform for both advanced analytics and traditional business intelligence using Cassandra on DSE. With Cassandra as our foundation, we are able to plug in the appropriate technology to meet varied use cases. The platform we've built supports real-time streaming (Spark Streaming/Kafka), batch and streaming analytics (PySpark, Spark Streaming), and traditional BI/data warehousing (C*/FiloDB). In this talk, we are going to explore the entire tech stack and the challenges we faced trying to support the above use cases. We will specifically discuss how we ingest and analyze IoT data (vehicle telematics) in real time and batch, combine data from multiple data sources into a single data model, and support standardized and ad-hoc reporting requirements.
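As a sketch of the streaming leg of such a stack (topic, schema, keyspace, and table names are placeholders, and both the Kafka and Cassandra Spark connectors are assumed to be available), Structured Streaming can parse telematics events off Kafka and write each micro-batch into the Cassandra serving table:

```python
# Streaming sketch: Kafka telematics events -> Spark Structured Streaming -> Cassandra.
# Topic, schema, and table names are placeholders; connector packages assumed present.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = (SparkSession.builder.appName("telematics-stream")
         .config("spark.cassandra.connection.host", "127.0.0.1")
         .getOrCreate())

schema = StructType([
    StructField("vehicle_id", StringType()),
    StructField("ts", TimestampType()),
    StructField("speed_kph", DoubleType()),
    StructField("fuel_pct", DoubleType()),
])

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "vehicle-telematics")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_batch(batch_df, batch_id):
    # Upsert each micro-batch into the serving table read by reports and dashboards.
    (batch_df.write.format("org.apache.spark.sql.cassandra")
     .options(keyspace="fleet", table="vehicle_events")
     .mode("append").save())

query = events.writeStream.foreachBatch(write_batch).start()
query.awaitTermination()
```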
About the Speaker
Jim Peregord Vice President - Analytics, Business Intelligence, Data Management, Element Corp.
Building a Turbo-fast Data Warehousing Platform with Databricks (Databricks)
Traditionally, data warehouse platforms have been perceived as cost prohibitive, challenging to maintain and complex to scale. The combination of Apache Spark and Spark SQL – running on AWS – provides a fast, simple, and scalable way to build a new generation of data warehouses that revolutionizes how data scientists and engineers analyze their data sets.
In this webinar you will learn how Databricks - a fully managed Spark platform hosted on AWS - integrates with a variety of AWS services such as Amazon S3, Kinesis, and VPC. We’ll also show you how to build your own data warehousing platform in a very short amount of time and how to integrate it with other tools such as Spark’s machine learning library and Spark Streaming for real-time processing of your data.
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...IDERA Software
You need to start moving some on-premises databases to the cloud.
- Where do you begin?
- What are your options?
- What will your job look like afterward?
- What tools can you use to manage databases in the cloud?
- How does troubleshooting database performance problems in the cloud differ from on-premises?
- How can you help manage monthly cloud costs so the effort actually is cost effective?
Moving to the cloud is not as easy as one might think, so knowing the answers to these kinds of questions will put you on the path to success. See how DB PowerStudio can readily assist with these efforts and questions.
The presenter, Bert Scalzo, is an Oracle ACE, blogger, author, speaker and database technology consultant. He has worked with all major relational databases, including Oracle, SQL Server, Db2, Sybase, MySQL, and PostgreSQL. Bert’s work experience includes stints as product manager for multiple-database tools, such as DBArtisan and Aqua Data Studio at IDERA. He has three decades of Oracle database experience and previously worked for both Oracle Education and Oracle Consulting. Bert holds several Oracle Masters certifications and his academic credentials include a BS, MS, and PhD in computer science, as well as an MBA.
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...DataStax Academy
Speaker: Mohammed Guller, Application Architect & Lead Developer at Glassbeam.
Learn how Cassandra can be used to build a multi-tenant solution for analyzing operational data from Internet of Complex Things (IoCT). IoCT includes complex systems such as computing, storage, networking and medical devices. In this session, we will discuss why Glassbeam migrated from a traditional RDBMS-based architecture to a Cassandra-based architecture. We will discuss the challenges with our first-generation architecture and how Cassandra helped us overcome those challenges. In addition, we will share our next-gen architecture and lessons learned.
HarishKumar Chennupati provides a curriculum vitae summarizing his professional experience and technical skills. He has over 8 years of experience in information technology as a team lead, scrum master, and senior developer working with technologies like .NET, SQL Server, SSIS, SSRS, Informatica, and QlikView. Some of his projects include applications for HP, Western Digital, and American International Assurance involving development, testing, reporting, ETL processes, and maintenance support. He is proficient in languages like C#, VB.NET, and databases like SQL Server, Oracle, and Vertica.
Cisco has a large global IT infrastructure supporting many applications, databases, and employees. The document discusses Cisco's existing customer service and commerce systems (CSCC/SMS3) and some of the performance, scalability, and user experience issues. It then presents a proposed new architecture using modern technologies like Elasticsearch, Cassandra, and microservices to address these issues and improve agility, performance, scalability, uptime, and the user interface.
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
Companies today are innovating with real-time data to deliver truly amazing customer experiences in the moment. Real-time data management for real-time customer experience is core to staying ahead of the competition and driving revenue growth. Join Trays to learn how Comcast is differentiating itself from its own historical reputation with Customer Experience strategies.
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
DataStax Enterprise (DSE) Graph is a graph database built to manage, analyze, and search highly connected data. DSE Graph, built on Apache Cassandra, delivers continuous uptime along with predictable performance and scale for modern systems dealing with complex and constantly changing data.
Download DataStax Enterprise: Academy.DataStax.com/Download
Start free training for DataStax Enterprise Graph: Academy.DataStax.com/courses/ds332-datastax-enterprise-graph
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
DataStax Enterprise Advanced Replication supports one-way distributed data replication from remote database clusters that might experience periods of network or internet downtime, benefiting use cases that require a 'hub and spoke' architecture.
Learn more at http://www.datastax.com/2016/07/stay-100-connected-with-dse-advanced-replication
Advanced Replication docs – https://docs.datastax.com/en/latest-dse/datastax_enterprise/advRep/advRepTOC.html
This document discusses using Docker containers to run Cassandra clusters at Walmart. It proposes transforming existing Cassandra hardware into containers to better utilize unused compute. It also suggests building new Cassandra clusters in containers and migrating old clusters to double capacity on existing hardware and save costs. Benchmark results show Docker containers outperforming virtual machines on OpenStack and Azure in terms of reads, writes, throughput and latency for an in-house application.
The document discusses the evolution of Cassandra's data modeling capabilities over different versions of CQL. It covers features introduced in each version such as user defined types, functions, aggregates, materialized views, and storage attached secondary indexes (SASI). It provides examples of how to create user defined types, functions, materialized views, and SASI indexes in CQL. It also discusses when each feature should and should not be used.
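To make those features concrete, here is a minimal sketch that creates a user defined type and a SASI index through the DataStax Python driver; the contact point, keyspace, table, and column names are hypothetical, and the target table is assumed to already exist.

```python
from cassandra.cluster import Cluster

# Contact point and keyspace are illustrative.
session = Cluster(["127.0.0.1"]).connect("demo")

# User defined type (CQL 3.1+): group related fields into a single column value.
session.execute("""
    CREATE TYPE IF NOT EXISTS address (street text, city text, zip text)
""")

# SASI secondary index (Cassandra 3.4+): enables prefix/LIKE-style queries,
# with the usual caveats about when secondary indexes are appropriate.
session.execute("""
    CREATE CUSTOM INDEX IF NOT EXISTS users_name_idx ON users (name)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
""")
```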
Data modeling is one of the first things to sink your teeth into when trying out a new database. That's why we are going to cover this foundational topic in enough detail for you to get dangerous. Data modeling for relational databases is more than a touch different from the way it's approached with Cassandra. We will address the quintessential query-driven methodology through a couple of different use cases, including working with time series data for IoT. We will also demo a new tool to get you bootstrapped quickly with MovieLens sample data. This talk should give you the basics you need to get serious with Apache Cassandra.
Hear about how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations that you might run into as well in practice.
In the second part of this talk, we'll dive into how best to use the DataStax Java drivers effectively. We'll dig into how the driver is architected, and use this understanding to develop best practices to follow. I'll also share a couple of interesting bugs we've run into at Coursera.
This document promotes Datastax Academy and Certification resources for learning Cassandra including a three step process of learning Cassandra, getting certified, and profiting. It lists community evangelists like Luke Tillman, Patrick McFadin, Jon Haddad, and Duy Hai Doan who can provide help and resources.
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
This document summarizes three presentations from a Cassandra Meetup:
1. Jason Cacciatore discussed monitoring Cassandra health at scale across hundreds of clusters and thousands of nodes using the reactive stream processing system Mantis.
2. Minh Do explained how Cassandra uses the gossip protocol for tasks like discovering cluster topology and sharing load information. Gossip also has limitations and race conditions that can cause problems.
3. Chris Kalantzis presented Cassandra Tickler, an open source tool he created to help repair operations that get stuck by running lightweight consistency checks on an old Cassandra version or a node with space issues.
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
This talk covers scaling Cassandra for a fast-growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
The document discusses Cassandra's use by Sony Network Entertainment to handle the large amount of user and transaction data from the growing PlayStation Network. It describes how the relational database they previously used did not scale sufficiently, so they transitioned to using Cassandra in a denormalized and customized way. Some of the techniques discussed include caching user data locally on application servers, secondary indexing, and using a real-time indexer to enable personalized search by friends.
This document provides guidance on setting up server monitoring, application metrics, log aggregation, time synchronization, replication strategies, and garbage collection for a Cassandra cluster. Key recommendations include:
1. Use monitoring tools like Monit, Munin, Nagios, or OpsCenter to monitor processes, disk usage, and system performance. Aggregate all logs centrally with tools like Splunk, Logstash, or Graylog.
2. Install NTP to synchronize server times which are critical for consistency.
3. Use the NetworkTopologyStrategy replication strategy and avoid SimpleStrategy for production.
4. Avoid shared storage and focus on low latency and high throughput using multiple local disks.
5. Understand
This document discusses real time analytics using Spark and Spark Streaming. It provides an introduction to Spark and highlights limitations of Hadoop for real-time analytics. It then describes Spark's advantages like in-memory processing and rich APIs. The document discusses Spark Streaming and the Spark Cassandra Connector. It also introduces DataStax Enterprise which integrates Spark, Cassandra and Solr to allow real-time analytics without separate clusters. Examples of streaming use cases and demos are provided.
Introduction to Data Modeling with Apache CassandraDataStax Academy
This document provides an introduction to data modeling with Apache Cassandra. It discusses how Cassandra data models are designed based on the queries an application will perform, unlike relational databases which are designed based on normalization rules. Key aspects covered include avoiding joins by denormalizing data, using a partition key to group related data on nodes, and controlling the clustering order of columns. The document provides examples of modeling time series and tag data in Cassandra.
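To make the query-first, partition-key, and clustering-order ideas above concrete, here is a minimal time-series sketch using the DataStax Python driver; the table, the per-day bucketing scheme, and the contact point are assumptions for illustration, not taken from the talk.

```python
import datetime
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")  # illustrative contact point/keyspace

# One partition per sensor per day keeps partitions bounded; newest readings first.
session.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        sensor_id text,
        day       date,
        ts        timestamp,
        value     double,
        PRIMARY KEY ((sensor_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

# The query the model was designed around: recent readings for one sensor on one day.
rows = session.execute(
    "SELECT ts, value FROM readings WHERE sensor_id = %s AND day = %s LIMIT 10",
    ("pump-42", datetime.date(2016, 6, 1)),
)
for row in rows:
    print(row.ts, row.value)
```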
The document discusses different data storage options for small, medium, and large datasets. It argues that relational databases do not scale well for large datasets due to limitations with replication, normalization, sharding, and high availability. The document then introduces Apache Cassandra as a fast, distributed, highly available, and linearly scalable database that addresses these limitations through its use of a hash ring architecture and tunable consistency levels. It describes Cassandra's key features including replication, compaction, and multi-datacenter support.
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
This document provides an overview of using Datastax Enterprise (DSE) Search to enable full-text search capabilities in Cassandra applications. It discusses how DSE Search integrates Solr/Lucene indexing with the Cassandra database to allow searching of application data without requiring a separate search cluster, external ETL processes, or custom application code for data management. The document also includes examples of different types of searches that can be performed, such as filtering, faceting, geospatial searches, and joins. It concludes with basic steps for getting started with DSE Search such as creating a Solr core and executing search queries using CQL.
The document discusses common bad habits that can occur when working with Apache Cassandra and provides recommendations to avoid them. Specifically, it addresses issues like sliding back into a relational mindset when the data model is different, improperly benchmarking Cassandra systems, having slow client performance, and neglecting important operations tasks. The presentation provides guidance on how to approach data modeling, querying, benchmarking, driver usage, and operations management in a Cassandra-oriented way.
This document provides an overview and examples of modeling data in Apache Cassandra. It begins with an introduction to thinking about data models and queries before modeling, and emphasizes that Cassandra requires modeling around queries due to its limitations on joins and indexes. The document then provides examples of modeling user, video, and other entity data for a video sharing application to support common queries. It also discusses techniques for handling queries that could become hotspots, such as bucketing or adding random values. The examples illustrate best practices for data duplication, materialized views, and time series data storage in Cassandra.
The document discusses best practices for using Apache Cassandra, including:
- Topology considerations like replication strategies and snitches
- Booting new datacenters and replacing nodes
- Security techniques like authentication, authorization, and SSL encryption
- Using prepared statements for efficiency
- Asynchronous execution for request pipelining (see the sketch after this list)
- Batch statements and their appropriate uses
- Improving performance through techniques like the new row cache
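As a minimal illustration of the prepared-statement and asynchronous-execution items above, a sketch with the DataStax Python driver; the events table, its columns, and the contact point are hypothetical.

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")  # illustrative contact point/keyspace

# Prepare once: the server parses and plans the statement a single time,
# so each later execution only ships bind values.
insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

# Pipeline requests instead of blocking on each response.
futures = [session.execute_async(insert, (i, "event-%d" % i)) for i in range(100)]
for f in futures:
    f.result()  # surfaces any write error; we only block here, at the end
```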
This is a two-part talk in which we'll go over the architecture that enables Apache Cassandra’s linear scalability, as well as how DataStax Drivers take full advantage of it to provide developers with nicely designed and speedy clients that are extendable to the core.
21. Resource Group
• container for multiple resources
• resources exist in one* resource group
• resource groups can span regions
• resource groups can span services
Deployment
• tracks template execution
• created within a resource group
• allows nested deployments (see the deployment sketch below)
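A minimal sketch of the resource group / deployment relationship using the Azure SDK for Python (this assumes the azure-identity and azure-mgmt-resource v15+ packages; the subscription id, group name, location, and template file are placeholders, not values from the deck):

```python
import json
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The resource group is the container; deployments are created and tracked inside it.
client.resource_groups.create_or_update("dse-demo-rg", {"location": "westus"})

with open("azuredeploy.json") as f:   # placeholder ARM template on local disk
    template = json.load(f)

poller = client.deployments.begin_create_or_update(
    "dse-demo-rg",
    "dse-demo-deployment",
    {"properties": {"mode": "Incremental", "template": template, "parameters": {}}},
)
poller.wait()  # the deployment record stays in the resource group for later inspection
```

Nested deployments follow the same pattern: a Microsoft.Resources/deployments resource inside the template starts a child deployment that is tracked alongside the parent.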
22. Inside the Box vs. Outside the Box
• Template describes the topology (outside the box)
• Template extensions can initiate state configuration (inside the box); see the extension sketch after this list
• Multiple extensions available for Windows and Linux VMs
– DSC
– Chef
– Puppet
– Custom Scripts
– AppService + WebDeploy
– SQLDB + BACPAC
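As one concrete "inside the box" example, this is roughly the shape of a Linux Custom Script extension resource as it would appear in a template's resources array, written here as a Python dict; the VM name, script URL, and API version are placeholders, while the publisher/type values are the standard ones for the Linux Custom Script extension.

```python
# Sketch of a Custom Script extension resource for a Linux VM (placeholder values).
custom_script_extension = {
    "type": "Microsoft.Compute/virtualMachines/extensions",
    "name": "dse-node-0/installdse",              # "<vmName>/<extensionName>"
    "apiVersion": "2019-07-01",                   # placeholder API version
    "location": "[resourceGroup().location]",
    "dependsOn": [
        "[resourceId('Microsoft.Compute/virtualMachines', 'dse-node-0')]"
    ],
    "properties": {
        "publisher": "Microsoft.Azure.Extensions",  # Linux Custom Script extension
        "type": "CustomScript",
        "typeHandlerVersion": "2.1",
        "autoUpgradeMinorVersion": True,
        "settings": {
            "fileUris": ["https://example.com/install_dse.sh"],  # placeholder script
            "commandToExecute": "bash install_dse.sh",
        },
    },
}
```

The DSC, Chef, and Puppet options listed above are likewise VM extensions, differing mainly in their publisher, type, and settings values.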
23. Common Use Cases for ARM Templates
• Enterprises and System Integrators
– Delivering a capability or cloud capacity (building block templates, e.g. DSE)
– Delivering an end-to-end application (solution templates)
• Cloud Service Vendors (CSVs)
– Support different multi-tenancy approaches
• Distinct deployments per customer
– Within the CSV’s subscription
– “Bring Your Own Subscription” model that uses customer subscriptions
• Scale units within a central multi-tenant system
• Marketplace integration
• All deploy known configurations/SKUs/t-shirt sizes
– Lots of variables make free-form deployments less desirable
– T-shirt sizes / SKUs are the common approach (see the sketch below)
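A sketch of the t-shirt-size pattern, written as Python dicts mirroring an ARM template's parameters and variables sections; the size names and the VM size / node count mapping are invented for illustration.

```python
# Expose a small, fixed menu of cluster sizes instead of free-form knobs.
parameters = {
    "clusterSize": {
        "type": "string",
        "defaultValue": "Small",
        "allowedValues": ["Small", "Medium", "Large"],
    }
}

# Each t-shirt size maps to a known, tested configuration (illustrative values).
variables = {
    "sizeMap": {
        "Small":  {"vmSize": "Standard_DS13", "nodeCount": 4},
        "Medium": {"vmSize": "Standard_DS14", "nodeCount": 8},
        "Large":  {"vmSize": "Standard_GS5",  "nodeCount": 16},
    }
}

# In a template the lookup is an expression along the lines of
#   "[variables('sizeMap')[parameters('clusterSize')].vmSize]"
selected = variables["sizeMap"][parameters["clusterSize"]["defaultValue"]]
print(selected)  # {'vmSize': 'Standard_DS13', 'nodeCount': 4}
```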
24. Design and deploy a building block template
Go to http://github.com/azure/azure-quickstart-templates to find hundreds of quick-start deployment templates for finished solutions.
DataStax is evolving the ARM deployment templates in this GitHub repo to include DSE-specific capabilities (e.g. multi-region topology) for those who want to manage their own deployment; see the sketch after this slide.
Deploying DataStax with the Azure CLI
Deploying DataStax with Azure Marketplace
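The deck demos the Azure CLI and Marketplace paths; as a rough Python companion, this sketch pulls one of the quick-start templates straight from the repo so its parameters and resources can be inspected before deploying it with deployments.begin_create_or_update() as in the earlier resource-group sketch. The template path is a placeholder; browse the repo for the current DataStax/DSE template location.

```python
import json
import urllib.request

# Placeholder path: substitute the actual DataStax/DSE folder from the repo.
RAW = ("https://raw.githubusercontent.com/Azure/azure-quickstart-templates/"
       "master/<path-to-datastax-template>/azuredeploy.json")

with urllib.request.urlopen(RAW) as resp:
    template = json.load(resp)

# See what the building block expects before wiring it into a solution template
# or passing it to the deployment call shown earlier.
print(sorted(template.get("parameters", {}).keys()))
print([r["type"] for r in template.get("resources", [])])
```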
25. Compute and storage options for nodes in the cluster
• Compute families for production clusters
– D-Series, G-Series (Xeon® E5 v3)
• Local SSD disks
– DS-Series, GS-Series
• Premium Storage optimized, host caching for reads
• Storage options for nodes
– Maintain data and logs on local ephemeral SSD disks
• ~100k IOPS and 1.5 GB/sec on G5
– Leverage Premium Storage disks for persistent data and logs
• P10, P20, P30 (128 GB to 1 TB, up to 5000 IOPS and 200 MB/sec)
• Striped volumes to balance storage size, throughput and costs (see the sizing sketch below)
• Max 64 TB, 80000 IOPS and 1 GB/sec per node
– Use Standard Storage for backup snapshots
• Low cost, geo-replicated
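As a worked example of sizing a striped Premium Storage volume against those limits, a small helper; the P30 figures match the slide, the P10/P20 per-disk figures are the published Azure numbers for that disk generation (double-check current documentation), and the targets at the bottom are made-up inputs.

```python
import math

# Per-disk limits: (size GB, IOPS, MB/sec). P30 matches the slide; P10/P20 are
# the published Azure figures for that disk generation.
PREMIUM_DISKS = {
    "P10": (128, 500, 100),
    "P20": (512, 2300, 150),
    "P30": (1024, 5000, 200),
}

# Per-node caps from the slide: 64 TB, 80,000 IOPS, 1 GB/sec.
NODE_CAPS = {"size_gb": 64 * 1024, "iops": 80_000, "mbps": 1024}

def disks_needed(disk, size_gb, iops, mbps):
    """How many disks of one type must be striped to meet the targets."""
    d_size, d_iops, d_mbps = PREMIUM_DISKS[disk]
    n = max(math.ceil(size_gb / d_size),
            math.ceil(iops / d_iops),
            math.ceil(mbps / d_mbps))
    # Whatever the stripe adds up to, the node-level limits still apply.
    achieved = (min(n * d_size, NODE_CAPS["size_gb"]),
                min(n * d_iops, NODE_CAPS["iops"]),
                min(n * d_mbps, NODE_CAPS["mbps"]))
    return n, achieved

# Example targets (made up): 4 TB of data, 20k IOPS, 600 MB/sec per node.
print(disks_needed("P30", 4096, 20_000, 600))   # -> (4, (4096, 20000, 800))
```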
26. Networking deployment options
• Supporting your replication topology (NetworkTopologyStrategy), including geo-replication, for disaster recovery or workload segregation purposes (see the keyspace sketch at the end of this slide)
• Within a VNET, bandwidth is a function of VM type/size
– Up to 20Gbps for G5
• Cross-region VNET gateways
– Standard (100Mbps) or High Performance (200Mbps), No-Crypto option
– Latency impact proportional to distance
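To connect the networking topology back to Cassandra, a minimal sketch of a geo-replicated keyspace and a DC-aware client using the DataStax Python driver; the data center names must match what your snitch reports, and the addresses, keyspace, and replication factors here are placeholders.

```python
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Keep routine traffic in the local region; the remote DC serves DR or other workloads.
cluster = Cluster(
    ["10.1.0.4", "10.1.0.5"],  # placeholder seed addresses inside the local VNET
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="dc-westus")),
)
session = cluster.connect()

# NetworkTopologyStrategy places replicas per data center, which is what makes the
# cross-region replication on this slide work (RF values are illustrative).
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS telemetry
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc-westus': 3,
        'dc-eastus': 3
    }
""")
```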