The document describes Onyx, a new flexible and extensible data processing system. It discusses limitations of existing frameworks in new resource environments like resource disaggregation and transient resources. The Onyx architecture includes a compiler that transforms dataflow programs into optimized physical execution plans using passes, and a runtime that executes the plans across cluster resources. It provides examples of compiling and running MapReduce and ALS jobs, and handling dynamic data skew through runtime optimization.
Agenda
* Intro Grammarly (Umayah Abdennabi, 5 mins)
* Meetup Updates and Announcements (Chris, 5 mins)
* Custom Functions in Spark SQL (30 mins)
Speaker: Umayah Abdennabi
Spark comes with a rich Expression library that can be extended to make custom expressions. We will look into custom expressions and why you would want to use them.
* TF 2.0 + Keras (30 mins)
Speaker: Francesco Mosconi
TensorFlow 2.0 was announced at the March TF Dev Summit, and it brings many changes and upgrades. The most significant change is the inclusion of Keras as the default model-building API. In this talk, we'll review the main changes introduced in TF 2.0 and highlight the differences between open-source Keras and tf.keras.
* SQUAD Deep-Dive: Question & Answer with Context (45 mins)
Speaker: Brett Koonce (https://quarkworks.co)
SQuAD (Stanford Question Answering Dataset) is an NLP challenge based around answering questions by reading Wikipedia articles, designed to be a real-world machine learning benchmark. We will look at several different ways to tackle the SQuAD problem, building up to state-of-the-art approaches in terms of time, complexity, and accuracy.
https://rajpurkar.github.io/SQuAD-explorer/
https://dawn.cs.stanford.edu/benchmark/#squad
Food and drinks will be provided. The event will be held at Grammarly's office at One Embarcadero Center on the 9th floor. When you arrive at One Embarcadero, take the escalator to the second floor where you will find the lobby and elevators to the office suites. Come on up to the 9th floor (no need to check in at security), and ring the Grammarly doorbell.
JVM and OS Tuning for Accelerating Spark Applications - Tatsuhiro Chiba
1) The document discusses optimizing Spark applications through JVM and OS tuning. Tuning aspects covered include JVM heap sizing, garbage collection options, process affinity, and large memory pages.
2) Benchmark results show that after applying these optimizations, execution time was reduced by 30-50% for Kmeans clustering and TPC-H queries compared to the default configuration.
3) Dividing the application across multiple smaller JVMs instead of a single large JVM helped reduce garbage collection overhead and resource contention, improving performance by up to 16%.
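In that spirit, a hedged sketch of what such a configuration might look like for a Spark-on-YARN job. The values and the application class are illustrative placeholders, not the benchmark's actual settings:

```shell
# Illustrative spark-submit flags in the spirit of the tuning above.
# Several smaller executor JVMs instead of one large one, modest heaps to keep
# GC pauses short, parallel GC, and large pages enabled. Values are common
# starting points only; com.example.KMeansApp and app.jar are hypothetical.
spark-submit \
  --master yarn \
  --num-executors 8 \
  --executor-memory 6g \
  --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC -XX:+UseLargePages" \
  --class com.example.KMeansApp \
  app.jar
```

The right executor count and heap size depend on the workload and hardware; the point of the benchmark is that many mid-sized JVMs often beat one giant heap.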
This document summarizes Kazuaki Ishizaki's keynote presentation at the Fourth International Symposium on Computing and Networking (CANDAR'16) on transparent GPU exploitation for Java. The presentation covered Ishizaki's research history developing compilers and optimizing code for GPUs. It described a Java just-in-time compiler that can generate optimized GPU code from parallel loops in Java programs without requiring programmers to manage low-level GPU operations like data transfers and memory allocation themselves. The compiler implements optimizations like array alignment, read-only caching, and reducing data copying to improve GPU performance. The goal is to make GPU programming easier and more portable across hardware for Java programmers.
Vector and ListBuffer have similar performance for random reads: benchmarking showed no significant difference in throughput, average time, or sample times between reading randomly from a Vector and from a ListBuffer. Vector is generally faster than List for random access because it is backed by a shallow tree of arrays (effectively constant-time indexing), whereas List is a linked list with linear-time access.
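The asymptotics behind that result can be reproduced in a stdlib-only Python sketch, with Python's array-backed list standing in for Vector and a hand-rolled linked list standing in for scala.List:

```python
import time

# Array-backed sequence: constant-time random access (the role Vector plays).
arr = list(range(100_000))

# Singly linked list: linear-time random access (the role scala.List plays).
class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

head = None
for v in range(99_999, -1, -1):
    head = Node(v, head)  # prepend, so node i holds value i

def linked_get(head, i):
    node = head
    for _ in range(i):  # must walk i links to reach element i
        node = node.next
    return node.value

start = time.perf_counter(); _ = arr[90_000]; t_array = time.perf_counter() - start
start = time.perf_counter(); _ = linked_get(head, 90_000); t_linked = time.perf_counter() - start
assert t_array < t_linked  # indexing beats walking 90,000 links
```

Scala's Vector is actually a wide, shallow tree rather than a flat array, but for random access it behaves like the array case here, not the linked-list case.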
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs - Chris Fregly
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs @ Strata London, May 24 2017
Optimize + Deploy Distributed Tensorflow, Spark, and Scikit-Learn Models on GPUs - Advanced Spark and TensorFlow Meetup May 23 2017 @ Hotels.com London
We'll discuss how to deploy TensorFlow, Spark, and Scikit-learn models on GPUs with Kubernetes across multiple cloud providers including AWS, Google, and Azure, as well as on-premises.
In addition, we'll discuss how to optimize TensorFlow models for high-performance inference using the latest TensorFlow XLA (Accelerated Linear Algebra) framework including the JIT and AOT Compilers.
Github Repo (100% Open Source!)
https://github.com/fluxcapacitor/pipeline
http://pipeline.io
This document summarizes a presentation about Netflix's big data platform and Spark. The key points are:
1. Netflix uses Apache Spark on YARN and Mesos clusters to process batch and streaming data from sources like Cassandra and Kafka.
2. Netflix has contributed improvements to Spark's dynamic resource allocation, predicate pushdown, and support for S3 filesystems.
3. A use case showed Spark outperforming Pig for an iterative job that duplicated and aggregated data in multiple steps.
Stream processing is designed for continuously processing unbounded data streams. It handles unbounded inputs with continuous processing, unlike batch processing, which requires bounded, finite data sets. The key challenges of stream processing include out-of-order arrival and the need to relate events that occur close together in event time but may be processed far apart. To address this, stream processing systems use watermarks to indicate processing progress, triggers to determine when to emit output, and accumulation modes to handle refinements from late data.
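The three mechanisms named above can be illustrated with a toy, stdlib-only sketch. The window size, lateness allowance, and max-event-time watermark heuristic are all assumptions for illustration, not any engine's actual semantics:

```python
from collections import defaultdict

WINDOW = 10  # window size in event-time units (an assumption for the demo)

windows = defaultdict(int)   # window start -> running count (accumulation)
emitted = {}                 # window start -> last emitted result

def window_of(ts):
    return (ts // WINDOW) * WINDOW

def process(events):
    """events: iterable of (event_time, value); may arrive out of order."""
    watermark = float("-inf")
    for ts, value in events:
        windows[window_of(ts)] += value
        # Toy watermark: max event time seen, minus an allowed lateness of 3.
        watermark = max(watermark, ts - 3)
        # Trigger: emit every window whose end the watermark has passed.
        for start, total in windows.items():
            if start + WINDOW <= watermark:
                emitted[start] = total  # late data refines earlier output

stream = [(1, 1), (4, 1), (12, 1), (3, 1), (15, 1), (27, 1)]  # (3, 1) is late
process(stream)
assert emitted == {0: 3, 10: 2}  # window [0,10) includes the late event
```

Real systems derive watermarks from their sources rather than a per-record heuristic, but the trigger-when-watermark-passes-window-end logic is the same idea.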
The document discusses challenges in implementing a dynamic language like JavaScript on the Java Virtual Machine (JVM). Some key points:
- Nashorn is a JavaScript runtime written in Java that generates JVM bytecode, aiming to be 2-10x faster than previous solutions like Rhino.
- Compiling JavaScript to JVM bytecode is difficult as JavaScript has dynamic types, runtime changes, and number representations that don't map cleanly to Java's static types.
- Nashorn uses static analysis to infer types where possible and optimize for primitive number representations, but this only goes so far with JavaScript's dynamic nature.
- As JavaScript code changes, Nashorn may need to transition to more dynamic, adaptive, and optimistic techniques.
The document summarizes a presentation given by Chris Fregly on end-to-end real-time analytics using Apache Spark. It discusses Spark streaming, machine learning, and tuning Spark for performance, and includes live demos of sorting, matrix multiplication, and thread synchronization optimized for CPU caches. The presentation emphasizes techniques like cache-friendly data layouts, prefetching, and lock-free algorithms to improve Spark performance.
Apache Spark 2.0 includes improvements that provide considerable speedups for CPU-intensive queries through techniques like code generation. Profiling tools like flame graphs can help analyze where CPU cycles are spent by visualizing stack traces. Flame graphs are useful for performance troubleshooting but have limitations. Testing Spark applications locally and through unit tests allows faster iteration compared to running on clusters and saves resources. It is also important to test with local approximations of distributed components like HDFS and Hive.
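In that spirit, the cheapest test of all needs no cluster and no SparkContext: factor the record-level logic out of the job and test it as plain functions. The function names here are hypothetical, not from the talk:

```python
def parse_line(line):
    """Per-record transformation that would normally run inside a Spark map()."""
    fields = line.strip().split(",")
    return (fields[0], int(fields[1]))

def aggregate(records):
    """Reduce-side logic: sum values per key, as a reduceByKey would."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals

# A fast local test: instant feedback, no cluster resources consumed.
lines = ["a,1", "b,2", "a,3"]
result = aggregate(parse_line(l) for l in lines)
assert result == {"a": 4, "b": 2}
```

Once the logic is verified this way, wiring the same functions into RDD or DataFrame operations (or into a local-mode SparkSession test) exercises only the distribution layer.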
Building Google Cloud ML Engine From Scratch on AWS with PipelineAI - ODSC Lo... - Chris Fregly
http://pipeline.ai
Applying my Netflix experience to a real-world problem in the ML and AI world, I will demonstrate a full-featured, open-source, end-to-end TensorFlow Model Training and Deployment System using the latest advancements from Kubernetes, Istio, and TensorFlow.
In addition to training and hyper-parameter tuning, our model deployment pipeline will include continuous canary deployments of our TensorFlow Models into a live, hybrid-cloud production environment.
This is the holy grail of data science - rapid and safe experiments of ML / AI models directly in production.
Following the Successful Netflix Culture that I lived and breathed (https://www.slideshare.net/reed2001/culture-1798664/2-Netflix_CultureFreedom_Responsibility2), I give Data Scientists the Freedom and Responsibility to extend their ML / AI pipelines and experiments safely into production.
Offline, batch training and validation is for the slow and weak. Online, real-time training and validation on live production data is for the fast and strong.
Learn to be fast and strong by attending this talk.
http://pipeline.ai
Hyper-Parameter Tuning Across the Entire AI Pipeline - GPU Tech Conference San ... - Chris Fregly
Chris Fregly, Founder @ PipelineAI, will walk you through a real-world, complete end-to-end Pipeline-optimization example. We highlight hyper-parameters - and model pipeline phases - that have never been exposed until now.
While most hyperparameter optimizers stop at the training phase (i.e., learning rate, tree depth, EC2 instance type, etc.), we extend model validation and tuning into a new post-training optimization phase, including 8-bit reduced-precision weight quantization and neural-network layer fusing, among many other framework- and hardware-specific optimizations.
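As a rough sketch of what 8-bit reduced-precision weight quantization means (plain-Python linear quantization for illustration; not PipelineAI's or any framework's actual implementation):

```python
def quantize(weights):
    """Map float weights to 8-bit codes with a linear (affine) scale."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Recover approximate float weights from the 8-bit codes."""
    return [c * scale + lo for c in codes]

weights = [-1.2, -0.4, 0.0, 0.7, 2.3]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# Each code fits in one byte; reconstruction error is at most half a step.
assert all(0 <= c <= 255 for c in codes)
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

The storage win is 4x versus float32; whether the half-step rounding error is acceptable is exactly what the post-training validation phase described above has to check.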
Next, we introduce hyperparameters at the prediction phase including request-batch sizing and chipset (CPU v. GPU v. TPU).
Lastly, we determine a PipelineAI Efficiency Score of our overall Pipeline including Cost, Accuracy, and Time. We show techniques to maximize this PipelineAI Efficiency Score using our massive PipelineDB along with the Pipeline-wide hyper-parameter tuning techniques mentioned in this talk.
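The talk does not give the scoring formula; one plausible shape for a combined Cost/Accuracy/Time score, purely as an illustrative assumption, is a weighted sum of normalized terms:

```python
def efficiency_score(cost_usd, accuracy, latency_s,
                     w_cost=0.3, w_acc=0.5, w_lat=0.2,
                     cost_budget=100.0, latency_budget=1.0):
    """Hypothetical score in [0, 1]: higher is better. The weights, budgets,
    and the formula itself are illustrative assumptions, not PipelineAI's
    actual Efficiency Score."""
    cost_term = max(0.0, 1.0 - cost_usd / cost_budget)     # cheaper is better
    lat_term = max(0.0, 1.0 - latency_s / latency_budget)  # faster is better
    return w_cost * cost_term + w_acc * accuracy + w_lat * lat_term

# A cheaper, faster pipeline can outscore a slightly more accurate one.
a = efficiency_score(cost_usd=80.0, accuracy=0.95, latency_s=0.5)
b = efficiency_score(cost_usd=20.0, accuracy=0.93, latency_s=0.3)
assert b > a
```

Any such composite metric encodes a trade-off policy in its weights; tuning pipeline-wide hyper-parameters against it is what turns the score into an optimization target.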
Bio
Chris Fregly is Founder and Applied AI Engineer at PipelineAI, a Real-Time Machine Learning and Artificial Intelligence Startup based in San Francisco.
He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly Training and Video Series "High Performance TensorFlow in Production with Kubernetes and GPUs."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ... - Chris Fregly
This document provides an overview of a presentation on optimizing TensorFlow models for high performance and production with GPUs. The presentation covers optimizing both TensorFlow model training and model serving. For model training, topics include using GPUs with TensorFlow, feeding and debugging models, distributed training, and optimizing with XLA compiler. For model serving, topics are post-processing, TensorFlow Serving, and Ahead-of-Time compilation. The code and materials from the presentation are available in an open source GitHub repository.
This session covers how to unit-test Spark applications and recommends best practices, including writing unit tests with and without Spark Testing Base, a Spark package containing base classes to use when writing tests with Spark.
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016 - MLconf
Comparing TensorFlow NLP Options: word2Vec, gloVe, RNN/LSTM, SyntaxNet, and Penn Treebank: Through code samples and demos, we’ll compare the architectures and algorithms of the various TensorFlow NLP options. We’ll explore both feed-forward and recurrent neural networks such as word2vec, gloVe, RNN/LSTM, SyntaxNet, and Penn Treebank using the latest TensorFlow libraries.
1. The document discusses using Deeplearning4j and Kafka together for machine learning workflows. It describes how Deeplearning4j can be used to build, train, and deploy neural networks on the JVM and Spark, while Kafka can be used to stream data for training and inference.
2. An example application is described that performs anomaly detection on log files from a CDN by aggregating the data to reduce the number of data points. This allows the model to run efficiently on available GPU hardware.
3. The document provides a link to a GitHub repository with a code example that uses Kafka to stream data, Keras to train a model, and Deeplearning4j to perform inference in Java and deploy the model.
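The aggregation idea in point 2 can be sketched in stdlib Python; the log format (timestamp, URL) and one-minute windows are assumptions for illustration:

```python
from collections import Counter

WINDOW_S = 60  # aggregate raw log lines into one data point per minute

def aggregate_requests(log_lines):
    """log_lines: (unix_timestamp, url) pairs -> requests per minute window."""
    counts = Counter(ts // WINDOW_S for ts, _url in log_lines)
    return [counts[w] for w in sorted(counts)]

raw = [(0, "/a"), (5, "/b"), (59, "/a"),
       (61, "/c"),
       (130, "/a"), (131, "/a"), (133, "/b")]
series = aggregate_requests(raw)
assert series == [3, 1, 3]  # 7 raw events reduced to 3 model inputs
```

At CDN scale the same reduction turns billions of lines into a time series short enough for an anomaly-detection model to score on a single GPU.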
[212] Big Models Without Big Data: Using Domain-Specific Deep Networks in Data-... - NAVER D2
The document discusses techniques for using deep learning with limited data. It presents methods for data synthesis, domain adaptation, and data cleaning. For data synthesis, it describes using a game engine to procedurally generate synthetic videos with automatic annotations for action recognition training. For domain adaptation, it applies a model trained on mouse tracking saliency data to eye tracking data. For data cleaning, it introduces a technique to prune noisy images from a landmark dataset to obtain reliable training annotations. The techniques aim to leverage limited data to train deep networks for tasks like saliency mapping, image retrieval, and action recognition.
This document compares and summarizes several deep learning frameworks: Caffe, Chainer, CNTK, DL4J, Keras, MXNet, TensorFlow, and Theano. It describes who created each framework, when it was released, example applications, design motivations, and key features from technical, design, and programming perspectives.
This document introduces Failsafe, which provides latency and fault tolerance capabilities for distributed systems. It discusses how Failsafe compares to Hystrix and how it is used at Coupang. Key features of Failsafe include retry policies, circuit breakers, fallbacks, and asynchronous execution. Event listeners and policy composition patterns are also covered.
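To make two of those features concrete, here is a minimal stdlib-Python sketch of a retry policy and a circuit breaker. It mirrors the concepts, not Failsafe's actual Java API:

```python
import time

def with_retry(fn, attempts=3, delay_s=0.0):
    """Retry policy: re-invoke fn up to `attempts` times before giving up."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay_s)

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; an open circuit fails fast
    instead of hammering a struggling dependency."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.threshold:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
            self.failures = 0  # success closes the circuit again
            return result
        except Exception:
            self.failures += 1
            raise

# Flaky call that succeeds on the third try: the retry absorbs the failures.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient")
    return "ok"

assert with_retry(flaky) == "ok"
```

Failsafe's value is composing these policies (plus fallbacks and async execution) declaratively; the sketch shows only the core state machines.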
unassert - encourage reliable programming by writing assertions in production - Takuto Wada
unassert - Encourage Design by Contract (DbC) by writing assertions in production code and compiling them away in release builds.
Takuto Wada
2015/11/07 @nodefest Tokyo 2015
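Python has the same mechanism built in: `assert` statements express contracts during development, and running with `python -O` strips them, which is the unassert workflow in miniature. A small sketch (the function is a made-up example):

```python
def apply_discount(price, rate):
    # Design-by-Contract style preconditions: active in development runs,
    # compiled away entirely under `python -O` (like unassert at build time).
    assert price >= 0, "price must be non-negative"
    assert 0 <= rate <= 1, "rate must be a fraction between 0 and 1"
    return price * (1 - rate)

print(apply_discount(100.0, 0.5))  # 50.0 in both modes; only the checks differ
```

In a normal run, `apply_discount(-1.0, 0.5)` raises `AssertionError`; under `-O` the checks vanish and impose zero runtime cost, exactly the release-build behavior unassert provides for JavaScript.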
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra... - Chris Fregly
http://pipeline.ai
Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) framework, the JIT/AOT compilers, and the Graph Transform Tool, I'll demonstrate how to optimize, profile, and deploy TensorFlow models, and the TensorFlow runtime itself, in a GPU-based production environment. This talk is 100% demo-based on open source tools and completely reproducible through Docker on your own GPU cluster.
Bio
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly Training and Video Series "High-Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
http://pipeline.ai
This document compares the batch and streaming capabilities of Spark and Storm. Spark supports both batch and micro-batch processing, while Storm supports micro-batch and real-time stream processing. Spark has been in production use since 2013 and is implemented in Scala, while Storm has been in use since 2011 and is implemented in Clojure and Java. Spark includes libraries for SQL, streaming, and machine learning, while Storm uses spouts to read data streams and bolts to filter and join data within topologies. Both integrate with Hadoop and support fault tolerance, and Spark's reliability improves when used with YARN. Performance tests show Spark Streaming can process more records per second than Storm.
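The micro-batch versus per-record distinction at the heart of that comparison can be sketched in stdlib Python (a toy model of the two execution styles, not either engine's API):

```python
def micro_batch(stream, batch_size, handler):
    """Spark Streaming style: buffer records and hand over whole small batches."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            handler(batch)
            batch = []
    if batch:
        handler(batch)  # flush the final partial batch

def per_record(stream, handler):
    """Storm style: each record flows through the topology as it arrives."""
    for record in stream:
        handler([record])

batches, singles = [], []
micro_batch(range(7), 3, batches.append)
per_record(range(7), singles.append)
assert batches == [[0, 1, 2], [3, 4, 5], [6]]
assert len(singles) == 7
```

Batching amortizes per-invocation overhead (hence Spark Streaming's higher records-per-second), while per-record delivery minimizes the latency of any single event.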
This is an introduction to polyaxon and why I use polyaxon.
Polyaxon enables me to leverage kubernetes to achieve the objectives:
- Make the lead time of experiments as short as possible.
- Make the financial cost to train models as cheap as possible.
- Make the experiments reproducible.
Datacratic is the leader in real-time machine learning and decisioning and the creator of the RTBkit Open-Source Project. Mark Weiss, head of client solutions at Datacratic, shares some of the challenges companies and developers face today as they move into Real-Time Bidding. In this presentation he does a developer deep dive into design and implementation choices, technologies, and plugins, and provides some real-world RTB customer use cases. You will also learn how you can join the RTBkit community and get support for your upcoming RTBkit initiatives.
DeviceAtlas - 6 Ways Ad Platforms Can Harness Device Data - Martin Clancy
How device intelligence can unlock the mobile advertising opportunity in a multi-screen world: six methods for targeting advertising using knowledge of the requesting device, and how ad platforms can use them.
Why it pays to look after application performance regularly, from the very start of development, and how the Gatling tool can be used for this and what capabilities it offers.
Real-time bidding (RTB) is a programmatic advertising method in which online display ad space is bought and sold impression by impression, through real-time auctions that complete within milliseconds of a user visiting a webpage. Advertisers bid on ad impressions for specific audiences in real time based on user data, so they can target more valuable users and potentially achieve better results, such as a lower cost per acquisition than bulk-buying impressions for large audiences.
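Most RTB exchanges clear each impression with a second-price auction: the highest bidder wins but pays the runner-up's bid (or the floor price). A stdlib-Python sketch with hypothetical bidder names and prices:

```python
def second_price_auction(bids, floor=0.0):
    """bids: {bidder: bid in dollars CPM}. Winner pays max(second bid, floor).
    Returns (winner, clearing_price), or None if no bid meets the floor."""
    eligible = {b: v for b, v in bids.items() if v >= floor}
    if not eligible:
        return None  # impression goes unsold
    ranked = sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)
    winner, _top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else floor
    return winner, max(runner_up, floor)

# Three DSPs bid on one impression (illustrative numbers).
result = second_price_auction({"dsp_a": 2.10, "dsp_b": 3.40, "dsp_c": 1.75},
                              floor=1.00)
assert result == ("dsp_b", 2.10)  # dsp_b wins but pays dsp_a's price
```

Second pricing encourages truthful bidding: overstating a bid cannot lower the price you pay, only expose you to winning impressions you valued less.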
CIKM 2013 Tutorial: Real-time Bidding: A New Frontier of Computational Advert...Shuai Yuan
Computational advertising has been an important topical area in information retrieval and knowledge management. This tutorial focuses on real-time advertising, also known as Real-Time Bidding (RTB), a fundamental shift in the field of computational advertising. It is strongly related to CIKM areas such as user log analysis and modelling, information retrieval, text mining, knowledge extraction and management, behavioural targeting, recommender systems, personalization, and data management platforms.
This tutorial aims to provide not only a comprehensive and systemic introduction to RTB and computational advertising in general, but also the emerging research challenges and research tools and datasets in order to facilitate the research. Compared to previous Computational Advertising tutorials in relevant top-tier conferences, this tutorial takes a fresh, neutral, and the latest look of the field and focuses on the fundamental changes brought by RTB.
We will begin by giving a brief overview of the history of online advertising and present the current eco-system in which RTB plays an increasingly important part. Based on our field study and the DSP optimisation contest organised by iPinyou, we analyse optimization problems both from the demand side (advertisers) and the supply side (publishers), as well as the auction mechanism design challenges for Ad exchanges. We discuss how IR, DM and ML techniques have been applied to these problems. In addition, we discuss why game theory is important in this area and how it could be extended beyond the auction mechanism design.
CIKM is an ideal venue for this tutorial because RTB is an area of multiple disciplines, including information retrieval, data mining, knowledge discovery and management, and game theory, most of which are traditionally the key themes of the conference. As an illustration of practical application in the real world, we shall cover algorithms in the iPinyou global DSP optimisation contest on a production platform; for the supply side, we also report experiments of inventory management, reserve price optimisation, etc. in production systems.
We expect the audience, after attending the tutorial, to understand the real-time online advertising mechanisms and the state of the art techniques, as well as to grasp the research challenges in this field. Our motivation is to help the audience acquire domain knowledge and obtain relevant datasets, and to promote research activities in RTB and computational advertising in general.
Incremental development is easy when we are talking about functionality. Story splitting has become quite popular as a technique lately.
But what about those cases when you need to do an architectural refactoring? Could incremental development be applied?
(Talk delivered during I T.A.K.E. Unconference 2015)
Flavius Ștef: Big Rewrites Without Big Risks at I T.A.K.E. Unconference (Mozaic Works)
This document discusses strategies for incrementally rewriting software architecture without big risks. It suggests splitting large rewrites into smaller chunks and using techniques like faking functionality until it is implemented, incremental refactoring, and rewriting in small bites. Specific strategies discussed include researching prerequisites, creating proofs of concept, adding unit and performance tests incrementally, and removing dependencies one by one. Choosing the right refactoring is important, and catalogues of refactorings can provide guidance. An incremental process of analyzing, planning, refactoring, testing and integrating changes in small batches is recommended to validate progress toward goals at each step.
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub 50 ms query responses for hundreds of billions of data points. From granular DNS traffic data, to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to SPARK, and the lessons that we’ve learned in the process.
The document discusses emerging trends in software and services including:
1) Software as a Service and cloud computing which allows software to be delivered and consumed "as a service" with service level agreements.
2) The growth of massive data centers which are becoming large physical assets requiring significant capital expenditures.
3) The rise of "Dev-signers" or designer-developers who are combining development and design skills.
4) The integration of software and services will be key as local software interacts with internet services to provide combined capabilities.
Maximize Big Data ROI via Best of Breed Patterns and Practices (Jeff Bertman)
This presentation discusses maximizing ROI from big data technologies and architectures. It introduces the concept of a fitness technology landscape (FiTL) to evaluate different data platform options based on factors like cost. The presentation advocates using a polyglot or best-of-breed approach using multiple technologies to address diverse use cases. This includes using different technologies for extraction, loading, and transformation of data in integrated architectures. Maximizing ROI requires balancing factors like functionality, cost, scalability and other considerations for each specific use case.
The document provides a checklist for front-end performance optimization. It includes recommendations to establish performance metrics and goals, optimize assets like images, videos, fonts and JavaScript, choose frameworks and CDNs wisely, and set priorities to optimize the core experience for all users. Key metrics to target include a Time to Interactive under 5 seconds on 3G and First Input Delay below 100ms.
Highway to heaven - Microservices Meetup Munich (Christian Deger)
The document summarizes AutoScout24's transition from a monolithic architecture to microservices in the cloud. Some key points:
- They moved from an on-premise Microsoft-based stack to AWS and a microservices architecture using JVM and Linux.
- This was a major technical transformation to become "mobile first" and reduce costs and time to market while attracting new talent.
- They established architectural principles like event sourcing, autonomous teams, infrastructure as code, and shared-nothing.
- DynamoDB is now used as the "atom feed" between services while eliminating tight coupling.
- Teams are organized around business capabilities rather than projects to improve agility.
Questions Log: Dynamic Cubes – Set to Retire Transformer? (Senturus)
This document contains a questions log from a webinar about optimizing Cognos performance. It includes questions from webinar attendees about topics like using virtual cubes and dynamic cubes to address large data volumes, optimizing in-memory aggregates, hardware sizing requirements for dynamic cubes, and configuration considerations when using dynamic cubes. The questions are answered in detail to help attendees understand how to best implement and optimize dynamic cubes in Cognos.
The document discusses deep learning techniques for financial technology (FinTech) applications. It begins with examples of current deep learning uses in FinTech like trading algorithms, fraud detection, and personal finance assistants. It then covers topics like specialized compute hardware for deep learning training and inference, optimization techniques for CPUs and GPUs, and distributed training approaches. Finally, it discusses emerging areas like FPGA and quantum computing and provides resources for practitioners to start with deep learning for FinTech.
Dataiku - Productive application to production - PAPIs May 2015 (Dataiku)
This document discusses the development of predictive applications and outlines a vision for a platform called "Blue Box" that could help address many of the challenges in building and deploying these applications at scale. It notes that building predictive applications currently requires integrating multiple separate components. The document then describes desired features for the Blue Box platform, such as data cleansing, external data integration, model updating, decision logic, auditing, and serving predictions in real-time. It poses questions about how such a platform could be created, whether through open source or a commercial offering.
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m..." (Flink Forward)
“Customer experience is the next big battle ground for telcos,” Amit Akhelikar, Global Director of Lynx Analytics, recently proclaimed at TM Forum Live! Asia in Singapore. But how do you fight in this battle? A common approach has been to keep “under control” some well-known network quality indicators, like dropped calls, radio access congestion, availability, and so on; but this has proven not to be enough to keep customers happy, just as a siege weapon is not enough to conquer a city. But what if it were possible to know how customers perceive services, at least the most demanded ones, like web browsing or video streaming? That would be like a squad of archers ready for battle. And even with that, how do you extract value from it and take action in no time, giving our skilled archers the right targets? Meet CANVAS (Customer And Network Visualization and AnalyticS), one of the first LATAM implementations of a Flink-based stream processing use case for a telco, which successfully combines leading and innovative technologies like Apache Hadoop, YARN, Kafka, Nifi, Druid and advanced visualizations with Flink core features like non-trivial stateful stream processing (joins, windows and aggregations on event time) and CEP capabilities for alarm generation, delivering a next-generation tool for SOC (Service Operation Center) teams.
The document provides an overview of the key topics and objectives covered in a computer architecture course. It discusses trends in technology that have led to changes in computer systems over time, including exponential increases in processor performance, memory capacity and speeds, and network bandwidth. It introduces various performance metrics and challenges in benchmarking systems. Important principles of computer design are outlined, such as Amdahl's law, exploiting parallelism, and making common cases fast. Quantitative analysis tools like simulations and queueing theory are also summarized.
This document provides an overview of the key topics and objectives covered in a computer architecture course. The course aims to evaluate instruction set design tradeoffs, advanced pipelining techniques, solutions to increasing memory latency, and qualitative and quantitative tradeoffs in modern computer system design. Key areas covered include instruction set architecture, memory hierarchies, pipelining, parallelism, and performance evaluation metrics.
Capacity Planning Infrastructure for Web Applications (Drupal), by Ricardo Amaro
In this session we will try to solve a couple of recurring problems:
Site Launch and User expectations
Imagine a customer that provides a set of hardware requirements, sets a date and launches the site, but then forgets to mention that they have sent out some (thousands of) emails to half the world announcing their new website launch! What do you think will happen?
Of course, launching a Drupal site involves a lot of preparation steps, and there are plenty of guides out there with common Drupal launch-readiness checklists, so that part is not a problem anymore.
What we are really missing here is a Plan for Capacity.
Data Engineer's Lunch #85: Designing a Modern Data Stack (Anant Corporation)
What are the design considerations that go into architecting a modern data warehouse? This presentation will cover some of the requirements analysis, design decisions, and execution challenges of building a modern data lake/data warehouse.
ClickHouse Paris Meetup. ClickHouse at ContentSquare, by Christophe Kalenzaga... (Altinity Ltd)
Christophe Kalenzaga and Vianney Foucault from ContentSquare summarize their experience using ClickHouse as their new backend database. They were seeking to replace their ElasticSearch clusters due to high storage, compute and scaling costs. Through a series of benchmarks, they found ClickHouse to be significantly faster than ElasticSearch or other alternatives, with response times up to 85% better. It also helped reduce their infrastructure costs. While ClickHouse requires effort to master and lacks some tooling, they found it stable and fast enough to recommend it for other companies seeking to replace expensive ElasticSearch deployments.
Enterprise application performance - Understanding & Learnings (Dhaval Shah)
This document discusses enterprise application performance, including:
- Performance basics like response time, throughput, and availability
- Common metrics like response time, transactions per second, and concurrent users
- Factors that affect performance such as software issues, configuration settings, and hardware resources
- Case studies where the author analyzed memory leaks, optimized services, and addressed an inability to meet non-functional requirements
- Learnings around heap dump analysis, hotspot identification, and database monitoring
Discover why Wi-Fi 7 is set to transform wireless networking and how Router Architects is leading the way with next-gen router designs built for speed, reliability, and innovation.
Societal challenges of AI: biases, multilinguism and sustainability (Jordi Cabot)
Towards a fairer, inclusive and sustainable AI that works for everybody.
Reviewing the state of the art on these challenges and what we're doing at LIST to test current LLMs and help you select the one that works best for you
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily? (steaveroggers)
Migrating from Lotus Notes to Outlook can be a complex and time-consuming task, especially when dealing with large volumes of NSF emails. This presentation provides a complete guide on how to batch export Lotus Notes NSF emails to Outlook PST format quickly and securely. It highlights the challenges of manual methods, the benefits of using an automated tool, and introduces eSoftTools NSF to PST Converter Software — a reliable solution designed to handle bulk email migrations efficiently. Learn about the software’s key features, step-by-step export process, system requirements, and how it ensures 100% data accuracy and folder structure preservation during migration. Make your email transition smoother, safer, and faster with the right approach.
Read More:- https://www.esofttools.com/nsf-to-pst-converter.html
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ... (Eric D. Schabell)
It's time you stopped letting your telemetry data pressure your budgets and get in the way of solving issues with agility! No more I say! Take back control of your telemetry data as we guide you through the open source project Fluent Bit. Learn how to manage your telemetry data from source to destination using the pipeline phases covering collection, parsing, aggregation, transformation, and forwarding from any source to any destination. Buckle up for a fun ride as you learn by exploring how telemetry pipelines work, how to set up your first pipeline, and exploring several common use cases that Fluent Bit helps solve. All this backed by a self-paced, hands-on workshop that attendees can pursue at home after this session (https://o11y-workshops.gitlab.io/workshop-fluentbit).
How can one start with crypto wallet development.pptx (laravinson24)
This presentation is a beginner-friendly guide to developing a crypto wallet from scratch. It covers essential concepts such as wallet types, blockchain integration, key management, and security best practices. Ideal for developers and tech enthusiasts looking to enter the world of Web3 and decentralized finance.
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi... (Egor Kaleynik)
This case study explores how we partnered with a mid-sized U.S. healthcare SaaS provider to help them scale from a successful pilot phase to supporting over 10,000 users—while meeting strict HIPAA compliance requirements.
Faced with slow, manual testing cycles, frequent regression bugs, and looming audit risks, their growth was at risk. Their existing QA processes couldn’t keep up with the complexity of real-time biometric data handling, and earlier automation attempts had failed due to unreliable tools and fragmented workflows.
We stepped in to deliver a full QA and DevOps transformation. Our team replaced their fragile legacy tests with Testim’s self-healing automation, integrated Postman and OWASP ZAP into Jenkins pipelines for continuous API and security validation, and leveraged AWS Device Farm for real-device, region-specific compliance testing. Custom deployment scripts gave them control over rollouts without relying on heavy CI/CD infrastructure.
The result? Test cycle times were reduced from 3 days to just 8 hours, regression bugs dropped by 40%, and they passed their first HIPAA audit without issue—unlocking faster contract signings and enabling them to expand confidently. More than just a technical upgrade, this project embedded compliance into every phase of development, proving that SaaS providers in regulated industries can scale fast and stay secure.
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate (Maxim Salnikov)
Imagine if apps could think, plan, and team up like humans. Welcome to the world of AI agents and agentic user interfaces (UI)! In this session, we'll explore how AI agents make decisions, collaborate with each other, and create more natural and powerful experiences for users.
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025), by Andre Hora
Exceptions allow developers to handle error cases expected to occur infrequently. Ideally, good test suites should test both normal and exceptional behaviors to catch more bugs and avoid regressions. While current research analyzes exceptions that propagate to tests, it does not explore other exceptions that do not reach the tests. In this paper, we provide an empirical study to explore how frequently exceptional behaviors are tested in real-world systems. We consider both exceptions that propagate to tests and the ones that do not reach the tests. For this purpose, we run an instrumented version of test suites, monitor their execution, and collect information about the exceptions raised at runtime. We analyze the test suites of 25 Python systems, covering 5,372 executed methods, 17.9M calls, and 1.4M raised exceptions. We find that 21.4% of the executed methods do raise exceptions at runtime. In methods that raise exceptions, on the median, 1 in 10 calls exercise exceptional behaviors. Close to 80% of the methods that raise exceptions do so infrequently, but about 20% raise exceptions more frequently. Finally, we provide implications for researchers and practitioners. We suggest developing novel tools to support exercising exceptional behaviors and refactoring expensive try/except blocks. We also call attention to the fact that exception-raising behaviors are not necessarily “abnormal” or rare.
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI (danshalev)
If we were building a GenAI stack today, we'd start with one question: Can your retrieval system handle multi-hop logic?
Trick question, b/c most can’t. They treat retrieval as nearest-neighbor search.
Today, we discussed scaling #GraphRAG at AWS DevOps Day, and the takeaway is clear: VectorRAG is naive, lacks domain awareness, and can’t handle full dataset retrieval.
GraphRAG builds a knowledge graph from source documents, allowing for a deeper understanding of the data + higher accuracy.
Why Orangescrum Is a Game Changer for Construction Companies in 2025 (Orangescrum)
Orangescrum revolutionizes construction project management in 2025 with real-time collaboration, resource planning, task tracking, and workflow automation, boosting efficiency, transparency, and on-time project delivery.
Salesforce Data Cloud - Hyperscale data platform, built for Salesforce (Dele Amefo)
Adtech x Scala x Performance tuning
1. Ad Tech × Scala × Performance Tuning
∼ Best Practice for Better Performance ∼
Scala Days 2015 San Francisco Un-conference
2015-03-19 @mogproject
2. Agenda
About Demand Side Science
Introduction to Performance Tuning
Best Practice in Development
Japanese language version here:
http://www.slideshare.net/mogproject/scala-41799241
3. Yosuke Mizutani (@mogproject)
Joined Demand Side Science in April 2013
(thanks to Scala Conference in Japan 2013)
Full-stack engineer (want to be…)
Background: 9-year infrastructure engineer
About Me
10. Advertiser’s side of real-time ad bidding (RTB)
What is DSP
(vs. Supply Side Platform, the publisher’s side)
11. Dec 2013
Became part of the Opt group, the e-marketing agency
Oct 2014
Released dynamic creative tool unis
Brief History of DSS
12. unis is a third-party ad server which creates
dynamic and/or personalized ads according to rules.
http://www.opt.ne.jp/news/pr/detail/id=2492
unis
(rule examples: items on sale, most popular items, fixed items, re-targeting)
13. With venture mind + advantage of Opt group …
Future of DSS
Demand × Side × Science
14. We will create various products based on Science!
Future of DSS
??? × ??? × Science
22. Application goes wrong under high load
Bad latency under specific conditions
Batch execution slower than expected
Slow development tools
Resolve an Issue
23. Very important, especially in the ad tech industry
Cost tends to grow bigger and bigger
High traffic
Need to respond within a few milliseconds
Big database, big log data
Business requires:
Benefit from mass delivery > Infra investment
Reduce Infrastructure Cost
24. You need to care about
cost (≒ engineer's time) and
risk (possibility of causing new trouble)
of performance tuning itself.
Don't lose your goal
Scaling up/out the infra can, naively,
be the best solution
Don't try to be perfect
25. Basics of Performance Tuning
We iterate:
Measure metrics × Find bottleneck × Try with hypothesis
Don't take erratic steps.
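The measure → find-bottleneck → try-with-hypothesis loop starts with repeatable measurement. As a minimal sketch (not from the deck; sbt-jmh, shown later, is the serious tool), a tiny Scala timing helper could look like this:

```scala
object Timing {
  // Run `body` once; return (result, elapsed nanoseconds).
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }

  // Discard `warmup` runs (JIT, caches), then take the median of `trials` runs.
  def medianNanos(warmup: Int, trials: Int)(body: => Unit): Long = {
    (1 to warmup).foreach(_ => body)
    val samples = (1 to trials).map(_ => timed(body)._2).sorted
    samples(samples.length / 2)
  }
}
```

Warm-up runs and taking the median rather than the mean reduce noise from JIT compilation and GC pauses, which is one way to avoid the "erratic steps" the slide warns about.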
27. ※CAUTION: This is my own impression
Bottleneck in My Experience
Database (RDBMS/NoSQL): 50%
Async / Thread: 15%
Scala: 10%
OS: 10%
Library: 5%
JVM parameters: 5%
Network: 4%
Others: 1%
29. Approximate timing for various operations
http://norvig.com/21-days.html#answers
execute typical instruction 1/1,000,000,000 sec = 1 nanosec
fetch from L1 cache memory 0.5 nanosec
branch misprediction 5 nanosec
fetch from L2 cache memory 7 nanosec
Mutex lock/unlock 25 nanosec
fetch from main memory 100 nanosec
send 2K bytes over 1Gbps network 20,000 nanosec
read 1MB sequentially from memory 250,000 nanosec
fetch from new disk location (seek) 8,000,000 nanosec
read 1MB sequentially from disk 20,000,000 nanosec
send packet US to Europe and back 150 milliseconds = 150,000,000 nanosec
30. If Typical Instruction Takes 1 second…
https://www.coursera.org/course/reactive week3-2
execute typical instruction 1 second
fetch from L1 cache memory 0.5 seconds
branch misprediction 5 seconds
fetch from L2 cache memory 7 seconds
Mutex lock/unlock ½ minute
fetch from main memory 1½ minute
send 2K bytes over 1Gbps network 5½ hours
read 1MB sequentially from memory 3 days
fetch from new disk location (seek) 13 weeks
read 1MB sequentially from disk 6½ months
send packet US to Europe and back 5 years
31. A batch
reads 1,000,000 files of 10KB
from disk
on every run.
Data size:
10KB × 1,000,000 ≒ 10GB
Horrible and True Story
32. Assuming 1,000,000 seeks are needed,
Estimated time:
8ms × 1,000,000 + 20ms × 10,000 ≒ 8,200 sec ≒ 2.5 h
If there is one file of 10GB and only one seek is
needed,
Estimated time:
8ms × 1 + 20ms × 10,000 ≒ 200 sec ≒ 3.5 min
Horrible and True Story
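Slide 32's estimate follows directly from the slide 29 numbers (about 8 ms per disk seek, about 20 ms per sequentially read MB). A quick Scala check of the arithmetic:

```scala
object DiskEstimate {
  // Figures from slide 29 (approximate timing for various operations).
  val seekMs = 8.0        // fetch from new disk location (seek)
  val readMbMs = 20.0     // read 1MB sequentially from disk
  val totalMb = 10000     // 10GB of data = 10,000 MB

  // 1,000,000 small files: roughly one seek per file, plus the sequential reads.
  val manyFilesSec = (seekMs * 1000000 + readMbMs * totalMb) / 1000.0 // 8,200 sec
  // One 10GB file: a single seek, plus the same sequential reads.
  val oneFileSec = (seekMs * 1 + readMbMs * totalMb) / 1000.0         // ~200 sec
}
```

The seeks, not the data volume, dominate the first case: hours versus minutes for the same 10GB.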
33. Have Respect for the Disk Head
http://en.wikipedia.org/wiki/Hard_disk_drive
35. In other words…
JVM Performance Triangle
Compactness
Throughput Responsiveness
36. C × T × R = a
JVM Performance Triangle
Tuning: vary C, T, R for fixed a
Optimization: increase a
Reference:
Everything I ever learned about JVM performance tuning
@twitter by Attila Szegedi
http://www.beyondlinux.com/files/pub/qconhangzhou2011/Everything%20I%20ever%20learned
%20about%20JVM%20performance%20tuning%20@twitter%28Attila%20Szegedi%29.pdf
37. Agenda
About Demand Side Science
Introduction to Performance Tuning
Best Practice in Development
38. 1. Requirement Definition / Feasibility
2. Basic Design
3. Detailed Design
4. Building Infrastructure / Coding
5. System Testing
6. System Operation / Maintenance
Development Process
Only topics related to performance will be covered.
39. Make an agreement with stakeholders
about performance requirements
Requirement Definition / Feasibility
How many user IDs
internet users in Japan: 100 million
unique browsers: 200 ~ x00 million
will they increase?
data expiration cycle?
type of devices / browsers?
opt-out rate?
40. Requirement Definition / Feasibility
Number of deliver requests for ads
Number of impressions per month
In case 1 billion / month
=> mean: 400 QPS (Query Per Second)
=> if peak rate = 250%, then 1,000 QPS
For RTB, bid rate? win rate?
Goal response time? Content size?
Plans for increasing?
How about Cookie Sync?
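The QPS figures on slide 40 are simple arithmetic; a sketch of the calculation (month approximated as 30 days):

```scala
object QpsEstimate {
  // Load figure from the slide: 1 billion impressions per month.
  val impressionsPerMonth = 1000000000L
  val secondsPerMonth = 30L * 24 * 60 * 60 // month approximated as 30 days

  val meanQps = impressionsPerMonth.toDouble / secondsPerMonth // ≒ 386, rounded to ~400 on the slide
  val peakQps = meanQps * 2.5                                  // peak rate 250% => ~1,000 QPS
}
```

Back-of-envelope rounding (≒ 386 up to 400) is fine here; the point is to have a number to design and test against.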
41. Requirement Definition / Feasibility
Number of receiving trackers
Timing of firing tracker
Click rate?
Conversion(*) rate?
* A conversion occurs when the user performs the
specific action that the advertiser has defined as the
campaign goal.
e.g. buying a product in an online store
42. Requirement Definition / Feasibility
Requirement for aggregation
Indicators to be aggregated
Is unique counting needed?
Any exception rules?
Who and when
secondary processing by ad agency?
Update interval
Storage period
43. Requirement Definition / Feasibility
Hard limit by business side
Sales plan
Christmas selling?
Annual sales target?
Total budget
44. The most important thing is to provide numbers,
although it is extremely difficult to approximate
precisely in the turbulent world of ad tech.
Requirement Definition / Feasibility
Architecture design needs assumed values
Performance testing needs a numeric goal
46. Threading model design
Reduce blocking
Future based
Callback & function composition
Actor based
Message passing
Thread pool design
We can’t know the appropriate thread pool
size unless we complete performance
testing in production.
Basic Design
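A minimal sketch of the "Future based: callback & function composition" style from slide 46, with hypothetical stage names (lookupUser and scoreBid are illustrative, not from the talk):

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

object BidPipeline {
  // Hypothetical non-blocking stages of handling one bid request.
  def lookupUser(id: String): Future[String] = Future(s"profile-of-$id")
  def scoreBid(profile: String): Future[Double] = Future(profile.length * 0.01)

  // Function composition instead of blocking: no thread waits between stages.
  def bid(id: String): Future[Double] = lookupUser(id).flatMap(scoreBid)
}
```

`Await.result(BidPipeline.bid("u1"), 1.second)` blocks only at the edge (e.g. in a test); inside the pipeline no thread is held while waiting, which is the point of reducing blocking.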
47. Database design
Access pattern / number of lookups
Data size per record
Create a model of the distribution when the size
is not constant
Number of records
Rate of growth / retention period
Memory usage
At first, measure the performance of the
database itself
Detailed Design
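The sizing items on slide 47 reduce to back-of-envelope arithmetic. A sketch with assumed figures (record size, record count and growth rate are illustrative only, not from the deck):

```scala
object CapacityEstimate {
  // All figures below are assumptions for illustration.
  val bytesPerRecord = 512L   // modeled data size per record
  val records = 200000000L    // e.g. 200 million unique browsers
  val growthPerYear = 1.3     // assumed +30% records per year

  val rawGb = bytesPerRecord * records / math.pow(1024, 3) // ~95 GB today
  val nextYearGb = rawGb * growthPerYear                   // capacity to plan for
}
```

When record size is not constant, replace `bytesPerRecord` with a modeled distribution (e.g. a weighted mean plus a tail allowance), as the slide suggests.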
48. Log design
Consider compression ratio for disk usage
Cache design
Some software needs double the capacity
for processing backups (e.g. Redis)
Detailed Design
49. Simplicity and clarity come first
“It is far, far easier to make a correct
program fast than it is to make a fast
program correct”
— C++ Coding Standards: 101 Rules, Guidelines, and Best Practices (C++ In-Depth Series)
Building Infrastructure / Coding
52. Avoid algorithms worse than linear
whenever possible
Measure, don’t guess
http://en.wikipedia.org/wiki/Unix_philosophy
Building Infrastructure / Coding
53. SBT Plugin for running OpenJDK JMH
(Java Microbenchmark Harness: Benchmark tool for Java)
https://github.com/ktoso/sbt-jmh
Micro Benchmark: sbt-jmh
54. addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.1.6")
Micro Benchmark: sbt-jmh
plugins.sbt
jmhSettings
build.sbt
import org.openjdk.jmh.annotations.Benchmark
class YourBench {
@Benchmark
def yourFunc(): Unit = ??? // write code to measure
}
YourBench.scala
Just put an annotation
55. > run -i 3 -wi 3 -f 1 -t 1
Micro Benchmark: sbt-jmh
Run the benchmark in the sbt console
-i: number of measurement iterations to do
-wi: number of warmup iterations to do
-f: how many times to fork a single benchmark
-t: number of worker threads to run with
56. [info] Benchmark Mode Samples Score Score error Units
[info] c.g.m.u.ContainsBench.listContains thrpt 3 41.033 25.573 ops/s
[info] c.g.m.u.ContainsBench.setContains thrpt 3 6.810 1.569 ops/s
Micro Benchmark: sbt-jmh
Result (excerpted)
By default, throughput score
will be displayed.
(larger is better)
http://mogproject.blogspot.jp/2014/10/micro-benchmark-in-scala-using-sbt-jmh.html
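The gap between listContains and setContains above comes from the data structures themselves: List#contains is a linear scan, while Set#contains is (effectively) a constant-time hash lookup. A plain, non-JMH sketch of the two operations being compared (the deck does not show the benchmark source):

```scala
object ContainsDemo {
  val n = 100000
  val xs: List[Int] = (1 to n).toList // List#contains: linear scan, O(n)
  val s: Set[Int] = xs.toSet          // Set#contains: hash lookup, effectively O(1)

  def listContains(key: Int): Boolean = xs.contains(key)
  def setContains(key: Int): Boolean = s.contains(key)
}
```

Querying the worst case (the last element, or a missing key) makes the asymptotic difference visible even without a benchmark harness.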
57. Scala Optimization Example
Use Scala collection correctly
Prefer recursion to function call
by Prof. Martin Odersky in Scala Matsuri 2014
Try optimization libraries
58. Horrible and True Story pt.2
Group a List[Int] into chunks of 4 elements,
then calculate the sum of each chunk

def f(xs: List[Int], acc: List[Int] = Nil): List[Int] = {
  if (xs.length < 4) {
    (xs.sum :: acc).reverse
  } else {
    val (y, ys) = xs.splitAt(4)
    f(ys, y.sum :: acc)
  }
}

Example:
scala> f((1 to 10).toList)
res1: List[Int] = List(10, 26, 19)
Example
59. Horrible and True Story pt.2
List#length takes time proportional to the
length of the sequence
When the length of the parameter xs is n,
time complexity of List#length is O(n)
Implemented in LinearSeqOptimized#length
https://github.com/scala/scala/blob/v2.11.4/src/library/scala/collection/LinearSeqOptimized.scala#L35-43
60. Horrible and True Story pt.2
In function f, xs.length is evaluated
n / 4 + 1 times, each taking time
proportional to the remaining length
Therefore,
the time complexity of function f is O(n²)
It becomes too slow with big n
61. Horrible and True Story pt.2
For your information, the following one-liner does
the same work using built-in methods
scala> (1 to 10).grouped(4).map(_.sum).toList
res2: List[Int] = List(10, 26, 19)
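As a sketch not taken from the original slides: the quadratic behavior can also be fixed with a minimal change to f itself, replacing the O(n) xs.length call with List#lengthCompare, which stops traversing after at most 4 elements. The name f2 is my own.

```scala
// Linear-time variant of f (f2 is a hypothetical name, not from the slides).
// List#lengthCompare(4) traverses at most 5 elements, so the length check
// becomes O(1) per call instead of O(n), making f2 O(n) overall.
def f2(xs: List[Int], acc: List[Int] = Nil): List[Int] = {
  if (xs.lengthCompare(4) < 0) {
    (xs.sum :: acc).reverse
  } else {
    val (y, ys) = xs.splitAt(4)
    f2(ys, y.sum :: acc)
  }
}

println(f2((1 to 10).toList)) // List(10, 26, 19)
```

This keeps the exact shape of the original function, so it is easy to diff against the slow version; the grouped(4) one-liner above remains the idiomatic choice when the original structure need not be preserved.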
63. Library for optimizing Scala collections
(by using macros)
http://scala-blitz.github.io/
Presentation in Scala Days 2014
https://parleys.com/play/53a7d2c6e4b0543940d9e549/chapter0/about
ScalaBlitz
64. System feature testing
Interface testing
Performance testing
Reliability testing
Security testing
Operation testing
System Testing
65. Simple load testing
Scenario load testing
mixed load with typical user operations
Aging test (continuously running test)
Performance Testing
66. ab - Apache Bench
Simple benchmark tool bundled with Apache
http://httpd.apache.org/docs/2.2/programs/ab.html
Adequate for simple requirements
Latest version recommended
(a bug in the version pre-installed on Amazon Linux caused me trouble)
Example:
ab -C <CookieName=Value> -n <NumberOfRequests> -c <Concurrency> "<URL>"
67. Result example (excerpted)
ab - Apache Bench
Benchmarking example.com (be patient)
Completed 1200 requests
Completed 2400 requests
(omitted)
Completed 10800 requests
Completed 12000 requests
Finished 12000 requests
(omitted)
Concurrency Level: 200
Time taken for tests: 7.365 seconds
Complete requests: 12000
Failed requests: 0
Write errors: 0
Total transferred: 166583579 bytes
HTML transferred: 160331058 bytes
Requests per second: 1629.31 [#/sec] (mean)
Time per request: 122.751 [ms] (mean)
Time per request: 0.614 [ms] (mean, across all concurrent requests)
Transfer rate: 22087.90 [Kbytes/sec] received
(omitted)
Percentage of the requests served within a certain time (ms)
50% 116
66% 138
75% 146
80% 150
90% 161
95% 170
98% 185
99% 208
100% 308 (longest request)
Requests per second
= QPS
68. Load testing tool written in Scala
http://gatling.io
Gatling
69. The era of Apache JMeter is over
Say goodbye to building scenarios with a GUI
With Gatling,
you write load scenarios in a Scala DSL
Gatling
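The slides do not show what the Scala DSL looks like; as a rough sketch only (class name, URL, and request names are placeholders, and the exact API varies between Gatling versions), a minimal simulation has this shape:

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Minimal Gatling simulation sketch (Gatling 3.x-style API;
// the target URL and all names here are placeholders).
class BasicSimulation extends Simulation {
  val httpProtocol = http.baseUrl("http://example.com") // system under test
  val scn = scenario("Basic")
    .exec(http("home").get("/")) // one GET request, reported as "home"
  setUp(scn.inject(atOnceUsers(10))).protocols(httpProtocol) // 10 virtual users at once
}
```

Because the scenario is ordinary Scala code, it can be versioned, reviewed, and parameterized like any other source file, which is the point of the "goodbye to GUI scenario making" remark above.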
70. Care for the resources on the stressor
(load-generator) side
Resources of the server (or PC) generating load
Network router (CPU) can be a bottleneck
Don't tune two or more parameters at a
time
Keep a change log and the log files
Days for Testing and Tuning
72. Day-to-day logging and monitoring
Application log
GC log
Profiler
Anomaly detection from several metrics
Server resource (CPU, memory, disk, etc.)
Abnormal response codes
Latency
Trends visualization from several metrics
System Operation / Maintenance
76. stdout / stderr
Should be redirected to a file
Should NOT be thrown away to /dev/null
The result of a thread dump
(kill -3 <PROCESS_ID>) is written
here
JVM Settings
78. SLF4J + Profiler
Example:
log the result of the profiler when
a timeout occurs
Profiler
Output example:
+ Profiler [BASIC]
|-- elapsed time [A] 220.487 milliseconds.
|-- elapsed time [B] 2499.866 milliseconds.
|-- elapsed time [OTHER] 3300.745 milliseconds.
|-- Total [BASIC] 6022.568 milliseconds.
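The output above is the format printed by the Profiler class in the slf4j-ext module; a minimal usage sketch follows (the task names match the sample output, while doWorkA/doWorkB are hypothetical placeholders for the code being measured):

```scala
import org.slf4j.profiler.Profiler

// Sketch of slf4j-ext's Profiler producing output like the sample above.
// Requires the org.slf4j:slf4j-ext dependency; doWorkA/doWorkB are placeholders.
val profiler = new Profiler("BASIC")
profiler.start("A")  // begin timing task A
doWorkA()
profiler.start("B")  // implicitly stops A, begins timing B
doWorkB()
profiler.stop()      // stops the last task and the profiler itself
profiler.print()     // prints the elapsed-time tree shown above
```

In the timeout scenario described on this slide, the print() call (or logging via profiler.setLogger) would be guarded by the timeout check, so the breakdown is emitted only for slow requests.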
79. For catching trends, not for anomaly detection
Operations must also take care not to
overlook signs of change
Not only infrastructure / application metrics,
but also business indicators
Who uses the console?
System user
System administrator
Application developer
Business manager
Trends Visualization