Web Analytics at Scale with Elasticsearch @ naver.com - Part 2 - Lessons Learned

Jason Heo (허정수) / NAVER
jason.heo.sde@gmail.com

Agenda
• Introduction
  • Content Consumption Statistics (콘텐츠소비통계)
• Part I - Architecture
  • Initial Architecture -> Problems & Solutions -> Proven Architecture
  • Data Pipelines
• Part II - Lessons Learned
  • Performance tips
  • Operations tips
Covered at the 2017.06.22 meetup

Part 1 talk video: https://youtu.be/Mc9gy-5d60w?t=10m40s

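Content Consumption Statistics (콘텐츠소비통계)
A service for NAVER users, not for internal employees
(Diagram labels: NAVER users; various NAVER services; common statistics platform)
• Common statistics platform (development started 2016.01)
• NAVER Blog (service launched 2016.06)
• OOO service (service launched 2016.09)
• YYY service (service launched 2017.07)
• XXX service (service planned for 2017.10)
• …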

<Blog front end>
<Blog statistics menu>

Goal
• High Throughput
• Low Latency
• Ease of Use

Architecture
(Diagram; components shown)
• Log collection: nginx access log, Logstash
• Queues: Kafka 1 (Raw Log), Transform, Kafka 2 (Refined Log)
• Loaders: Realtime ESLoader, Parquet Loader, Scoreboard Loader
• Storage: Realtime ES Cluster, Batch ES Cluster, Parquet Files, nBase-ARC (Redis Cluster)
• Serving: SparkSQL, Node.js, End Users
• Work requests & internal metrics: SparkSQL, Impala, Zeppelin

Versions
1. Elasticsearch 2.3 & es-hadoop 2.3
2. Logstash 2.1
3. Spark 1.6
4. JDK 1.8 for ES, 1.7 for Spark
5. CDH 5.8
6. Storm 0.10
7. CentOS 7.2
8. Kafka 0.9
9. nBase-ARC 1.3

Agenda
• Introduction
  • Content Consumption Statistics (콘텐츠소비통계)
• Part I - Architecture
  • Initial Architecture -> Problems & Solutions -> Proven Architecture
  • Data Pipelines
• Part II - Lessons Learned
  • Performance tips
  • Operations tips
Covered at the August 10 meetup

Execution Hint (1)
{
  "query": {
    "match": {...}
  },
  "aggs": {
    "group_by_u": {
      "terms": {
        "field": "u",
        "execution_hint": "map"
      }
    }
  }
}

Execution Hint (2)
SELECT u, COUNT(*)
FROM tab
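WHERE <condition>
GROUP BY u
SQL execution order
1. Find the documents that match the condition
2. Aggregate on field u
Expected execution time
- Proportional to the number of matching documents
- If no documents match the condition, it should be close to 0 seconds
  - because there is nothing to aggregate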

Execution Hint (3)
Experiment results (chart: execution time by number of matching documents)
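The request from Execution Hint (1) can be built programmatically. A minimal sketch, assuming a hypothetical aggregation name (by_field), a hypothetical match query on field c, and "size": 0 to skip the hits; none of these appear in the slides:

```python
import json


def terms_agg_body(query, field, execution_hint="map", agg_name="by_field"):
    """Build a search body that filters with `query` and groups by `field`."""
    return {
        "size": 0,  # assumption: only the buckets are needed, not the hits
        "query": query,
        "aggs": {
            agg_name: {
                "terms": {"field": field, "execution_hint": execution_hint}
            }
        },
    }


# Hypothetical filter; the slide elides the real match clause.
body = terms_agg_body({"match": {"c": "blogger1"}}, "u")
print(json.dumps(body, indent=2))
```

Passing execution_hint=None is not handled here; the sketch always sends a hint.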

JVM Tuning (1)
Stop-The-World phase
Full GC itself is not the problem, but it sometimes causes a long stop-the-world (STW) pause
[INFO ][monitor.jvm ]
[hostname]
[gc][old][109757][7966]
duration [15.9s],
collections [2]/[16.2s], <= no response at all for 16 seconds
total [15.9s]/[12.8m], memory [12.9gb]->[11.2gb]/[14.5gb],
all_pools {[young] [1.2gb]->[146.1mb]/[1.2gb]}{[survivor]
[394.7mb]->[0b]/[438.8mb]}{[old] [11.3gb]->[11gb]/[12.8gb]}
<Excerpt from the ES log>

JVM Tuning (3)
JVM options: reduce the tendency for objects to be promoted to the old generation
-XX:MaxTenuringThreshold=15
-XX:NewRatio=7
-XX:SurvivorRatio=3
-XX:-UseAdaptiveSizePolicy
<Default GC Option> <GC Tuning>
Heap usage graphs during ingestion, with different GC options set on each node

g1 gc (1)
• 100B docs are indexed
• 5 nodes in the cluster
• 3 nodes with cms gc
• 2 nodes with g1 gc
-XX:+UseG1GC
-XX:+PerfDisableSharedMem
-XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=8m
-XX:MaxGCPauseMillis=250
-XX:InitiatingHeapOccupancyPercent=75
-XX:+UseLargePages
-XX:+AggressiveOpts
<g1 gc option>
https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
<Disclaimer>
elastic.co would like to recommend G1GC someday,
but not for now

g1 gc (2)
Output of the node stats API (/_nodes/hostname/stats), abridged
<cms gc>
"gc": {
  "collectors": {
    "young": {
      "collection_count": 141144,
      "collection_time": "1.7h",
      "collection_time_in_millis": 6295572
    },
    "old": {
      "collection_time": "20.6m"
    }
  }
}
<g1 gc>
"gc": {
  "collectors": {
    "young": {
      "collection_time": "1.4h"
    },
    "old": {
      "collection_time": "27s"
    }
  }
}
Which one looks better?

g1 gc (3)
<cms gc> STW occurred 1 time, 16.2s
[INFO ][monitor.jvm ] [hostname] [gc][old][109757][7966] duration [15.9s], collections [2]/[16.2s], total [15.9s]/[12.8m], memory [12.9gb]->[11.2gb]/[14.5gb], all_pools {[young] [1.2gb]->[146.1mb]/[1.2gb]}{[survivor] [394.7mb]->[0b]/[438.8mb]}{[old] [11.3gb]->[11gb]/[12.8gb]}
<g1 gc> STW occurred 2 times, 28.7s
[2017-01-02 01:47:16,525][WARN ][monitor.jvm ] [hostname] [gc][old][111127][1] duration [14.4s], collections [1]/[15.2s], total [14.4s]/[14.4s], memory [13.5gb]->[11.2gb]/[15gb], all_pools {[young] [176mb]->[40mb]/[0b]}{[survivor] [96mb]->[0b]/[0b]}{[old] [13.2gb]->[11.2gb]/[15gb]}
[2017-01-02 03:28:27,815][WARN ][monitor.jvm ] [hostname] [gc][old][117128][2] duration [12.6s], collections [1]/[13.5s], total [12.6s]/[27s], memory [14.1gb]->[11gb]/[15gb], all_pools {[young] [320mb]->[112mb]/[0b]}{[survivor] [96mb]->[0b]/[0b]}{[old] [13.8gb]->[10.9gb]/[15gb]}
STW with g1 gc took longer than with cms gc
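The STW totals quoted on this slide can be reproduced from the log lines. A rough sketch that sums the elapsed value in "collections [n]/[elapsed]" for each [gc][old] line, which is the figure the slides use; the regex and the shortened log strings are assumptions made for illustration:

```python
import re

# Matches the "collections [count]/[elapsed]" part of an ES monitor.jvm line.
COLLECTIONS_RE = re.compile(r"collections\s*\[(\d+)\]/\[([\d.]+)(ms|s|m)\]")

UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def stw_seconds(log_lines):
    """Sum the elapsed time reported by each [gc][old] log line, in seconds."""
    total = 0.0
    for line in log_lines:
        if "[gc][old]" not in line:
            continue
        match = COLLECTIONS_RE.search(line)
        if match:
            total += float(match.group(2)) * UNIT_SECONDS[match.group(3)]
    return total


# Log lines shortened from the excerpts on the slide.
cms_log = [
    "[INFO ][monitor.jvm ] [hostname] [gc][old][109757][7966] duration [15.9s], "
    "collections [2]/[16.2s], total [15.9s]/[12.8m]",
]
g1_log = [
    "[WARN ][monitor.jvm ] [hostname] [gc][old][111127][1] duration [14.4s], "
    "collections [1]/[15.2s], total [14.4s]/[14.4s]",
    "[WARN ][monitor.jvm ] [hostname] [gc][old][117128][2] duration [12.6s], "
    "collections [1]/[13.5s], total [12.6s]/[27s]",
]

print(round(stw_seconds(cms_log), 1))  # cms gc total
print(round(stw_seconds(g1_log), 1))   # g1 gc total
```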

Circuit Breaker (1)
SELECT c, u, COUNT(*)
FROM monthly_idx -- an index with billions of documents
GROUP BY c, u
Excessive memory usage
GROUP BY on two or more high-cardinality fields causes OOM
Full GC just keeps recurring
No query gets a response; a full restart of ES is the only way out

Circuit Breaker (2)
• A running query fails once it uses more than 2.5% of the JVM heap,
• but the cluster as a whole is protected from becoming unresponsive
PUT /_cluster/settings
{
"persistent": {
"indices.breaker.request.limit": "2.5%"
}
}
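To get a feel for what a 2.5% limit means in bytes, here is a small arithmetic sketch. The 14.5 GB heap is an assumption borrowed from the GC log excerpt earlier in the deck; the slides do not state the heap size of the cluster where the breaker was set:

```python
def breaker_limit_bytes(heap_bytes, limit_percent):
    """Bytes a single request may use before the request breaker trips."""
    return int(heap_bytes * limit_percent / 100)


GIB = 1024 ** 3
heap = int(14.5 * GIB)  # assumed heap size, taken from the GC log excerpt
limit = breaker_limit_bytes(heap, 2.5)
print(f"{limit / GIB:.2f} GiB")
```

Under that assumption a single request is cut off at roughly a third of a gigabyte, well before it can exhaust the heap.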

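Index Trash Can Feature (1)
• Prerequisite concept: alias
daily_2017.01.01 (alias) -> daily_2017.01.01_ver_1 (actual index)
Advantage
Prevents partial data from being served (all or nothing)
While ingestion is in progress
A client query returns no data, because the alias does not exist yet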

daily_2017.01.01 (alias) -> daily_2017.01.01_ver_1 (actual index)
The alias is created only after the data has been ingested completely
After ingestion completes
A client query is served the data in ver_1

daily_2017.01.01 (alias) -> daily_2017.01.01_ver_1 (actual index)
Re-ingestion goes into daily_2017.01.01_ver_2 (actual index)
The alias is switched to ver_2 after ingestion completes
A client query is then served the data in ver_2
Rollback is also possible

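daily_2017.01.01 (alias), detached from daily_2017.01.01_ver_1 (actual index)
.Trash (alias) -> daily_2017.01.01_ver_1 (actual index)
Index deletion: only the alias is detached, so clients can no longer query the data
Indices that carry the .Trash alias are deleted periodically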

daily_2017.01.01 (alias) -> daily_2017.01.01_ver_1 (actual index)
.Trash (alias), detached again
If an index was deleted by mistake, just switch the alias back

Moving an index to the trash:
POST /_aliases
{
  "actions": [
    {
      "remove": {
        "indices": ["daily_2017.01.01_ver_1"],
        "alias": "*"
      }
    },
    {
      "add": {
        "indices": ["daily_2017.01.01_ver_1"],
        "alias": ".Trash"
      }
    }
  ]
}
Periodic cleanup of trashed indices:
DELETE /daily_2017.01.01_ver_1
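The alias operations described on the preceding slides (publish, switch, trash, restore) can each be expressed as one alias-actions body. A minimal sketch, assuming hypothetical helper names and explicit alias names rather than the "*" wildcard; sending the body to POST /_aliases is left out:

```python
import json

TRASH_ALIAS = ".Trash"


def publish(alias, index):
    """Expose a fully ingested index under the serving alias."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


def swap(alias, old_index, new_index):
    """Move the alias from old_index to new_index in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def replace_alias(index, old_alias, new_alias):
    """Replace one alias with another on the same index."""
    return {
        "actions": [
            {"remove": {"index": index, "alias": old_alias}},
            {"add": {"index": index, "alias": new_alias}},
        ]
    }


def trash(alias, index):
    """Soft delete: detach the serving alias and tag the index for cleanup."""
    return replace_alias(index, alias, TRASH_ALIAS)


def restore(alias, index):
    """Undo a mistaken delete by moving the index back out of the trash."""
    return replace_alias(index, TRASH_ALIAS, alias)


body = trash("daily_2017.01.01", "daily_2017.01.01_ver_1")
print(json.dumps(body, indent=2))  # would be sent as POST /_aliases
```

Because the remove and add actions travel in one request, clients never observe a state where the alias points at both versions or at neither during a switch.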

Appropriate Shard Count and Size
Query response time by shard size (measured on 200 million documents)
Num of shards   Docs per shard   Shard size   Query 1 (sec)   Query 2 (sec)   Query 3 (sec)
5               40 million       17 GB        0.6524          0.7728          0.8876
10              20 million       8.5 GB       0.5328          0.5554          0.4526
20              10 million       4.2 GB       0.8972          0.5044          0.5578
• Response time does not differ much by shard size
• We keep our shards under 10 GB
• When there are not many indices, about (number of cores * 2) shards works well
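The two rules of thumb above reduce to simple arithmetic. A small sketch, assuming an index of roughly 85 GB (5 shards x 17 GB, implied by the first table row) and a made-up 8-core node:

```python
import math


def shards_for_max_size(index_size_gb, max_shard_gb=10.0):
    """Smallest shard count that keeps every shard at or under max_shard_gb."""
    return math.ceil(index_size_gb / max_shard_gb)


def shards_for_cores(cores):
    """Rule of thumb from the slide: about (number of cores * 2) shards."""
    return cores * 2


index_size_gb = 85.0  # roughly 5 shards x 17 GB, from the first table row
print(shards_for_max_size(index_size_gb))
print(shards_for_cores(8))  # 8 cores is a made-up example value
```

How to reconcile the two numbers when they disagree is not covered in the slides, so the sketch reports them separately.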

Reduce Disk Size
• Disabling _all field: 18.6% reduced
• Disabling _source field: 20% reduced
• Think before disabling the _source field

Logstash option for exactly-once (1)
Options for File input
• start_position => "beginning" for log rotate
• http://jason-heo.github.io/elasticsearch/2016/02/28/logstash-offset.html
Options for Kafka Output
• acks => "all"
• retries => n

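(Timeline diagram: access_log is rotated and a new access_log file is created; Logstash notices the new file some time later)
• stat_interval (1 second): how often files are checked for updates
• discover_interval (15 seconds): how often Logstash checks for new files that match the pattern
• With start_position => "end", the lines written between the log rotation and the moment the new file is noticed are lost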

Option for the Kafka Output
output {
  kafka {
    ...
    compression_type => 'gzip'
    acks => "all" # default: 1
    retries => 5 # default: 0
  }
}
(Diagram: Broker 1 is the Leader; Broker 2 through Broker n are Follower 1 through Follower m, and each follower sends an ack to the leader)
The leader waits for the acks sent by all followers
Pros: strongest available guarantee
Cons: slow
cf) acks => "1" means that the leader responds without waiting for the followers' acks

Nested Document format (1)
<Flattened Doc>
[
  {
    "c": "blogger1",
    "u": "url1",
    "g": "m",
    "a": "1",
    "pv": 10
  },
  {
    "c": "blogger1",
    "u": "url1",
    "g": "f",
    "a": "2",
    "pv": 20
  }
]
<Nested Doc>
[
  {
    "c": "blogger1",
    "u": "url1",
    "page_views": [
      {
        "g": "m",
        "a": "1",
        "pv": 10
      },
      {
        "g": "f",
        "a": "2",
        "pv": 20
      }
    ]
  }
]
• c: blogger id
• u: url
• g: gender
• a: age
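Typical storage model: Flattened Doc Model
<Ingestion script>
import org.elasticsearch.spark.sql._ // adds saveToEs() to DataFrame

// One document per (c, u, g, a) combination
sqlContext.sql("""
  SELECT c, u, g, a, COUNT(*) AS pv
  FROM logs
  GROUP BY c, u, g, a
""").saveToEs("index_name/doc_type")
<Document format>
[
  {
    "c": "blogger1",
    "u": "url1",
    "g": "m",
    "a": "1",
    "pv": 10
  },
  {
    "c": "blogger1",
    "u": "url1",
    "g": "f",
    "a": "2",
    "pv": 20
  }
]
Duplicated data: c and u are repeated in every document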

Nested Doc Model
<Ingestion script>
import org.elasticsearch.spark.sql._ // adds saveToEs() to DataFrame

case class PageView(g: String, a: String, pv: Integer)

// The UDF packs gender, age, and page views into one struct
sqlContext.udf.register("page_view",
  (g: String, a: String, pv: Integer) => PageView(g, a, pv))

sqlContext.sql("""
  SELECT c, u, COLLECT_LIST(page_view) AS page_views
  FROM (
    SELECT c, u, page_view(g, a, pv) AS page_view
    FROM (
      SELECT c, u, g, a, COUNT(*) AS pv
      FROM logs
      GROUP BY c, u, g, a
    ) t1
  ) t2
  GROUP BY c, u
""").saveToEs("index_name/doc_type")
<Document format>
[
  {
    "c": "blogger1",
    "u": "url1",
    "page_views": [
      {
        "g": "m",
        "a": "1",
        "pv": 10
      },
      {
        "g": "f",
        "a": "2",
        "pv": 20
      }
    ]
  }
]
Duplication removed

• Pros
  • Data size is 49% smaller than with the Flattened Model
  • Bulk loading is 52% faster than with the Flattened Model (including the extra processing time)
• Cons
  • Extra processing is required using SparkSQL
    • But the bottleneck is saving the result to ES, so the extra processing time is not a problem
  • ES gets slower when a nested field has too many children
    • So, use it when the number of children is small
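The flattened-to-nested transformation itself is a group-by on (c, u). A minimal pure-Python sketch of the same reshaping, using the sample documents from the slides; it stands in for the SparkSQL job and is not the deck's ingestion code:

```python
from collections import OrderedDict


def to_nested(flat_docs):
    """Group flattened docs by (c, u) and collect g/a/pv into page_views."""
    grouped = OrderedDict()
    for doc in flat_docs:
        key = (doc["c"], doc["u"])
        if key not in grouped:
            grouped[key] = {"c": doc["c"], "u": doc["u"], "page_views": []}
        grouped[key]["page_views"].append(
            {"g": doc["g"], "a": doc["a"], "pv": doc["pv"]}
        )
    return list(grouped.values())


flat = [
    {"c": "blogger1", "u": "url1", "g": "m", "a": "1", "pv": 10},
    {"c": "blogger1", "u": "url1", "g": "f", "a": "2", "pv": 20},
]
nested = to_nested(flat)
print(nested)
```

The c and u values are stored once per group, which is where the size reduction comes from.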

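Composite Field (1)
Initial schema
{
  "properties": {
    ...
    "c": {
      ...
    },
    "type": {
      ...
    },
    ...
  }
}
Query patterns
• Lookup by c only: 5%
• Lookup by type only: 3%
• Lookup by both fields (AND): 92%
All of the query patterns above must be supported
Note: ES has no concept of a composite key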

Composite Field (2)
Add one more field that combines c and type
<Schema>
{
  "properties": {
    ...
    "c": {
      ...
    },
    "type": {
      ...
    },
    "ctype": {
      ...
    }
  }
}
<Example document>
{
  "c": "blogger_id",
  "type": "channel_pv",
  "ctype": "blogger_id:channel_pv",
  "pv": 10
}
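Populating and querying the composite field could look like the following. A minimal sketch with hypothetical helper names, assuming ":" as the separator (as in the example document) and that c, type, and ctype are indexed as exact, non-analyzed values so a term query matches them:

```python
def with_ctype(doc, separator=":"):
    """Return a copy of doc with ctype = c + separator + type added."""
    enriched = dict(doc)
    enriched["ctype"] = f'{doc["c"]}{separator}{doc["type"]}'
    return enriched


def lookup_query(c=None, type_=None):
    """Pick the single-term query that fits the fields the caller supplied."""
    if c is not None and type_ is not None:
        return {"term": {"ctype": f"{c}:{type_}"}}
    if c is not None:
        return {"term": {"c": c}}
    if type_ is not None:
        return {"term": {"type": type_}}
    raise ValueError("at least one of c or type_ is required")


doc = with_ctype({"c": "blogger_id", "type": "channel_pv", "pv": 10})
print(doc["ctype"])
print(lookup_query("blogger_id", "channel_pv"))
```

Keeping c and type as separate fields alongside ctype is what lets the single-field lookups keep working.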

Composite Field (3)
Response time improved by 40% (on page cache miss)
{
  "query_type": "BooleanQuery",
  "lucene": "+c:blogger_id +type: channel_cv",
  "time": "269.686223ms"
}
{
  "query_type": "ConstantScoreQuery",
  "lucene": "ConstantScore (ctype:c:blogger_id:channel_cv)",
  "time": "124.790570ms"
}
<ES query profile results>

Faster retrieval of selected fields from a single doc (1)
Data is read from the _source field
<SQL>
SELECT pv
FROM tab
WHERE primary_key = 'xx'
<DSL>
{
  "query": {
    "bool": {
      "must": [ {
        "term": {
          "primary_key": "xx"
        }
      }]
    }
  },
  "_source": {
    "includes": ["pv"]
  }
}

Data is read from doc values
<SQL>
SELECT MAX(pv)
FROM tab
WHERE primary_key = 'xx'
<DSL>
{
  "query": {
    ...
    ...
  },
  "aggregations": {
    "MAX(pv)": {
      "max": {
        "field": "pv"
      }
    }
  }
}
Since only one document matches,
pv = MAX(pv) = MIN(pv) = AVG(pv)
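The two lookup styles can be put side by side as request builders. A minimal sketch with hypothetical function names; the aggregation name value and the "size": 0 setting are assumptions not shown in the slides:

```python
def source_lookup(primary_key, field):
    """Fetch one field of one doc by filtering the stored _source."""
    return {
        "query": {"bool": {"must": [{"term": {"primary_key": primary_key}}]}},
        "_source": {"includes": [field]},
    }


def doc_value_lookup(primary_key, field):
    """Fetch the same value from doc values through a max aggregation."""
    return {
        "size": 0,  # assumption: skip the hits so _source is not returned
        "query": {"bool": {"must": [{"term": {"primary_key": primary_key}}]}},
        "aggs": {"value": {"max": {"field": field}}},
    }


def read_value(response):
    """Pull the aggregated value out of a search response body."""
    return response["aggregations"]["value"]["value"]


print(doc_value_lookup("xx", "pv"))
# Shape of a max aggregation result, with a made-up value:
print(read_value({"aggregations": {"value": {"value": 10.0}}}))
```

The trick only works for numeric fields and only when the query matches exactly one document, as the slide notes.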

Query   Retrieval method   Throughput (QPS)   Avg. response time (ms)
Q1      _source            4,604              107
Q1      Doc Value          7,484              66
Q2      _source            5,024              98
Q2      Doc Value          7,595              65

<Chart: _source vs. Doc Value>

• ES 5.x introduced a feature called Doc Value Fields
• Unfortunately, we have not been able to test whether it serves the same purpose as the previous slides
GET /_search
{
"query" : {
"match_all": {}
},
"docvalue_fields" : ["test1", "test2"]
}

Segment Merge (1)
1 + 1 < 2
Merging two segments into one uses fewer resources

Segment Merge (2)
https://github.com/exo-archives/exo-es-search

Segment Merge (3)
• Lucene Memory: 36.8% reduced
• Index Size: 15% reduced
POST /index-name/_forcemerge?max_num_segments=1

Segment Merge (5)
If we had not merged segments…
Merging only extends the time until memory fills up;
it is not a complete solution to this problem

Segment Merge (6)
Caution: occasionally heap usage actually increases instead

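Q. What should we watch out for when integrating Elasticsearch with Spark, and what synergy do you get from using them together?
A.
WRITE side
• Ingestion is easy
  • Calling saveToEs() on a dataframe is all it takes to ingest it
  • es-hadoop takes care of all the error handling
• Many options
• Ingestion progress is easy to track through Spark job monitoring
READ side
• Convenient
• Can JOIN with various data sources
• Index backup is easy
• Filter push down
Cautions
Write side: beyond a certain threshold, adding more Spark workers only raises CPU usage while the indexing rate stays the same
Read side: it is best to match the number of workers to the number of shards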
