
© 2020 SPLUNK INC.

TSTATS and PREFIX
How to get the most out of your lexicon, with walklex, tstats, indexed fields, PREFIX, TERM and CASE

Richard Morgan
Principal Architect | Splunk

Forward-Looking Statements

During the course of this presentation, we may make forward-looking statements regarding future events or plans of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results may differ materially. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, it may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements made herein.

In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionalities described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Data-to-Everything, D2E and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names or trademarks belong to their respective owners. © 2020 Splunk Inc. All rights reserved.

Averaging one slide every 45s

The Key to Productivity Is Work Avoidance

i.e. don’t do work you don’t have to do

Search Performance Underpins Everything

Search load is the biggest factor in sizing Splunk (not ingestion).

Faster loading dashboards
• User experience is improved with faster-completing searches
• User productivity improves as run/test cycles are accelerated

Better performance enables more use cases
• Improvements of 10x and 100x allow users to attack new problems
• Examine weeks and months of data, instead of just hours and minutes

Reduces the need for precomputation (summaries)
• Summaries should be used to reduce load, not to accelerate slow searches

Reduced server load, and a reduction in hardware costs
• Support more users on less hardware
• Improves ROI on hardware investment

Minimize Work: Select indexes

index=search_demo* selects directories starting with search_demo

(base) rmorgan-mbp-4cb4b:splunk rmorgan$ ls -al
total 824
drwx------ 252 rmorgan wheel 8064 31 Aug 11:33 .
drwx--x--- 4 rmorgan wheel 128 24 Nov 2019 ..
-rw-r--r--@ 1 rmorgan wheel 12292 22 Jul 11:09 .DS_Store
-rw------- 1 rmorgan wheel 0 31 Aug 11:10 .dirty_database
-rw------- 1 rmorgan wheel 3 31 Aug 11:10 _audit.dat
-rw------- 1 rmorgan wheel 3 31 Aug 11:10 _internal.dat
drwx------ 7 rmorgan wheel 224 24 Nov 2019 _internaldb
drwx------ 6 rmorgan wheel 192 18 Oct 2019 _introspection
-rw------- 1 rmorgan wheel 3 31 Aug 11:11 _introspection.dat
drwx------ 6 rmorgan wheel 192 20 Nov 2019 _metrics
-rw------- 1 rmorgan wheel 3 31 Aug 11:10 _metrics.dat
drwx------ 6 rmorgan wheel 192 22 Jul 10:34 _metrics_rollup
drwx------ 6 rmorgan wheel 192 18 Oct 2019 _telemetry
-rw------- 1 rmorgan wheel 2 31 Aug 11:20 _telemetry.dat
drwx------ 6 rmorgan wheel 192 18 Oct 2019 audit
drwx------ 2 rmorgan wheel 64 18 Oct 2019 authDb
drwx------ 6 rmorgan wheel 192 30 Aug 13:34 defaultdb
drwx------ 9 rmorgan wheel 288 31 Aug 11:42 fishbucket
drwx------ 2 rmorgan wheel 64 18 Oct 2019 hashDb
drwx------ 6 rmorgan wheel 192 31 Aug 11:10 search_demo_1
-rw------- 1 rmorgan wheel 2 31 Aug 11:30 search_demo_1.dat
drwx------ 6 rmorgan wheel 192 31 Aug 11:10 search_demo_2
-rw------- 1 rmorgan wheel 2 31 Aug 11:33 search_demo_2.dat
drwx------ 6 rmorgan wheel 192 18 Oct 2019 summarydb

When we specify indexes in our search we are narrowing the directories we wish to access. This is the highest level of exclusion in Splunk and it is a minimal requirement for high-performance search.

index=* selects all indexes, except for those that start with an underscore (_internal, _audit etc).
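
A cheap way to see which index directories a wildcard will actually touch, offered here as a quick sketch, is eventcount, which answers from metadata without scanning a single event:

| eventcount summarize=false index=search_demo*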

Minimize Work: Select a timerange

Applying the filter earliest=-20d latest=-10d selects buckets to consider

(base) rmorgan-mbp-4cb4b:splunk rmorgan$ ls -al search_demo/db/
total 16
drwx------ 25 rmorgan wheel 800 30 Aug 21:04 .
drwx------ 6 rmorgan wheel 192 30 Aug 19:52 ..
-rw------- 1 rmorgan wheel 2904 30 Aug 20:14 .bucketManifest
-rw------- 1 rmorgan wheel 10 30 Aug 19:52 CreationTime
drwx--x--- 2 rmorgan wheel 64 30 Aug 19:52 GlobalMetaData
drwx--x--- 16 rmorgan wheel 512 30 Aug 19:54 db_1598984915_1598812143_60
drwx--x--- 16 rmorgan wheel 512 30 Aug 19:55 db_1598984915_1598984915_61
drwx--x--- 16 rmorgan wheel 512 30 Aug 19:56 db_1598984916_1598984915_62
drwx--x--- 15 rmorgan wheel 480 30 Aug 19:57 db_1598984916_1598984916_63
drwx--x--- 16 rmorgan wheel 512 30 Aug 19:59 db_1598984917_1598984916_64
drwx--x--- 17 rmorgan wheel 544 30 Aug 20:00 db_1598984917_1598984917_65
drwx--x--- 15 rmorgan wheel 480 30 Aug 20:01 db_1598984918_1598984917_66
drwx--x--- 16 rmorgan wheel 512 30 Aug 20:02 db_1598984918_1598984918_67
drwx--x--- 16 rmorgan wheel 512 30 Aug 20:03 db_1598984919_1598984918_68
drwx--x--- 14 rmorgan wheel 448 30 Aug 20:04 db_1598984919_1598984919_69
drwx--x--- 15 rmorgan wheel 480 30 Aug 20:05 db_1598984920_1598984919_70
drwx--x--- 16 rmorgan wheel 512 30 Aug 20:06 db_1598984920_1598984920_71
drwx--x--- 13 rmorgan wheel 416 30 Aug 20:07 db_1598984920_1598984920_72
drwx--x--- 15 rmorgan wheel 480 30 Aug 20:08 db_1598984921_1598984920_73
drwx--x--- 13 rmorgan wheel 416 30 Aug 20:09 db_1598984921_1598984921_74
drwx--x--- 14 rmorgan wheel 448 30 Aug 20:10 db_1598984922_1598984921_75
drwx--x--- 16 rmorgan wheel 512 30 Aug 20:11 db_1598984922_1598984922_76
drwx--x--- 17 rmorgan wheel 544 30 Aug 20:12 db_1598984923_1598984922_77
drwx--x--- 16 rmorgan wheel 512 30 Aug 20:13 db_1598984923_1598984923_78
drwx--x--- 13 rmorgan wheel 416 30 Aug 21:08 hot_v1_79

Each bucket encodes the time range of the data it holds in its directory name, in epoch time. Therefore we only need to consider buckets whose time ranges overlap the time range we have specified.

The dbinspect command lets you understand this selection process without executing a full search (see the sketch below).
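
As a minimal sketch using the demo index above, dbinspect lists the buckets that fall inside a given time range, together with their epoch boundaries:

| dbinspect index=search_demo earliest=-20d latest=-10d
| table bucketId state startEpoch endEpoch eventCount sizeOnDiskMB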

Output: A list of Buckets to consider

Time range + indexes selects the buckets that must be processed.

[Chart: considered_buckets]

Six stages of indexer search processing

1. Index and time ranges define the considered buckets
2. Metadata and bloom filters eliminate buckets
3. LISPY queries the tsidx files to identify slices to decompress
4. Extract and parse events from slices
5. Schema on the fly extracts and eliminates events
6. Events are processed by SPL and returned to the SH

The first line of your search typically represents the greatest amount of computational effort required to execute it. Making efficient use of the first line yields the greatest gains, and everything else barely matters:

index=<indexes> <constraints>
| <everything else>

Horror Show: Scan Count vs. Event Count 😱

During execution you can see the ratio between scan count and event count.

False positive ratio: 7 matching events out of 312,792 scanned = 99.99% false matches, and 21 seconds to execute.

Try to eliminate events BEFORE they are extracted from the raw data, as this avoids the CPU-intensive decompression and parsing.

Early Elimination Improves Performance

By introducing TERM into our search we have eliminated all false positives: scan_count = event_count. 😍

TERM is used in less than 1% of all customer searches executed on Splunk Cloud.

The difference is in the LISPY

By introducing TERM we changed the LISPY to be more precise. LISPY is the language that Splunk uses to query the lexicon.

BEFORE:
• SPL = index=* average=0.9*
• LISPY = [ AND 0 9* index::* ]

AFTER:
• SPL = index=* TERM(average=0.9*)
• LISPY = [ AND average=0.9* index::* ]

The first search looks for any event that includes all of the minor terms 0 and 9* in any index. The second looks for any MAJOR TERM that starts with average=0.9 in any index.

Where do MAJOR TERMS Come From?

Splunk uses a universal indexing algorithm to tokenize events and write them to the index.

Splunk has a two-stage parsing process: first, we break up _raw with major breakers; second, we apply minor breakers to the output of the major breakers. This is configurable in limits.conf (beware of changing it!!!).

Major breakers:
[ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? + %21 %26 %2526 %3B %7C %20 %2B %3D -- %2520 %5D %5B %3A %0A %2C %28 %29

Minor breakers:
/ : = @ . - $ % \\ _

Step 1 – Applying MAJOR Breakers

How Splunk takes a log line and creates TERMS with major breakers.

Input string (_raw):

01-27-2020 20:29:22.922 +0000 INFO Metrics - group=per_sourcetype_thruput, ingest_pipe=0, series="splunkd", kbps=258.6201534528208, eps=1367.6474892210738, kb=8050.1142578125, ev=42571, avg_age=145747.7853938127, max_age=2116525

Output MAJOR TERM list:

["01-27-2020", "20:29:22.922", "+0000", "info", "metrics", "-", "group=per_sourcetype_thruput", "ingest_pipe=0", "series=", "splunkd", "kbps=258.6201534528208", "eps=1367.6474892210738", "kb=8050.1142578125", "ev=42571", "avg_age=145747.7853938127", "max_age=2116525"]

Notice how all fields other than series= are tokenized into useful TERMS.

Step 2 – Applying MINOR Breakers

How Splunk takes a log line and creates TERMS with minor breakers.

Input array (after MAJOR breakers); these terms are only accessible with the TERM keyword:

["01-27-2020", "20:41:20.355", "+0000", "info", "metrics", "-", "group=per_sourcetype_thruput", "ingest_pipe=0", "series=", "top", "kbps=23.83452969239664", "eps=155.64262209891208", "kb=743.4765625", "ev=4855", "avg_age=145747.7853938127", "max_age=2116525"]

Output TERMS (after MINOR breakers); these terms are used for _raw search:

["0000", "thruput", "0", "01", "155", "20", "2020", "23", "27", "355", "41", "4765625", "4855", "64262209891208", "743", "83452969239664", "info", "metrics", "age", "avg", "eps", "ev", "group", "ingest", "kb", "kbps", "max", "per", "pipe", "series", "sourcetype"]

SIDE NOTE: Over-precision in numbers generates many unique TERMS and bloats the tsidx file.

Eyeballing a log for MAJOR TERMS

Identifying and testing for MAJOR TERMS in your events is easy.

Input event:

01-27-2020 20:41:20.355 +0000 INFO Metrics - group=per_sourcetype_thruput, ingest_pipe=0, series="top", kbps=23.83452969239664, eps=155.64262209891208, kb=743.4765625, ev=4855, avg_age=145747.7853938127, max_age=2116525

Output token list (MAJOR + MINOR):

["01-27-2020", "20:41:20.355", "+0000", "info", "metrics", "-", "group=per_sourcetype_thruput", "ingest_pipe=0", "series=", "top", "kbps=23.83452969239664", "eps=155.64262209891208", "kb=743.4765625", "ev=4855", "avg_age=145747.7853938127", "max_age=2116525", "0000", "thruput", "0", "01", "155", "20", "2020", "23", "27", "355", "41", "4765625", "4855", "64262209891208", "743", "83452969239664", "info", "metrics", "age", "avg", "eps", "ev", "group", "ingest", "kb", "kbps", "max", "per", "pipe", "series", "sourcetype"]
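
To verify a candidate MAJOR TERM, a quick check, sketched here against the splunkd metrics lines that live in _internal on a stock installation, is to search for it with TERM and compare scan_count to event_count in the Job Inspector:

index=_internal sourcetype=splunkd TERM(group=per_sourcetype_thruput)
| head 100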

Let’s Update Our Example

Thanks to major breakers we have additional terms in our index.

Events:
1. Tom. Rich and Harry
2. Bob loves Fred
3. Fred loves Susan
4. Harry loves Rich
5. Karen loves Susan
6. Loves. Susan Karen

Universal indexing (major + minor breakers) produces:

TERM     Events with TERM
tom      1
tom.     1
rich     1, 4
harry    1, 4
susan    3, 5, 6
bob      2
fred     2, 3
karen    5, 6
loves    2, 3, 4, 5, 6
loves.   6

Search for Exact Match “Karen Loves Susan”

LISPY search = [ AND karen loves susan ]

TERM     Events containing TERM
tom      1
tom.     1
rich     1, 4
harry    1, 4
susan    3, 5, 6
bob      2
fred     2, 3
karen    5, 6
loves    2, 3, 4, 5, 6
loves.   6

The posting lists tell us that two slices (5 and 6) contain all the terms we need. We extract these slices from the bucket, decompress them, and run them through schema on the fly to see if they match.

Karen Loves Susan NOT TERM(loves.)

LISPY search = [ AND karen loves susan [ NOT loves. ] ]

TERM     Events with TERM
tom      1
tom.     1
rich     1, 4
harry    1, 4
susan    3, 5, 6
bob      2
fred     2, 3
karen    5, 6
loves    2, 3, 4, 5, 6
loves.   6

By excluding “loves.” (with the trailing period) we have removed the need to open and parse slice 6. This means only a single event is parsed by schema on the fly. The false positive ratio is now 0%, doubling performance. 🥳
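
Putting the pieces together, a sketch of the full search against this deck’s demo data:

index=search_demo "Karen loves Susan" NOT TERM(loves.)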

“walklex” Lets You Inspect the Lexicon

We can see INDEXED FIELDS when type=fieldvalue.
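
A sketch of inspecting the lexicon this way, assuming walklex’s per-bucket term and count output fields (index name from the demo; walklex was introduced in 7.3):

| walklex index=search_demo_1 type=fieldvalue
| stats sum(count) as postings by term
| sort - postings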

“walklex” Lets You Inspect the Lexicon

We can see TERMS when type=term.

Splunk Has Two Major Search Options

_raw search has the most versatility, but advanced users use tstats.

[Chart: tstats is fast; _raw search is versatile]

Example: Splunk’s Hostwide Metrics (-31d)

Hostwide metrics uses “INDEXED_JSON” and can be queried both ways.

Raw search: 134 secs
Equivalent tstats search: 3 secs

The improvement is 39x for the same result set.
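
A hedged sketch of the pattern, not the exact search from the slide: hostwide resource-usage data lands in the _introspection index as indexed JSON, so its data.* fields (the names below are assumptions) can be addressed directly by tstats:

| tstats avg(data.normalized_load_avg_1min) as load_avg
  where index=_introspection component=Hostwide
  by host _time span=10m
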
The Need for Indexed Fields Limits tstats Adoption

The prerequisite of indexed fields means its application is limited, and few searches can be converted to tstats.

It is difficult to discover the existence of indexed fields when they are available:
• The walklex command introduced in 7.3 helps
• The existence of TERMS can be inferred from raw search log data

Although barely documented, tstats supports the TERM() directive.

Indexed Field Creation

There are various ways to get indexed fields into Splunk.

At ingestion we can extract metadata from the raw event and create indexed fields
• Uses props and transforms, normally via REGEX, sometimes INGEST_EVAL
• This is discouraged in favor of search time extractions

Some structured data sources can optionally create indexed fields automatically
• INDEXED_EXTRACTIONS works with CSV and JSON data
• This can bloat the TSIDX file, and is frequently disabled

HTTP Event Collector has a “fields” section
• Slightly dangerous, as clients define the indexed fields and can bloat the TSIDX

Post ingestion we can create a data model
• Data models are based entirely on indexed fields: no raw events, just TSIDX files
• Building the data model requires a raw search, which can hide the true cost

A sketch of the first method follows.
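
For illustration only; the sourcetype, transform name and regex here are hypothetical:

# props.conf
[my_sourcetype]
TRANSFORMS-severity = add_indexed_severity

# transforms.conf
[add_indexed_severity]
REGEX = alert_severity=(\w+)
FORMAT = alert_severity::$1
WRITE_META = true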

How to Get the Most From Indexed Fields

If reviewing complex pipeline configurations is your bag, you’ll love this talk!

tstats Supports TERM

This is a log line from ITSI; there are lots of useful TERMS in here:

09/24/2020 09:26:00 +0000, search_name="Indicator - Shared - 5dd8512622092b554f3e7da7 - ITSI Search", search_now=1600939560.000, info_min_time=1600939260.000, info_max_time=1600939560.000, info_search_time=1600939594.150, qf="", kpi="Average Alert Severity", kpiid="ec77165d-e79f-4379-9534-3479954e64a6", urgency=5, serviceid="9a6bdac6-fa6c-423e-81dc-785dbf75637e", itsi_service_id="9a6bdac6-fa6c-423e-81dc-785dbf75637e", is_service_aggregate=1, is_entity_in_maintenance=0, is_entity_defined=0, entity_key=service_aggregate, is_service_in_maintenance=0, kpibasesearch=5dd8512622092b554f3e7da7, is_filled_gap_event=0, alert_color="#F26A35", alert_level=5, alert_value=5, itsi_kpi_id="ec77165d-e79f-4379-9534-3479954e64a6", is_service_max_severity_event=1, alert_severity=high, alert_period=1, entity_title=service_aggregate, hostname="https://itsi-search.customer.com:443"

We can use TERM on any unquoted key=value token (for example alert_severity=high or urgency=5), but note that quoted values such as kpi="Average Alert Severity" are split apart by the quote major breaker and do not survive as a single TERM.

tstats Supports TERM

Some simple searches can be expressed with TERM. The raw search:

index=itsi_summary TERM(alert_severity=*)
| timechart span=1sec count by alert_severity

The tstats version is 48x faster: 🚀

| tstats prestats=t count where index=itsi_summary TERM(alert_severity=high) by _time span=1sec
| fillnull value="high" alert_severity
| tstats prestats=t append=t count where index=itsi_summary TERM(alert_severity=low) by _time span=1sec
| fillnull value="low" alert_severity
| tstats prestats=t append=t count where index=itsi_summary TERM(alert_severity=medium) by _time span=1sec
| fillnull value="medium" alert_severity
| tstats prestats=t append=t count where index=itsi_summary TERM(alert_severity=normal) by _time span=1sec
| fillnull value="normal" alert_severity
| tstats prestats=t append=t count where index=itsi_summary TERM(alert_severity=unknown) by _time span=1sec
| fillnull value="unknown" alert_severity
| timechart limit=50 span=1sec count by alert_severity

PREFIX Directive Added to tstats in v8

With PREFIX, indexed fields are no longer a prerequisite for tstats. The extension massively increases the instances where tstats can be used; many more searches can be converted to tstats in v8.

PREFIX allows TERMS to be processed as if they were indexed fields, for example:
• Indexed field search: | tstats count by host
• TERM search: | tstats count by PREFIX(host=)

PREFIX is also supported in aggregators, for example: | tstats sum(PREFIX(value=))
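
As a generic illustration against _internal (splunkd metrics events carry group= tokens, so this sketch should work on a stock installation):

| tstats count where index=_internal TERM(group=*) by PREFIX(group=)
| sort - count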

tstats Supports PREFIX()

PREFIX greatly simplifies our search: the five-way TERM search above collapses into a single tstats call, and the prefix version is 3x faster again! 🚀

| tstats count where index=itsi_summary TERM(alert_severity=*)
  by PREFIX(alert_severity=) _time span=1sec
| rename alert_severity= as alert_severity
| xyseries _time alert_severity count

Q. What is the ingestion over 24 hours?

Every host generates metrics about its ingestion throughput every 30 seconds:

01-21-2020 12:25:44.311 +0000 INFO Metrics - group=thruput, ingest_pipe=1, name=thruput, instantaneous_kbps=3.366894499322308, instantaneous_eps=12.163696322058637, average_kbps=47.777961955016565, total_k_processed=31355244, kb=104.6298828125, ev=378, load_average=2.42

load_average = how hard the server is working
kb = the data processed since the last reading
instantaneous_kbps = the ingestion rate at the point of measurement
ingest_pipe = the ingestion pipeline the reading is from

Search conversion raw -> tstats

Converting this search demonstrated a large performance improvement over 24 hours of data.

Raw search:

index=_internal host IN (idx*) group=thruput name=thruput
| bin span=1767s _time
| stats sum(kb) as indexer_kb
    avg(instantaneous_kbps) as instantaneous_kbps
    avg(load_average) as load_avg
  by host _time

PREFIX search, 30x faster:

| tstats sum(PREFIX(kb=)) as indexer_kb
    avg(PREFIX(instantaneous_kbps=)) as instantaneous_kbps
    avg(PREFIX(load_average=)) as load_avg
  where index=_internal host IN (idx*) TERM(group=thruput) TERM(name=thruput)
  by host _time span=1767s

How did cachemgr behave over 24 hours?

metrics.log, group=cachemgr_bucket:

09-21-2020 12:10:41.051 +0000 INFO Metrics - group=cachemgr_bucket, open=4557, close=4561, cache_hit=4557, open_buckets=4
09-21-2020 12:10:44.330 +0000 INFO Metrics - group=cachemgr_bucket, open=3550, close=3550, cache_hit=3550, open_buckets=4
09-21-2020 12:10:39.985 +0000 INFO Metrics - group=cachemgr_bucket, open=3412, close=3415, cache_hit=3412, open_buckets=4
09-21-2020 12:10:44.102 +0000 INFO Metrics - group=cachemgr_bucket, register_start=1, open=4096, close=4100, cache_hit=4096, open_buckets=6
09-21-2020 12:10:45.709 +0000 INFO Metrics - group=cachemgr_bucket, register_start=1, register_end=1, open=3162, close=3164, cache_hit=3162, open_buckets=5
09-21-2020 12:10:41.229 +0000 INFO Metrics - group=cachemgr_bucket, register_cancel=1, open=4794, close=4796, cache_hit=4794, open_buckets=7
09-21-2020 12:10:10.012 +0000 INFO Metrics - group=cachemgr_bucket, open=4783, close=4779, cache_hit=4783, open_buckets=8
09-21-2020 12:10:23.227 +0000 INFO Metrics - group=cachemgr_bucket, register_start=1, open=2896, close=2896, cache_hit=2896, open_buckets=4

Search conversion raw -> tstats

How did the cache behave over 24 hours?

Raw search:

index=_internal host IN (idx*) TERM(group=cachemgr_bucket)
| bin span=1798s _time
| stats sum(absent_summary_skipped) as absent_summary_skipped
    sum(bootstrap_summary) as bootstrap_summary
    sum(cache_hit) as cache_hit
    sum(cache_miss) as cache_miss
    sum(close) as close
    sum(close_all) as close_all
  by host _time

PREFIX search, 25x faster:

| tstats sum(PREFIX(absent_summary_skipped=)) as absent_summary_skipped
    sum(PREFIX(bootstrap_summary=)) as bootstrap_summary
    sum(PREFIX(cache_hit=)) as cache_hit
    sum(PREFIX(cache_miss=)) as cache_miss
    sum(PREFIX(close=)) as close
    sum(PREFIX(close_all=)) as close_all
  where index=_internal host IN (idx*) TERM(group=cachemgr_bucket)
  by host _time span=1798s

Other segmenters.conf Options

You can disable major breakers per sourcetype by indexing with the “search” segmenter (see the props.conf sketch after this listing).

[full]

[indexing]
# change INTERMEDIATE_MAJORS to "true" if you want an ip address to appear in typeahead as a, a.b, a.b.c, a.b.c.d
# the typical performance hit by setting to "true" is 30%
INTERMEDIATE_MAJORS = false

[search]
MAJOR = [ ] < > ( ) { } | ! ; , ' " \n \r \s \t & ? + %21 %26 %2526 %3B %7C %20 %2B %3D -- %2520 %5D %5B %3A %0A %2C %28 %29 / : = @ . - $ # % \\ _
MINOR =

[standard]
MAJOR = [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t / : = @ . ? - & $ # + % _ \\ %21 %26 %2526 %3B %7C %20 %2B %3D -- %2520
MINOR =

[inner]
MAJOR = [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t / : = @ . ? - & $ # + % _ \\ %21 %26 %2526 %3B %7C %20 %2B %3D -- %2520
MINOR =

[outer]
MAJOR = [ ] < > ( ) { } | ! ; , ' " * \n \r \s \t & ? + %21 %26 %2526 %3B %7C %20 %2B %3D -- %2520
MINOR =
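
A sketch of applying a segmenter to a single sourcetype via props.conf (the sourcetype name is hypothetical):

# props.conf
[my_chatty_sourcetype]
SEGMENTATION = search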

Testing Segmentation Options on splunkd.log

Major breakers are very expensive on storage if you don’t use them:

• Removing all major breakers drops bucket size by 20%
• Using regex to extract all attribute-value pairs, including quoted strings, increased the size of the search segmentation by 50%
• Switching from default to regex extraction caused an increase of 18%

Work Avoidance – Loadjob

You can execute a search in one location and then use its results in another: 10 minutes to run the original search, 10 seconds to reload its results.

When developing complex searches on large data sets, avoid repeatedly reloading event data from the indexers as you iterate towards your solution.
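
A minimal sketch (the saved-search name is hypothetical): run the expensive search once, then iterate against its cached results:

| loadjob savedsearch="rmorgan:search:expensive_indexer_metrics"
| timechart span=30m avg(load_avg) by host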

Work Avoidance – Dashboard Base Searches

Run base searches once; use child searches to modify the base data set.

<form>
  <search id="run_once">
    <query>
      index="search_demo_2" label average sum
      | timechart sum(sum) avg(sum)
    </query>
  </search>
  <search base="run_once">
    <query>
      | table _time $show_field$
    </query>
  </search>
  <fieldset>
    <input type="dropdown" token="show_field">
      <label>show field</label>
      <choice value="avg(sum)">avg</choice>
      <choice value="sum(sum)">sum</choice>
    </input>
  </fieldset>
</form>

The base search contains no tokens, so it remains static and runs once. The child search contains the token and is reevaluated whenever the token is updated, so the user can change $show_field$ without causing the base search to execute. This is how you build high-performance interactive dashboards.

Free performance boost! 1/2

Make your buckets smaller and your searches go slightly faster by updating the config. We have been improving the compression on buckets; have you updated your configurations yet? Use zstd!

journalCompression = gzip|lz4|zstd
* The compression algorithm that splunkd should use for the rawdata journal
  file of new index buckets.
* This setting does not have any effect on already created buckets. There is
  no problem searching buckets that are compressed with different algorithms.
* "zstd" is only supported in Splunk Enterprise version 7.2.x and higher. Do
  not enable that compression format if you have an indexer cluster where some
  indexers run an earlier version of Splunk Enterprise.
* Default: gzip
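
A sketch of the corresponding indexes.conf change; as the spec above notes, it only affects newly created buckets:

# indexes.conf
[default]
journalCompression = zstd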

Free performance boost 2/2

The TSIDX files are normally bigger than the journal, so use the latest format. Use level 3! Who doesn’t want this for free?

tsidxWritingLevel = [1|2|3]
* Enables various performance and space-saving improvements for tsidx files.
* For deployments that do not have multi-site index clustering enabled,
  set this to the highest value possible for all your indexes.
* For deployments that have multi-site index clustering, only set
  this to the highest level possible AFTER all your indexers in the
  cluster have been upgraded to the latest code level.
* Do not configure indexers with different values for 'tsidxWritingLevel'
  as downlevel indexers cannot read tsidx files created from uplevel
  peers.
* The higher settings take advantage of newer tsidx file formats for
  metrics and log events that decrease storage cost and increase
  performance.
* Default: 1
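
And its companion, again as an indexes.conf sketch; per the caution above, only raise it once every indexer in the cluster is upgraded:

# indexes.conf
[default]
tsidxWritingLevel = 3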

Everybody Gets a Dashboard

https://github.com/silkyrich/cluster_health_tools/blob/master/default/data/ui/views/search_performance_evaluator.xml

[Dashboard screenshot: enter your search, see the SPL-to-LISPY translation, and get hints to make the search faster and the scan smaller]

Please provide feedback via the SESSION SURVEY

1. Index and time defines considered buckets

All searches are executed with an index and a time range. This defines our list of buckets to consider.

The first performance tip is to make this as tight as possible: minimize indexes and narrow the time range.

[Diagram: the six-stage indexer search pipeline, with stage 1 highlighted]

Why is performance so bad?

When the scan count is high and the event count is low, we are filtering events during schema on the fly.

This is the most expensive place to filter, as we have already downloaded buckets, opened the tsidx files, extracted slices and fully parsed events.

Minimize filtering during schema on the fly.

[Diagram: the six-stage indexer search pipeline, with stage 5 highlighted]

What happened?

By introducing TERM to our search we were able to improve elimination earlier in the pipeline. Doing so saves downloading journal files from SmartStore, and reduces the CPU required for decompression and parsing.

Minimize filtering during the schema-on-the-fly stage.

[Diagram: the six-stage indexer search pipeline]

Processing the considered buckets

After we have selected our range of buckets to search, we must find and extract the data from them. Where the filtering is performed can have a dramatic impact on search performance.

[Diagram: the six-stage indexer search pipeline]

Agenda

1. Introduction: What this presentation is all about
2. Search and workload elimination: How search works and where time is spent
3. How the index is built: How universal indexing builds the lexicon
4. Bloomfilter elimination: How bloomfilters accelerate _raw search
5. Advanced indexing with major breakers: How major breakers turbo-charge elimination
6. Introducing tstats: How tstats delivers further performance improvements
7. Other tricks and a performance dashboard: loadjob, base searches and a take-away dashboard

Explaining TSIDX and the Lexicon

Universal indexing breaks down the log lines and extracts the tokens to build a map.

Events:
1. Tom. Rich and Harry
2. Bob loves Fred
3. Fred loves Susan
4. Harry loves Rich
5. Karen loves Susan
6. Loves. Susan Karen

Universal indexing (minor breakers) produces:

TERM     Events with TERM
tom      1
rich     1, 4
harry    1, 4
susan    3, 5, 6
bob      2
fred     2, 3
karen    5, 6
loves    2, 3, 4, 5, 6

The lexicon is composed of lowercase TERMS.

“Karen Loves Susan” matched two events

We have extracted two slices, scanned two events and returned one event:

Slice 5: “Karen loves Susan” (decompressed: match)
Slice 6: “Loves. Susan Karen” (decompressed: FALSE POSITIVE)

scan_count=2, event_count=1 implies a 50% event elimination during schema on the fly.

Bloomfilters and metadata eliminate buckets

Buckets that are eliminated do not have to be further processed, plus we don’t need to download the tsidx files or the journal.

Depending on the search, the data and the event distribution, Splunk can eliminate up to 99% of buckets.

Second performance tip: maximize elimination. Use host, source and sourcetype plus sparse terms to help bucket elimination.

[Diagram: the six-stage indexer search pipeline, with stage 2 highlighted]
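
As a sketch of that tip (the index, host pattern and term here are hypothetical), pairing the metadata fields with a rare TERM gives the metadata and bloom filter stages the most to eliminate on:

index=app_logs host=web-* sourcetype=app_errors TERM(error_code=E4902)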


How Bloom Filters Eliminate Whole Buckets

Bloom filters are a useful acceleration technology for evaluating set membership.

A bloom filter reports the absence of a term with 100% accuracy, but a positive result may be a false positive. The likelihood of false positives decreases as the size of the bit array is increased.

In the example we have loaded the terms from our example lexicon to show how they are translated into set bits in the array.

[Figure: the list of terms held in the lexicon, and the output bit map for that list of TERMS]

Splunk auto-tunes the size of the bloom filter to maintain a good balance between size and accuracy (often above 99%).

Credit for the interactive tool: https://www.jasondavies.com/bloomfilter/
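
For reference, the textbook approximation (not stated in the talk) for the false-positive rate of a bloom filter with m bits, k hash functions and n inserted terms is:

p ≈ (1 − e^(−k·n/m))^k

so growing the bit array for a fixed term count drives p down sharply, which is exactly what the auto-tuning trades off against tsidx size.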

Looking Up Non-existent Terms

A true negative, and a false positive.

The first example shows the bloom filter correctly assessing an absence test: we don’t need to open the tsidx file, because the term is definitely not there.

The second shows a bloom filter clash, i.e. a false positive: we need to open the tsidx file and check the lexicon to see if the term is really there. This is the case where we need the bloom filter to be larger.

The Bucket \ Journal is Composed of Slices

The postings list maps TERMS to locations in the bucket’s journal.

TERM     Slices containing TERM
tom      1
rich     1, 4
harry    1, 4
susan    3, 5, 6
bob      2
fred     2, 3
karen    5, 6
loves    2, 3, 4, 5, 6

The TSIDX file maps TERMS found in the lexicon to slices to decompress in the journal file. Given these locations we can decompress the required slices and inspect the _raw string.

Note that the need to support slices is the reason bucket compression can use lz4, zstd and gzip, but will never support a format that cannot be decompressed one slice at a time.

A “Bucket” is a Directory

A bucket is a collection of files held in a directory structure; notable files are called out below.

(base) rmorgan-mbp-4cb4b:splunk rmorgan$ ls -al search_demo/db/db_1596632603_1596618900_87/
total 17936
drwx--x--- 16 rmorgan wheel 512 7 Aug 20:03 .
drwx------ 8 rmorgan wheel 256 28 Aug 10:14 ..
-rw------- 1 rmorgan wheel 8 7 Aug 20:03 .rawSize
-rw------- 1 rmorgan wheel 7 7 Aug 20:03 .sizeManifest4.1
-rw------- 1 rmorgan wheel 503929 7 Aug 20:03 1596620994-1596618900-4712026901567338950.tsidx
-rw------- 1 rmorgan wheel 3727073 7 Aug 20:02 1596632603-1596620225-4538014197027015779.tsidx
-rw------- 1 rmorgan wheel 57894 7 Aug 20:02 Hosts.data
-rw------- 1 rmorgan wheel 118 7 Aug 20:02 SourceTypes.data
-rw------- 1 rmorgan wheel 669 7 Aug 20:02 Sources.data
-rw------- 1 rmorgan wheel 1429857 7 Aug 20:02 Strings.data
-rw------- 1 rmorgan wheel 208669 7 Aug 20:03 bloomfilter
-rw------- 1 rmorgan wheel 75 7 Aug 20:03 bucket_info.csv
-rw------- 1 rmorgan wheel 2545204 7 Aug 20:03 merged_lexicon.lex
-rw------- 1 rmorgan wheel 49 7 Aug 20:03 optimize.result
drwx------ 5 rmorgan wheel 160 7 Aug 20:03 rawdata
-rw------- 1 rmorgan wheel 97 7 Aug 20:03 splunk-autogen-params.dat

(base) rmorgan-mbp-4cb4b:splunk rmorgan$ ls -al search_demo/db/db_1596632603_1596618900_87/rawdata/
total 1568
drwx------ 5 rmorgan wheel 160 7 Aug 20:03 .
drwx--x--- 16 rmorgan wheel 512 7 Aug 20:03 ..
-rw------- 1 rmorgan wheel 773899 7 Aug 20:03 journal.zst
-rw------- 1 rmorgan wheel 144 7 Aug 20:03 slicemin.dat
-rw------- 1 rmorgan wheel 1200 7 Aug 20:03 slicesv2.dat

Notable files:
• The .tsidx files point TERMS to slices found in the journal
• Hosts.data, SourceTypes.data and Sources.data list the hosts, sourcetypes and sources found in this bucket
• The bloomfilter is computed when the bucket is closed
• journal.zst is the journal file that contains the actual raw data, compressed together

Eliminated buckets

Bloomfilters and metadata allow us to eliminate buckets early, avoiding work.

[Chart: considered_buckets vs eliminated_buckets]

tstats Processes tsidx Files Only

The primary reason why tstats is so highly performant is that it works exclusively on the TSIDX files. This means that it does no decompression or parsing, saving a huge amount of computation.

Unlike _raw search or mstats, it doesn’t support any bucket elimination. This is likely to feature in future releases.

[Diagram: the six-stage indexer search pipeline]

TSIDX reduction is a destroyer of performance

TSIDX reduction deletes the tsidx files but keeps the bloomfilters, which disables almost all workload elimination. Just don’t do it! 😱🤮

enableTsidxReduction = <boolean>
* Whether or not the tsidx reduction capability is enabled.
* By enabling this setting, you turn on the tsidx reduction capability.
  This causes the indexer to reduce the tsidx files of buckets when the
  buckets reach the age specified by 'timePeriodInSecBeforeTsidxReduction'.
* CAUTION: Do not set this setting to "true" on indexes that have been
  configured to use remote storage with the "remotePath" setting.
* Default: false

Frozen Buckets Have No Metadata

The freezing process removes the metadata from a bucket. The journal file contains all the information required to rebuild the various metadata files; this is how buckets are unfrozen.

Don’t tell Elisa
You might also like