Splunk Enterprise 7.2.3: Distributed Search
Overview of distributed search
Use cases
These are some of the key use cases for distributed search: horizontal scaling for improved search performance, access control, and managing geographically dispersed data.
Types of distributed search
There are several basic options for deploying a distributed search environment:
• Use one or more independent search heads to search across the search
peers.
• Deploy multiple search heads in a search head cluster. The search heads
in the cluster share resources, configurations, and jobs. This offers a way
to scale your deployment transparently to your users.
• Deploy search heads as part of an indexer cluster. Among other
advantages, an indexer cluster promotes data availability and data
recovery. The search heads in an indexer cluster can be either
independent search heads or members of a search head cluster.
In each case, the search heads perform only the search management and
presentation functions. They connect to search peers that index data and search
across the indexed data.
A small distributed search deployment has one independent search head; that is,
a search head that is not part of a cluster.
Search head clusters
A search head cluster is a group of search heads that work together to provide
scalability and high availability. It serves as a central resource for searching
across a set of search peers.
The search heads in a cluster are, for most purposes, interchangeable. All
search heads have access to the same set of search peers. They can also run or
access the same searches, dashboards, knowledge objects, and so on.
A search head cluster is the recommended topology when you need to run
multiple search heads across the same set of search peers. The cluster
coordinates the activity of the search heads, allocates jobs based on the current
loads, and ensures that all the search heads have access to the same set of
knowledge objects.
Indexer clusters also use search heads to search across the set of indexers, or
peer nodes. The search heads in an indexer cluster can be either independent
search heads or members of a search head cluster.
You deploy and configure search heads very differently when they are part of an
indexer cluster:
• For information on using search head clusters with indexer clusters, read
"Integrate the search head cluster with an indexer cluster".
Parallel reduce search processing

If you struggle with extremely large high-cardinality searches, you might be able
to apply parallel reduce processing to them to help them complete faster. You
must have a distributed search environment to use parallel reduce search
processing.
High-cardinality searches are searches that must match, filter, and aggregate
fields with extremely large numbers of unique values. During a parallel reduce
search process, some or all of a high-cardinality search job is processed in
parallel by indexers that have been configured to behave as intermediate
reducers for the purposes of the search. This parallelization of reduction work
that otherwise would be done entirely by the search head can result in faster
completion times for high-cardinality searches.
If you want to take advantage of parallel reduce search processing, your indexers
should be operating with a light to medium load on average. You can use parallel
reduce search processing whether or not your indexers are clustered.
Knowledge bundle

The search peers use the search head's knowledge bundle to execute queries on
its behalf. When executing a distributed search, the peers are ignorant of any
local knowledge objects. They have access only to the objects in the search
head's knowledge bundle.
Bundles typically contain a subset of files (configuration files and assets) from
$SPLUNK_HOME/etc/system, $SPLUNK_HOME/etc/apps and
$SPLUNK_HOME/etc/users.
Location of the knowledge bundle
After you add search peers to the search head, as described in "Add search
peers to the search head," you can view the replication status of the knowledge
bundle:
1. On the search head, click Settings at the top of the Splunk Web page.
2. In the Distributed environment group, click Distributed search, then click Search peers.
On the Search peers page, there is a row for each search peer. The column Replication status indicates
whether the search head is successfully replicating the knowledge bundle to the
search peer.
Note: In the case of a search head cluster, you must view replication status
from the search head cluster captain. This is because only the captain replicates
the knowledge bundle to the cluster's search peers. The other cluster members
do not participate in bundle replication. If you view the search peers' status from
a non-captain member, the Replication status column might read "Initial"
instead of "Successful."
User authorization
All authorization for a distributed search originates from the search head. At the
time it sends the search request to its search peers, the search head also
distributes the authorization information. It tells the search peers the name of the
user running the search, the user's role, and the location of the distributed
authorize.conf file containing the authorization information.
Deploy distributed search
The basic configuration to enable distributed search is simple. You designate one
Splunk Enterprise instance as the search head and establish connections from
the search head to one or more search peers, or indexers.
If you need to deploy more than a single search head, the best practice is to
deploy the search heads in a search head cluster.
The search head interfaces with the user and manages searches across the set
of indexers. The indexers index incoming data and search the data, as directed
by the search head.
Choose an existing instance that is not indexing external data or install a new
instance. For installation information, see the topic in the Installation Manual
specific to your operating system.
3. Establish connections from the search head to all the search peers that you
want it to search across. This is the key step in the procedure. See Add search
peers to the search head.
4. Add data inputs to the search peers. You add inputs in the same way as for
any indexer, either directly on the search peer or through forwarders connecting
to the search peer. See the Getting Data In manual for information on data
inputs.
5. Forward the search head's internal data to the search peers. See Best
practice: Forward search head data to the indexer layer.
6. Log in to the search head and perform a search that runs across all the search
peers, such as a search for *. Examine the splunk_server field in the results.
Verify that all the search peers are listed in that field.
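For example, a quick verification search might look like the following sketch. The index and time range here are arbitrary choices.

index=_internal earliest=-15m | stats count by splunk_server

If every search peer appears as a value of the splunk_server field, the peers are connected and searchable.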
Deploy multiple search heads
To deploy multiple search heads, the best practice is to deploy the search heads
in a search head cluster. This provides numerous advantages, including
simplified scaling and management. See the chapter Deploy search head
clustering.
Splunk indexer clusters use search heads to search across their set of
indexers, or peer nodes. You deploy search heads very differently when they
are part of an indexer cluster. To learn about deploying search heads in indexer
clusters, read Enable the search head in the Managing Indexers and Clusters of
Indexers manual.
For information on the hardware requirements for search heads and search
peers (indexers), see Reference hardware in the Capacity Planning Manual.
For search head cluster and indexer cluster deployments, each cluster node
must be running on the same operating system version. For more information on
indexer cluster requirements, see System requirements and other deployment
considerations for indexer clusters in Managing indexers and clusters of
indexers.
Upgrade search heads and search peers at the same time to take full advantage
of the latest search capabilities. If you cannot do so, follow these version
compatibility guidelines.
The following rules define compatibility requirements between search heads and
search peers:
• 7.x search heads are compatible with 7.x and 6.x search peers.
• The search head must be at the same or a higher level than the search
peers. See the note later in this section for a precise definition of "level" in
this context.
Note the following:
• These guidelines are valid for standalone search heads and for search
heads that are participating in a search head cluster.
• Search heads participating in indexer clusters have different compatibility
restrictions. See Splunk Enterprise version compatibility in Managing
Indexers and Clusters of Indexers.
• Compatibility is significant at the major/minor release level, but not at the
maintenance level. For example, a 6.3 search head is not compatible with
a 6.4 search peer, because the 6.3 search head is at a lower minor
release level than the 6.4 search peer. However, a 6.3.1 search head is
compatible with a 6.3.3 search peer, despite the lower maintenance
release level of the search head.
Mixed-version distributed search compatibility
You can run a 6.x search head against 5.x search peers, but there are a few
compatibility issues to be aware of. To take full advantage of the 6.x feature set,
upgrade search heads and search peers at the same time.
When running a 6.x search head against 5.x search peers, note the following:
• You can use data models on the search head, but only without report
acceleration.
• You can use Pivot on the search head.
• You can run predictive analytics (the predict command) on the search
head.
Synchronize the system clocks on all machines, virtual or physical, that are
running Splunk Enterprise distributed search instances. Specifically, this means
your search heads and search peers. In the case of search head pooling or
mounted bundles, this also includes the shared storage hardware. Otherwise,
various issues can arise, such as bundle replication failures, search failures, or
premature expiration of search artifacts.
The synchronization method that you use depends on your specific set of
machines. Consult the system documentation for the particular machines and
operating systems on which you are running Splunk Enterprise. For most
environments, Network Time Protocol (NTP) is the best approach.
Add search peers to the search head
To activate distributed search, you add search peers, or indexers, to a Splunk
Enterprise instance that you designate as a search head. You do this by
specifying each search peer manually.
Important: A search head cannot perform a dual function as a search peer. The
only exception to this rule is for the monitoring console, which functions as a
"search head of search heads."
This topic describes how to connect a search head to a set of search peers.
If you need to connect multiple search heads to a set of search peers, you can
repeat the process for each search head individually. However, if you require
multiple search heads, the best practice is to deploy them in a search head
cluster. A search head cluster can also replicate all search peers from one
search head to all the other search heads in the cluster, so that you do not have
to add the peers to each search head separately.
Configuration overview
To set up the connection between a search head and its search peers, configure
the search head through one of these methods:
• Splunk Web
• Splunk CLI
• The distsearch.conf configuration file
The configuration occurs on the search head. For most deployments, no
configuration is necessary on the search peers. Access to the peers is controlled
through public key authentication.
Prerequisites
Before an indexer can function as a search peer, you must change its password
from the default value. Otherwise, the search head will not be able to
authenticate against it.
1. Log into Splunk Web on the search head and click Settings at the top of the
page.
Note: You must precede the search peer's host name or IP address with the URI
scheme, either "http" or "https".
6. Click Save.
1. Log into Splunk Web on the search head and click Settings at the top of the
page.
3. Click Distributed search setup.
6. Click Save.
To add a search peer, run this command from the search head:
For example:
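The following is a sketch of a typical invocation of the splunk add search-server command. The URI, port, and credentials shown are placeholders:

splunk add search-server https://192.168.1.1:8089 -auth admin:password -remoteUsername admin -remotePassword passremote

The -auth flag supplies credentials for the search head, while -remoteUsername and -remotePassword supply credentials for an admin user on the search peer.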
Edit distsearch.conf
The settings available through Splunk Web provide sufficient options for most
configurations. Some advanced configuration settings, however, are only
available by directly editing distsearch.conf. This section discusses only the
configuration settings necessary for connecting search heads to search peers.
For information on the advanced configuration options, see the distsearch.conf
spec file.
1. On the search head, create or edit a distsearch.conf file in
$SPLUNK_HOME/etc/system/local.
2. Add the search peers to the servers setting under the [distributedSearch]
stanza. Specify the peers as a set of comma-separated values (host names or IP
addresses with management ports). For example:
[distributedSearch]
servers = https://192.168.1.1:8089,https://192.168.1.2:8089
Note: You must precede the host name or IP address with the URI scheme,
either "http" or "https".
If you add search peers via Splunk Web or the CLI, Splunk Enterprise
automatically configures authentication. However, if you add peers by editing
distsearch.conf, you must distribute the key files manually. After adding the
search peers and restarting the search head, as described above:
1. Copy the search head's key file, $SPLUNK_HOME/etc/auth/distServerKeys/trusted.pem, to
$SPLUNK_HOME/etc/auth/distServerKeys/<searchhead_name>/trusted.pem on each search peer, where <searchhead_name> is the search head's serverName.
2. Restart each search peer.
Multiple search heads can search across a single peer. The peer must store a
copy of each search head's certificate.
The search peer stores the search head keys in directories with the specification
$SPLUNK_HOME/etc/auth/distServerKeys/<searchhead_name>.
For example, if you have two search heads, named A and B, and they both need
to search one particular search peer, do the following:
1. On the search peer, create the directories
$SPLUNK_HOME/etc/auth/distServerKeys/A/ and
$SPLUNK_HOME/etc/auth/distServerKeys/B/.
2. Copy search head A's trusted.pem file to $SPLUNK_HOME/etc/auth/distServerKeys/A/ and search head B's
trusted.pem file to $SPLUNK_HOME/etc/auth/distServerKeys/B/.
3. Restart the search peer.
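As a shell sketch of this example, assuming a default $SPLUNK_HOME and that each search head's trusted.pem has already been copied to the peer under a hypothetical temporary name:

# On the search peer: create one directory per search head
mkdir -p $SPLUNK_HOME/etc/auth/distServerKeys/A $SPLUNK_HOME/etc/auth/distServerKeys/B
# Put each search head's public key in its own directory
cp /tmp/trusted.pem.from_A $SPLUNK_HOME/etc/auth/distServerKeys/A/trusted.pem
cp /tmp/trusted.pem.from_B $SPLUNK_HOME/etc/auth/distServerKeys/B/trusted.pem
# Restart the peer so that it picks up the new keys
$SPLUNK_HOME/bin/splunk restart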
You can group search peers into distributed search groups. This allows you to
target searches to subsets of search peers. See Create distributed search
groups.
Best practice: Forward search head data to the indexer layer

The preferred approach is to forward the data directly to the indexers, without
indexing separately on the search head. You do this by configuring the search
head as a forwarder. These are the main steps:
1. Make sure that all necessary indexes exist on the indexers. For example,
the S.o.S app uses a scripted input that puts data into a custom index. If you
install S.o.S on the search head, you need to also install the S.o.S Add-on on the
indexers, to provide the indexers with the necessary index settings for the data
the app generates. On the other hand, since _audit and _internal exist on
indexers as well as search heads, you do not need to create separate versions of
those indexes to hold the corresponding search head data.
2. Configure the search head as a forwarder by creating an outputs.conf file on the search head that specifies the search peers as the forwarding targets. For example:
[tcpout]
defaultGroup = my_search_peers
forwardedindex.filter.disable = true
indexAndForward = false
[tcpout:my_search_peers]
server=10.10.10.1:9997,10.10.10.2:9997,10.10.10.3:9997
This example assumes that each indexer's receiving port is set to 9997.
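If receiving is not yet enabled on an indexer, you can turn it on with the splunk enable listen CLI command. The port here matches the 9997 assumed in the example above, and the credentials are placeholders:

splunk enable listen 9997 -auth admin:password

Run this on each search peer (indexer) that will receive the forwarded search head data.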
You perform the same configuration steps to forward data from search head
cluster members to their set of search peers. However, you must ensure that all
members use the same outputs.conf file. To do so, do not edit the file on the
individual search heads. Instead, use the deployer to propagate the file across
the cluster. See "Use the deployer to distribute apps and configuration updates."
Manage distributed search
The knowledge bundle consists of a set of files that the search peers ordinarily
need in order to perform their searches. You can, if necessary, modify this set of
files. The main reasons for modifying the set of files include the following:
• As an app developer, you want to customize the files for the needs of
your app. This case usually involves manipulating the replication whitelist.
You can also use a replication blacklist for this purpose.
See distsearch.conf in the Admin Manual for details on the settings discussed in
this topic.
The contents of the knowledge bundle are determined primarily by two stanzas in distsearch.conf:
1. [replicationWhitelist]
2. [replicationSettings:refineConf]
Since the system starts by examining the [replicationWhitelist] stanza, this
discussion does too.
If you do need to alter the whitelist, you can override the system default whitelist
by creating a version of the [replicationWhitelist] stanza in
$SPLUNK_HOME/etc/apps/<appname>/default/distsearch.conf:
[replicationWhitelist]
<name> = <whitelist_regex>
...
The knowledge bundle will include all files that both satisfy the whitelist regex
and are specified in [replicationSettings:refineConf]. If multiple regexes are
specified, the bundle will include the union of those files.
In this example, the knowledge bundle will include all files with extensions of
either ".conf" or ".spec":
[replicationWhitelist]
allConf = *.conf
allSpec = *.spec
The names, such as allConf and allSpec, are used only for layering. That is, if
you have both a global and a local copy of distsearch.conf, the local copy can
be configured so that it overrides only one of the regexes. For instance, assume
that the example shown above is the global copy and that you then specify a
whitelist in your local copy like this:
[replicationWhitelist]
allConf = *.foo.conf
The two conf files will be layered, with the local copy taking precedence. Thus,
the search head will distribute only files that satisfy these two regexes:
allConf = *.foo.conf
allSpec = *.spec
Caution: Replication whitelists are applied globally across all conf data, and are
not limited to any particular app, regardless of where they are defined. Be careful
to pull in only your intended files.
The system default distsearch.conf file includes a version of this stanza that
specifies the *.conf files that are normally included in the knowledge bundle:
[replicationSettings:refineConf]
# Replicate these specific *.conf files and their associated *.meta stanzas.
replicate.app = true
replicate.authorize = true
replicate.collections = true
replicate.commands = true
replicate.eventtypes = true
replicate.fields = true
replicate.segmenters = true
replicate.literals = true
replicate.lookups = true
replicate.multikv = true
replicate.props = true
replicate.tags = true
replicate.transforms = true
replicate.transactiontypes = true
If you want to replicate a .conf file that is not in the system default version of the
[replicationSettings:refineConf] stanza, create a version of the stanza in
$SPLUNK_HOME/etc/apps/<appname>/default/distsearch.conf and specify the
*.conf file there. Similarly, you can remove files from the bundle by setting them
to "false" in this stanza.
Caution: Replication blacklists are applied globally across all conf data, and are
not limited to any particular app, regardless of where they are defined. If you are
defining an app-specific blacklist, be careful to constrain it to match only files that
your application will not need.
In distributed search, all search heads and search peers in the group must have
unique names. The serverName setting has several specific uses in distributed search.
Note: serverName is not used when adding search peers to a search head. In
that case, you identify the search peers through their domain names or IP
addresses.
The only reason to change serverName is if you have multiple instances of Splunk
Enterprise residing on a single machine, and they're participating in the same
distributed search group. In that case, you'll need to change serverName to
distinguish them.
Create distributed search groups

You can group search peers into distributed search groups so that searches can target a subset of the peers.
For example, say you have a set of search peers in New York and another set in
San Francisco, and you want to perform searches across peers in just a single
location. You can do this by creating two search groups, NYC and SF. You can
then specify the search groups in searches.
Distributed search groups are particularly useful when configuring the monitoring
console. See Monitoring Splunk Enterprise.
For example, to create the two search groups NYC and SF, create stanzas like
these in distsearch.conf on the search head:
[distributedSearch]
# This stanza lists the full set of search peers.
servers = 192.168.1.1:8089, 192.168.1.2:8089, 175.143.1.1:8089,
175.143.1.2:8089, 175.143.1.3:8089
[distributedSearch:NYC]
# This stanza lists the set of search peers in New York.
default = false
servers = 192.168.1.1:8089, 192.168.1.2:8089
[distributedSearch:SF]
# This stanza lists the set of search peers in San Francisco.
default = false
servers = 175.143.1.1:8089, 175.143.1.2:8089, 175.143.1.3:8089
Note the following:
• The group lists can overlap. For example, you can add a third group
named "Primary_Indexers" that contains some peers from each location.
• If you set a group's default attribute to "true," the peers in that group will
be the ones queried when the search does not specify a search group.
Otherwise, if you set all groups to "false," the full set of search peers in the
[distributedSearch] stanza will be queried when the search does not
specify a search group.
To use a search group in a search, specify the search group like this:
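The following is a sketch that assumes the NYC group defined above; splunk_server_group is the search option that names a distributed search group:

error splunk_server_group=NYC | stats count by host

Only the peers listed in the NYC group participate in this search.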
This feature is not valid for indexer clustering, except for limited use cases in
certain complex topologies.
In indexer clustering, the cluster replicates the data buckets arbitrarily across the
set of search peers, or "cluster peer nodes". It then assigns one copy of each
bucket to be the primary copy, which participates in searches. There is no
guarantee that a specific peer or subset of peers will contain the primary bucket
copies for a particular search. Therefore, if you put peers into distributed search
groups and then run searches based on those groups, the searches might
contain incomplete results.
For details of bucket replication in indexer clusters, see Buckets and indexer
clusters in Managing Indexers and Clusters of Indexers.
The limited use cases for distributed search groups with indexer clusters include:
• Multiple indexer clusters, where you need to identify the peer nodes for a
specific cluster.
• Search heads that run searches across both an indexer cluster and
standalone indexers. You might want to put the standalone indexers into
their own group.
Remove a search peer
You can remove a search peer from a search head through Splunk Web or the
CLI. As you might expect, doing so merely removes the search head's
knowledge of that search peer; it does not affect the peer itself.
You can remove a search peer from a search head through the Search peers
page on the search head's Splunk Web. See View search peer status in Settings.
Note: This only removes the search peer entry from the search head; it does not
remove the search head key from the search peer. In most cases, this is not a
problem and no further action is needed.
On the search head, run the splunk remove search-server command to remove
a search peer from the search head:
• Use the -auth flag to provide credentials for the search head only.
• <host> is the host name or IP address of the search peer's host machine.
• <port> is the management port of the search peer.
For example:
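The following is a sketch of a typical invocation; the host, port, and credentials are placeholders:

splunk remove search-server -auth admin:password 10.10.10.10:8089

Here, 10.10.10.10:8089 is the <host>:<port> of the search peer to remove.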
In the case of a search head cluster, the peer removal action replicates to all
other cluster members only if you have enabled search peer replication.
Otherwise, you must remove the search peers from each member individually.
For information on enabling search peer replication, see Replicate the search
peers across the cluster.
Disable the trust relationship
As an additional step, you can disable the trust relationship between the search
peer and the search head. To do this, delete the trusted.pem file from
$SPLUNK_HOME/etc/auth/distServerKeys/<searchhead_name> on the search peer.
View distributed search status
1. On the search head, click Settings at the top of the Splunk Web page.
2. In the Distributed environment group, click Distributed search, then click Search peers.
On the Search peers page, there is a row for each search peer, with the following columns:
• Peer URI
• Splunk instance name
• State. Specifies whether the peer is up or down.
• Replication status. Indicates the status of knowledge bundle replication
between the search head and the search peer:
♦ Initial. Default state of the peer, before the peer has received its
first knowledge bundle from this search head. The peer remains in
this state for approximately replication_period_sec in
limits.conf, which is 60 seconds by default.
♦ In Progress. A bundle replication is in progress.
♦ Successful. The peer has received a bundle from this search
head. The peer is ready to participate in distributed searches.
♦ Failed. Something went wrong with bundle replication.
• Cluster label. This field contains a value if this peer is part of an indexer
cluster and the indexer cluster has a label. See Set cluster labels in
Monitoring Splunk Enterprise.
• Health status. When the search head sends a heartbeat to a peer (by
default, every 60 seconds), it performs a series of health checks on that
peer. The results determine the health status of the peer:
♦ Healthy. The peer passes all health checks during 50% or more of
the heartbeats over the past 10 minutes.
♦ Sick. The peer fails a health check during more than 50% of the
heartbeats over the past 10 minutes. See the Health check
failures column for details.
♦ Quarantined. A peer that does not currently participate in
distributed searches. See Quarantine a search peer.
• Health check failures. This column provides details of any health check
failures. It lists all failures over the last 10 minutes. Each heartbeat-timed
set of health checks stops at the first health check failure, so the list
includes only the first failure, if any, for each heartbeat.
• Status. Enabled or disabled.
• Actions. You can quarantine this peer or delete it from the search head.
See Quarantine a search peer and Remove a search peer.
You can also use the monitoring console to get information about the search
peers. See Use the monitoring console to view distributed search status.
There are two distributed search dashboards under the monitoring console's Search menu: Distributed Search: Instance and Distributed Search: Deployment.
You can also use Settings to get information about the search peers. See View
search peers in Settings.
Manage parallel reduce search processing
In a typical distributed search process, there are two broad search processing
phases: a map phase and a reduce phase. The map phase takes place across
the indexers in your deployment. In the map phase, the indexers locate event
data that matches the search query and sort it into field-value pairs. When the
map phase is complete, indexers send the results to the search head for the
reduce phase. During the reduce phase, the search head processes the results
through the commands in your search and aggregate them to produce a final
result set.
The parallel reduce process inserts an intermediate reduce phase into the
map-reduce paradigm, making it a three-phase map-reduce-reduce operation. In
this intermediate reduce phase, a subset of your indexers serve as intermediate
reducers. The intermediate reducers divide up the mapped results and perform
reduce operations on those results for certain supported search commands.
When the intermediate reducers complete their work, they send the results to the
search head, where the final result reduction and aggregation operations take
place. The parallel processing of reduction work that otherwise would be done
entirely by the search head can result in faster completion times for
high-cardinality searches that aggregate large numbers of search results.
The following diagram illustrates the three-phase parallel reduce search process.
The prerequisites for parallel reduce search processing are as follows:
• A distributed search environment. Parallel reduce search processing requires a distributed search deployment architecture. For more information, see About distributed search.
• An environment where the indexers are at a single site. Parallel reduce search processing is not site-aware. Do not use it if your indexers are in a multisite indexer cluster, or if you have non-clustered indexers spread across several sites.
• Splunk platform version 7.1.0 or later for all participating machines. Upgrade all Splunk instances that participate in the parallel reduce process to version 7.1.0 or later. Participating instances include all indexers and search heads. For more information, see How to upgrade Splunk Enterprise in the Installation Manual.
• Internal search head data forwarded to the indexer layer. The parallel reduce search process ignores all data on the search head. If you plan to run parallel reduce searches, the best practice is to forward all search head data to the indexer layer. See Best practice: Forward search head data to the indexer layer.
• A low to medium average indexer load. Parallel reduce search processes add a significant amount of indexer load. If you attempt to run parallel reduce searches in an already overloaded indexer system, you might encounter slow performance. If you run an indexer cluster, you might see skipped heartbeats between peer nodes and the cluster master. See Use the monitoring console to view index and volume status in Managing Indexers and Clusters of Indexers.
• All indexers configured to allow secure communication with intermediate reducers. Admins must set an identical pass4SymmKey security key in the [parallelreduce] stanza of server.conf for all indexers. This security key enables communication between indexers and intermediate reducers. See Configure your indexers to communicate with intermediate reducers.
• Users with roles that include the run_multi_phased_searches capability. Users must have the run_multi_phased_searches capability to use the redistribute command. The redistribute command applies parallel reduce search processing to a search. See Apply parallel reduce processing to searches.
Next steps
Learn how to configure your deployment for parallel reduce search processing.
See Configure parallel reduce search processing.
Configure parallel reduce search processing
To enable parallel reduce search processing for your deployment, you need to
configure your indexers to work as intermediate reducers and determine how
your deployment should distribute the parallel reduction workload across your
indexers.
If this is your first time reading about this feature, see Overview of parallel reduce
search processing for an overview of parallel reduce search processing and a list
of prerequisites.
To gain the benefits of parallel reduce search processing, you must configure all
of your indexers so that they have the potential to work as intermediate reducers.
You accomplish this configuration by giving each of your indexers an identical
pass4SymmKey security key. This security key enables secure communication
between indexers and intermediate reducers.
To update your indexer configurations, you must have access to the server.conf
file for your Splunk deployment, located in $SPLUNK_HOME/etc/system/local/.
See About configuration files and the topics that follow it in the Admin Manual for
more information about making configuration file updates.
Parallel reduce search processing is not site-aware. Do not add this configuration
to your indexers if they are in a multisite indexer cluster or if they are
non-clustered and spread across several sites.
Your indexer configurations might already have pass4SymmKey values under their
[general] and [clustering] stanzas. Do not change those pass4SymmKey
settings. Do not use the same security key values as those pass4SymmKey
settings.
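As a sketch, the relevant parts of an indexer's server.conf might then look like the following. All key values are placeholders, and the [general] and [clustering] entries are shown only to illustrate that each stanza keeps its own, distinct key:

[general]
pass4SymmKey = <existing general key>

[clustering]
pass4SymmKey = <existing indexer cluster key>

[parallelreduce]
pass4SymmKey = <new key used only for parallel reduce>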
Save a copy of the key. After you set the key for an indexer and reboot the
indexer, the security key changes from clear text to encrypted form, and it is no
longer recoverable from server.conf. If you add a new intermediate reducer
later, you must use the clear text version of the key to set it.
Prerequisites
The following prerequisite topics are useful if you run an indexer cluster.
Steps
1. Open server.conf and locate the settings for an indexer. Indexers are
identified with a [<hostname>:<port>] stanza.
2. Add the following stanza and security configuration to the settings for the
indexer:
[parallelreduce]
pass4SymmKey=<password>
3. Save your server.conf changes.
4. Restart the indexer with the CLI restart command:
$SPLUNK_HOME/bin/splunk restart
Repeat these steps for each indexer in your deployment. Use the same
<password> for each indexer in your deployment.
For example, if you keep the default parallel reduce settings in limits.conf, the
Splunk platform randomly selects a certain number of intermediate reducers
each time you run a parallel reduce search. If all of your indexers are in a
single-site indexer cluster, the random selection aids in distributing the parallel
reduction workload across the cluster.
However, if your indexers are not clustered, and some of your indexers have
large indexing loads on average while others do not, you can use the reducers
setting to configure the low-load indexers to be dedicated intermediate reducers.
Dedicated intermediate reducers are always used when you run a parallel reduce
search process.
These two methods are mutually exclusive. When you set up dedicated
intermediate reducers, the Splunk platform cannot randomly select intermediate
reducers.
To configure parallel reduce search processing, you must have access to the
limits.conf file for your Splunk deployment, located in
$SPLUNK_HOME/etc/system/local/. See About configuration files and the topics
that follow it in the Admin Manual for more information about making
configuration file updates.
The default parallel reduce search processing settings enable the Splunk
platform to randomly select intermediate reducers from the larger set of indexers
when you run parallel reduce searches. The default number of indexers that the
Splunk platform repurposes as intermediate reducers during the intermediate
reduce phase of the parallel reduce search process is 50% of the total number of
indexers in your indexer pool, up to a maximum of 4 indexers.
These limits.conf settings control how the Splunk platform selects intermediate reducers:
• maxReducersPerPhase (default value: 4). The maximum number of indexers that can be used as intermediate reducers in the intermediate reduce phase of a parallel reduce search.
• winningRate (default value: 50). The percentage of indexers that can be selected from the total pool of indexers and used as intermediate reducers in a parallel reduce search process. This setting applies only when the reducers setting is not configured in limits.conf. See Enable dedicated intermediate reducers.
Enable dedicated intermediate reducers
All indexers in the reducers list are used as intermediate reducers when you run
a parallel reduce search. If the number of indexers in the reducers list exceeds
the value of the maxReducersPerPhase setting, the Splunk platform randomly
selects the intermediate reducers from the reducers list. For example, if the
reducers setting lists five reducers and maxReducersPerPhase=4, the Splunk
platform randomly selects four intermediate reducers from the list.
If all of the indexers in the reducers list are down or are otherwise invalid,
searches with the redistribute command run without parallel reduction. All
reduce operations are processed on the search head.
When you configure the reducers setting for your deployment, the Splunk
platform ceases to apply the winningRate setting.
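A sketch of the corresponding limits.conf configuration follows. The assumption that these settings live in a [parallelreduce] stanza, and the host:port values, are illustrative:

[parallelreduce]
maxReducersPerPhase = 4
# Dedicated intermediate reducers; list the management host:port of each low-load indexer
reducers = 10.10.10.5:8089,10.10.10.6:8089,10.10.10.7:8089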
When you run a parallel reduce search with the redistribute command, you can
use the num_of_reducers argument to override the number of reducers
determined by the parallel reduce search settings in the limits.conf file.
For example, say your limits.conf settings determine that seven intermediate
reducers are used by default in all parallel reduce searches. You can design a
parallel reduce search where num_of_reducers = 5. Every time that search runs,
only five intermediate reducers are used in its intermediate reduce phase.
If you provide a value for the num_of_reducers setting that exceeds the limit set
by the maxReducersPerPhase setting in the limits.conf file, the Splunk platform
sets the number of reducers to the maxReducersPerPhase value.
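For example, a sketch of a search that caps its own reducer count might look like this. The index, search terms, and field are placeholders:

index=main error | redistribute num_of_reducers=5 | stats count by host

The redistribute command precedes the reducing commands, here stats, so that the intermediate reducers can process them.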
Next steps
If this is your first time reading about this feature, see Overview of parallel reduce
search processing for an overview of parallel reduce search processing and a list
of prerequisites.
About the run_multi_phased_searches capability
By default, the number of concurrent parallel reduce searches that can run on an
intermediate reducer is limited to the number of CPU cores in the reducer. This
default is controlled by the maxPrdSearchesPerCpu setting in limits.conf.
Overview of search head clustering
A search head cluster is a group of search heads that work together to provide scalability and high availability. The members of the cluster are, for most purposes, interchangeable.
To achieve this interchangeability, the search heads in the cluster must share
configurations and apps, search artifacts, and job scheduling. Search head
clusters automatically propagate most of these shared resources among the
members.
Cluster architecture
The cluster coordinates activity among its members. Key aspects of this coordination include:
• Job scheduling. The captain assigns
each scheduled search to the optimal member, usually the member with
the least load.
• Search artifacts. The cluster replicates search artifacts and makes them
available to all members.
• Configurations. The cluster requires that all members share the same set
of configurations. For runtime updates to knowledge objects, such as
updates to dashboards or reports, the cluster replicates configurations
automatically to all members. For apps and some other configurations, the
user must push configurations to the cluster members by means of the
deployer, a Splunk Enterprise instance that resides outside the cluster.
You set up a cluster by configuring and deploying the cluster's search heads. The
process is similar to how you set up search heads in any distributed search
environment. The main difference is that you also need to configure the search
heads as cluster members.
Users access the cluster the same way that they access any search head. They
point their browser at any search head that is a member of the cluster. Because
cluster members share jobs, search artifacts, and configurations, it does not
matter which search head a user accesses. The user has access to the same set
of dashboards, searches, and so on.
To achieve the goals of high availability and load balancing, Splunk recommends
that you put a load balancer in front of the cluster. That way, the load balancer
can assign the user to any search head in the cluster and balance the user load
across the cluster members. If one search head goes down, the load balancer
can reassign the user to any remaining search head.
Search head clusters are different from indexer clusters. The primary purpose
of indexer clusters is to provide highly available data through coordinated groups
of indexers. Indexer clusters always include one or more associated search
heads to access the data on the indexers. These search heads might be, but are
not necessarily, members of a search head cluster.
For information on search heads in indexer clusters, see the chapter "Configure
the search head" in the Managing Indexers and Clusters of Indexers manual.
For information on adding a search head cluster to an indexer cluster, see the
topic "Integrate the search head cluster with an indexer cluster" in this manual.
One cluster member has the role of captain, which means that it coordinates job
scheduling and replication activities among all the members. It also serves as a
search head like any other member, running search jobs, serving results, and so
on. Over time, the role of captain can shift among the cluster members.
In addition to the set of search head members that constitute the actual cluster, a
functioning cluster requires several other components:
• A load balancer (optional). Third-party software or hardware that lets users access the set of search heads through a single
interface, without needing to specify a particular search head. See Use a
load balancer with search head clustering.
Note the following:
• One member serves as the captain, directing various activities within the
cluster.
• The members communicate among themselves to schedule jobs, replicate
artifacts, update configurations, and coordinate other activities within the
cluster.
• The members communicate with search peers to fulfill search requests.
• Users can optionally access the search heads through a third-party load
balancer.
• A deployer sits outside the cluster and distributes updates to the cluster
members.
Captain

The captain is a cluster member that coordinates the
activities of the cluster. Any member can perform the role of captain, but the
cluster has just one captain at any time. Over time, if failures occur, the captain
changes and a new member gets elected to the role.
The elected captain is known as a dynamic captain, because it can change over
time. A cluster that is functioning normally uses a dynamic captain. You can
deploy a static captain as a temporary workaround during disaster recovery, if
the cluster is not able to elect a dynamic captain.
The captain is a cluster member and in that capacity it performs the search
activities typical of any cluster member, servicing both ad hoc and scheduled
searches. If necessary, you can limit the captain's search activities so that it
performs only ad hoc searches and not scheduled searches. See Configure the
captain to run ad hoc searches only.
The captain also coordinates activities among all cluster members. Its
responsibilities include:
Captain election
A search head cluster normally uses a dynamic captain. This means that the
member serving as captain can change over the life of the cluster. Any member
has the ability to function as captain. When necessary, the cluster holds an
election, which can result in a new member taking over the role of captain.
Captain election occurs in these situations:
• The current captain fails or restarts.
• A network partition occurs, causing one or more members to get cut from
the rest of the search head cluster. Subsequent healing of the network
partition triggers another, separate captain election.
• The current captain steps down, because it does not detect that a majority
of members are participating in the cluster.
To become captain, a member needs to win a majority vote of all members. For
example, in a seven-member cluster, election requires four votes. Similarly, a
six-member cluster also requires four votes.
The majority must be a majority of all members, not just of the members currently
running. So, if four members of a seven-member cluster fail, the cluster cannot
elect a new captain, because the remaining three members are fewer than the
required majority of four.
The election process involves timers set randomly on all the members. The
member whose timer runs out first stands for election and asks the other
members to vote for it. Usually, the other members comply and that member
becomes the new captain.
It typically takes one to two minutes after a triggering event occurs to elect a new
captain. During that time, there is no functioning captain, and the search heads
are aware only of their local environment. The election takes this amount of time
because each member waits for a minimum timeout period before trying to
become captain. These timeouts are configurable.
The cluster might re-elect the member that was the previous captain, if that
member is still running. There is no bias either for or against this occurring.
For details of your cluster's captain election process, view the Search Head
Clustering: Status and Configuration dashboard in the monitoring console. See
Use the monitoring console to view search head cluster status.
Control of captaincy
You have some control over which members become captain. In particular, you
can:
If the cluster lacks a majority of members and therefore cannot elect a captain,
the members will continue to function as independent search heads. However,
they will only be able to service ad hoc searches. Scheduled reports and alerts
will not run, because, in a cluster, the scheduling function is relegated to the
captain. In addition, configurations and search artifacts will not be replicated
during this time.
To remedy this situation, you can temporarily deploy a static captain. See Use
static captain to recover from loss of majority.
If you do not deploy a static captain during the time that the cluster lacks a
majority, the cluster will not function again until a majority of members rejoin the
cluster. When a majority is attained, the members elect a captain, and the cluster
starts to function.
When the cluster regains a majority and starts functioning again, it must reconcile two areas affected by the outage:
• Runtime configurations
• Scheduled reports
Once the cluster starts functioning, it attempts to sync the runtime configurations
of the members. Since the members were able to operate independently during
the time that their cluster was not functioning, it is likely that each member
developed its own unique set of configuration changes during that time. For
example, a user might have created a new saved search or added a new panel
to a dashboard. These changes must now be reconciled and replicated across
the cluster. To accomplish this, each member reports its set of changes to the
captain, which then coordinates the replication of all changes, including its own,
to all members. At the end of this process, all members should have the same
set of configurations.
Caution: This process can only proceed automatically if the captain and each
member still share a common commit in their change history. Otherwise, it will be
necessary to manually resync the non-captain member against the captain's
current set of configurations, causing that member to lose all of its intervening
changes. Configurable purge limits control the change history. For details of
purge limits and the resync process, see Replication synchronization issues.
The recovered cluster also begins handling scheduled reports again. As for
whether it attempts to run reports that were skipped while the cluster was down,
that depends on the type of scheduled report. For the most part, it will just pick
up the reports at their next scheduled run time. However, the scheduler will run
reports employed by report acceleration and data model acceleration from the
point when they were last run before the cluster stopped functioning. For detailed
information on how the scheduler handles various types of reports, see Configure
the priority of scheduled reports in the Reporting Manual.
The need for a majority vote for a successful election has these deployment
implications:
• If you are deploying the cluster across two sites, your primary site must
contain a majority of the nodes. If there is a network disruption between
the sites, only the site with a majority can elect a new captain. See
Important considerations when deploying a search head cluster across
multiple sites.
How the cluster handles search artifacts

The cluster replicates most search artifacts, also known as search results, to
multiple cluster members. If a member needs to access an artifact, it accesses a
local copy, if possible. Otherwise, it uses proxying to access the artifact.
Artifact replication
The cluster maintains the replication factor number of copies of each artifact, spread across multiple members.
The set of members receiving copies can change from artifact to artifact. That is,
two artifacts from the same originating member might have their replicated
copies on different members.
The captain maintains the artifact registry, with information on the locations of
copies of each artifact. When the registry changes, the captain sends the delta to
each member.
If a member goes down, thus causing the cluster to lose some artifact copies, the
captain coordinates fix-up activities, with the goal of returning the cluster to a
state where each artifact has the replication factor number of copies.
Replicated search artifacts can be identified by the prefix rsa_. The original
artifacts do not have this prefix.
For details of your cluster's artifact replication process, view the Search Head
Clustering: Artifact Replication dashboard in the monitoring console. See Use the
monitoring console to view search head cluster status.
Artifact proxying
The cluster only replicates search artifacts resulting from scheduled saved
searches. It does not replicate the results of other search types, such as ad hoc searches and real-time searches. Members access those artifacts from the originating member through proxying.
Configuration updates

With a few exceptions, all cluster members must use the same set of
configurations. For example, if a user edits a dashboard on one member, the
updates must somehow propagate to all the other members. Similarly, if you
distribute an app, you must distribute it to all members. Search head clustering
has methods to ensure that configurations stay in sync across the cluster.
There are two types of configuration changes, based on how they are distributed
to cluster members:
• Replicated changes. Runtime changes to knowledge objects, such as updates to dashboards or reports, which the cluster automatically replicates to all members.
• Deployed changes. Apps and certain other configurations, which you push to the members by means of the deployer.
See How configuration changes propagate across the search head cluster.
Job scheduling
The captain schedules saved search jobs, allocating them to the various cluster
members according to load-based heuristics. Essentially, it attempts to assign
each job to the member currently with the least search load.
The captain can allocate saved search jobs to itself. It does not, however,
allocate scheduled real time searches to itself.
If a job fails on one member, the captain reassigns it to a different member. The
captain reassigns the job only once, as multiple failures are unlikely to be
resolvable without intervention on the part of the user. For example, a job with a
bad search string will fail no matter how many times the cluster attempts to run it.
You can designate a member as "ad hoc only." In that case, the captain will not
schedule jobs on it. You can also designate the captain functionality as "ad hoc
only." The current captain then will never schedule jobs on itself. Since the role of
captain can move among members, this setting ensures that captain functionality
does not compete with scheduled searches. See Configure a cluster member to
run ad hoc searches only.
Note: The captain does not have insight into the actual CPU load on each
member's machine. It assumes that all machines in the cluster are provisioned
homogeneously, with the same number and type of cores, and so forth.
For details of your cluster's scheduler delegation process, view the Search Head
Clustering: Scheduler Delegation dashboard in the monitoring console. See Use
the monitoring console to view search head cluster status.
The search head cluster, like non-clustered search heads, enforces several types
of concurrent search limits:
• User and role quotas. These quotas limit the number of concurrent searches that a user or role can run. They are configured with srchJobsQuota and related settings in authorize.conf. See the
authorize.conf spec file for details on all the settings that control these
quotas.
• Overall search quota. This quota determines the maximum number of
historical searches (combined scheduled and ad hoc) that the cluster can
run concurrently. This quota is configured with max_searches_per_cpu and
related settings in limits.conf. See the limits.conf spec file for details on
all the settings that control these quotas.
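As a point of reference, a sketch of the relevant limits.conf settings on a member might look like this. The base_max_searches setting is an assumption not named above, and the values shown are typical defaults:

[search]
max_searches_per_cpu = 1
base_max_searches = 6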
For information on determining the state of a member, see Show cluster status.
For information on "ad hoc only" members, see Configure a cluster member to
run ad hoc searches only.
For details on how base scheduler concurrency limits are determined, see the
limits.conf spec file.
Although each quota type (user/role or overall) has its own attribute for setting its
enforcement behavior, the behavior itself works the same for each quota type.
How the cluster enforces quotas

When calculating a cluster-wide quota, the captain counts the cluster members that are "Up", including any member
configured as "ad hoc only."
The captain uses the computed cluster-wide quota to determine whether to allow
a scheduled search to run. No member-specific enforcement of searches occurs,
except in the case of ad hoc searches, as described in Search quotas and ad
hoc searches.
In the case of user/role quotas, the captain multiplies the base concurrent search
quota allocated to a user/role by the number of "Up" cluster members to
determine the cluster-wide quota for that user/role. For example, in a
seven-member cluster, it multiplies the value of srchJobsQuota by 7 to determine
the number of concurrent historical searches for the user/role.
Similarly, in the case of overall search quotas, the captain multiples the base
overall search quota by the number of "Up" members to determine the
cluster-wide quota for all searches.
For details of your cluster's search concurrency status, view the Search Head
Clustering: Status and Configuration dashboard in the monitoring console. See
Use the monitoring console to view search head cluster status.
Note: The captain only controls the running of scheduled searches. It has no
control over whether ad hoc searches run. Instead, each individual member
decides for its own ad hoc searches, based on the individual member search
limits. However, the members feed information on their ad hoc searches to the
captain, which includes those searches when comparing concurrent searches
against the quotas. See Search quotas and ad hoc searches.
Each search quota spans both scheduled searches and ad hoc searches.
Because of the way that the captain learns about ad hoc searches, the number of
cluster-wide concurrent searches can sometimes exceed the search quota. This
is true for both types of search quotas, user/role quotas and overall quotas.
If, for example, you configure the cluster to enforce the overall search quota on a
cluster-wide basis, the captain handles quota enforcement by comparing the total
number of searches running across all members to the search quota.
The captain calculates the overall search quota by multiplying the base
concurrent search quota by the number of "Up" cluster members, as described in
How the cluster enforces quotas.
The captain calculates the number of concurrent searches running across all
members by adding together the total number of scheduled and ad hoc searches
in progress:
When the number of all searches, both scheduled and ad hoc, reaches the
quota, the captain ceases initiating new scheduled searches until the number of
searches falls below the quota.
A user always initiates an ad hoc search directly on a member. The member
uses its own set of search quotas, without consideration or knowledge of the
cluster-wide search quota, to decide whether to allow the search. The member
then reports the new ad hoc search to the captain. If the captain has already
reached the cluster-wide quota, then a new ad hoc search causes the cluster to
temporarily exceed the quota. This results in the captain reporting more searches
than the number allowable by the search quota.
To enforce user and role quotas on a cluster-wide basis, instead of the default member-by-member enforcement, set this attribute in limits.conf to true:
shc_role_quota_enforcement=true
For details of this setting, see limits.conf.
To enforce the overall search quota on a cluster-wide basis, instead of the default member-by-member enforcement, set this attribute in limits.conf to true:
shc_syswide_quota_enforcement=true
For details of this setting, see limits.conf.
Change to the default behavior

With 6.5, there was a change in the default
behavior for enforcing user/role-based concurrent search quotas.
The captain does not take into account the search user when it assigns a search
to a member. Combined with member-enforced quotas, this could result in
unwanted and unexpected behavior.
For example, say you have a three-member cluster, and the search concurrency
quota for role X is set to 4. At some point, two members are running four
searches for X and one is running only two. The scheduler then dispatches a
new search for X that lands on a member that is already running four searches.
What happens next depends on whether the cluster is enforcing quotas on a
member-by-member or cluster-wide basis:
While cluster-wide enforcement has the advantage of allowing full utilization of
the search concurrency quotas across the set of cluster members, it has the
potential to cause miscalculations that result in oversubscribing or
undersubscribing searches on the cluster.
This can lead to miscalculations due to network latency issues, because the
captain must rely on each member to inform it of any ad hoc searches that it is
running. If members are slow in responding to the captain, the captain might not
be aware of some ad hoc searches, and thus oversubscribe the cluster.
For these reasons, you might find that your needs are better met by using the
member-by-member enforcement method.
KV store can reside on a search head cluster. However, the search head cluster
does not coordinate replication of KV store data or otherwise involve itself in the
operation of KV store. For information on KV store, see About KV store in the
Admin Manual.
Deploy search head clustering
These are the main issues to note regarding provisioning of cluster members:
• Each member must run on its own machine or virtual machine, and all
machines must run the same operating system.
• All members must run on the same version of Splunk Enterprise.
• All members must be connected over a high-speed network.
• You must deploy at least as many members as either the replication factor
or three, whichever is greater.
See the remainder of this topic for details on these and other issues.
Each member must run on its own, separate machine or virtual machine.
The hardware requirements for the machine are essentially the same as for any
Splunk Enterprise search head. See Reference hardware in the Capacity
Planning Manual. The main difference is the need for increased storage to
accommodate a larger dispatch directory. See Storage considerations.
Provision all member machines equally, if possible. The captain assigns scheduled jobs to members based on their current job loads.
When it does this, it does not have insight into the actual processing power of
each member's machine. Instead, it assumes that each machine is provisioned
equally.
All search head cluster members and the deployer must run on the same
operating system.
If the search head cluster is connected to an indexer cluster, then the indexer
cluster instances must run on the same operating system as the search head
cluster members.
Storage considerations
When determining the storage requirements for your clustered search heads, you
need to consider the increased capacity necessary to handle replicated copies of
search artifacts.
For the purpose of developing storage estimates, you can observe the size over
time of dispatch directories on the search heads in your non-clustered
environment, if any, before you migrate to a cluster. Total up the size of dispatch
directories across all the non-clustered search heads and then make adjustments
to account for the cluster-specific factors.
The most important factor to take into consideration is the replication factor. For
example, if you have a replication factor of 3, you will need approximately triple
the amount of the total pre-cluster storage, distributed equally among the cluster
members.
Other factors can further increase the cluster storage needs. One key factor is
the need to plan for node failure. If a member goes down, causing its set of
artifacts (original and replicated) to disappear from the cluster, fix-up activities
take place to ensure that each artifact once again has its full complement of
copies, matching the replication factor. During fix-up, the copies that were
resident on the failed member get replicated among the remaining members,
increasing the size of each remaining member's dispatch directory.
Other issues can also increase storage on a per-member basis. For example, the
cluster does not guarantee an absolutely equal distribution of replicated copies
across the members. In addition, the cluster can hold more than the replication
factor number of some search artifacts. See How the cluster handles search
artifacts.
As a best practice, equip each member machine with substantially more storage
than the estimated need. This allows both for future growth and for temporarily
increased need resulting from downed cluster members. The cluster will stop
running searches if any of its members runs out of disk space.
You can implement search head clustering on any group of Splunk Enterprise
instances, version 6.2 or above.
All cluster members must run the same version of Splunk Enterprise, down to the
maintenance level. You must upgrade all members to a new release at the same
time. You cannot, for example, run a search head cluster with some members at
6.3.2 and others at 6.3.1.
The deployer must run the same version as the cluster members, down to the
minor level. In other words, if the members are running 6.3.2, the deployer must
run some version of 6.3.x. It is strongly advised that you upgrade the deployer at
the same time that you upgrade the cluster members. See Upgrade a search
head cluster.
Note: During search head cluster upgrades, the cluster can temporarily include
both members at the previous version and members at the new version. By the
end of the upgrade process, all members must again run the same version. This
is valid only when upgrading from version 6.4 or later. See Upgrade a search
head cluster.
7.x search head clusters can run against 5.x, 6.x, or 7.x search peers. The
search head cluster members must be at the same or a higher level than the
search peers. For details on version compatibility between search heads and
search peers, see Version compatibility.
Licensing requirements
Licensing needs are the same as for any search head. See Licenses and
distributed deployments in the Admin Manual.
The cluster must contain at a minimum the number of members needed to fulfill both of these requirements: at least three members, and at least as many members as the replication factor.
For example, if your replication factor is either 2 or 3, you need at least three
instances. If your replication factor is 5, you need at least five instances.
You can optionally add more members to boost search and user capacity.
When deploying the cluster across multiple sites, put a majority of the cluster
members on the site that you consider primary. This ensures that the cluster can
continue to elect a captain, and thus continue to function, as long as the primary
site is running. See Deploy a search head cluster in a multisite environment.
A cluster member cannot be the search peer of another search head. For the
recommended approach to accessing cluster member data, see Best practice:
Forward search head data to the indexer layer.
Network requirements
Network provisioning
All members must reside on a high speed network where each member can
access every other member.
The members do not necessarily need to be on the same subnet, or even in the
same data center, if you have a fast connection between the data centers. You
can adjust the various search head clustering timeout settings in server.conf. For
help in configuring timeout settings, contact Splunk Professional Services.
• The management port (by default, 8089) must be available to all other
members.
• The http port (by default, 8000) must be available to any browsers
accessing data from the member.
• The KV store port (by default, 8191) must be available to all other
members. You can use the CLI command splunk show kvstore-port to
identify the port number.
• The replication port must be available to all other members.
Caution: Do not change the management port on any of the members while they
are participating in the cluster. If you need to change the management port, you
must first remove the member from the cluster.
It is important that you synchronize the system clocks on all machines, virtual or
physical, that are running Splunk Enterprise instances participating in distributed
search. Specifically, this means your cluster members and search peers.
Otherwise, various issues can arise, such as search failures, premature
expiration of search artifacts, or problems with alerts.
The synchronization method you use depends on your specific set of machines.
Consult the system documentation for the particular machines and operating
systems on which you are running Splunk Enterprise. For most environments,
Network Time Protocol (NTP) is the best approach.
Deployer requirements
You need a Splunk Enterprise instance that functions as the deployer. The
deployer updates member configurations. See Use the deployer to distribute
apps and configuration updates.
Deployer functionality is only for use with search head clustering, but it is built
into all Splunk Enterprise instances running version 6.2 or above. The processing
requirements for a deployer are fairly light, so you can usually co-locate deployer
functionality on an instance performing some other function. You have several
options as to the instance on which you run the deployer:
• If you are running an indexer cluster, you might be able to run the
deployer on the same instance as the indexer cluster's master node.
Whether this option is available to you depends on the master's load. See
Additional roles for the master node in Managing Indexers and Clusters of
Indexers for information on cluster master load limits.
• If you have a monitoring console, you can run the deployer on the same
instance as the console. See Which instance should host the console? in
Monitoring Splunk Enterprise.
• You can run the deployer on the same instance as a license master. See
Configure a license master in the Admin Manual.
A deployer can service only a single search head cluster. If you have multiple
clusters, you must use a separate deployer for each one. The deployers must run
on separate instances.
Other considerations
You cannot enable search head clustering on an instance that is part of a search
head pool. For information on migrating, see Migrate from a search head pool to
a search head cluster.
One cluster member has the role of captain, which means that it coordinates job
and replication activities among all the members. It also serves as a search head
like any other member, running search jobs, serving results, and so on. Over
time, the role of captain can shift among the cluster members.
In addition to the set of search head members that constitute the actual cluster, a
functioning cluster requires several other components:
• Deployer. This is the Splunk Enterprise instance that distributes apps and
configuration updates to the cluster members. The deployer cannot be a
member of the cluster. It can, however, under some circumstances, reside
on the same instance as other Splunk Enterprise components, such as a
deployment server or an indexer cluster master node.
• Search peers. These are the indexers that cluster members run their
searches across. The search peers can be either independent indexers or
nodes in an indexer cluster.
• Load balancer. This is third-party software or hardware optionally residing
between the users and the cluster members. With a load balancer in
place, users can access the set of search heads through a single
interface, without needing to specify a particular one.
This topic focuses on setting up the cluster members and the deployer. Other
topics in this chapter describe how to configure search peers, connect with an
indexer cluster, and add a load balancer.
5. Bring up the cluster captain.
a. Determine the cluster size, that is, the number of search heads that you want
to include in it. It usually makes sense to put all your search heads in a single
cluster. Factors that influence cluster size include the anticipated search load and
number of concurrent users, and your availability and failover needs. See "About
search head clustering".
b. Decide what replication factor you want to implement. The replication factor
is the number of copies of search artifacts that the cluster maintains. Your
optimal replication factor depends on factors specific to your environment, but
essentially involves a trade-off between failure tolerance and storage capacity. A
higher replication factor means that more copies of the search artifacts will reside
on more cluster members, so your cluster can tolerate more member failures
without needing to use a proxy to access the artifacts. But it also means that you
will need more storage to handle the additional copies. See "Choose the
replication factor for the search head cluster."
c. Determine whether the search head cluster will be running against a group of
standalone indexers or an indexer cluster. For information on indexer clusters,
see "About indexer clusters and index replication" in the Managing Indexers and
Clusters of Indexers manual.
d. Study the topic "System requirements and other deployment considerations for
search head clusters" for information on other key issues.
It is recommended that you select the deployer now, as part of cluster set-up,
because you need a deployer in place before you can distribute apps and
updated configurations to the cluster members.
This instance cannot be a member of the search head cluster, but, under some
circumstances, it can be a Splunk Enterprise instance in use for other purposes.
If necessary, install a new Splunk Enterprise instance to serve as the deployer.
See "Deployer requirements".
If you have multiple clusters, you must use a separate deployer for each cluster,
unless you are deploying identical configurations across all the clusters. See
"Deploy to multiple clusters."
For information on how to use the deployer to distribute apps to cluster members,
see "Use the deployer to distribute apps and configuration updates."
b. Set the security key on the deployer.
The deployer uses the security key to authenticate communication with the
cluster members. The cluster members also use it to authenticate with each
other. You must set the key to the same value on all cluster members and the
deployer. You set the key on the cluster members when you initialize them.
To set the key on the deployer, specify the pass4SymmKey attribute in the
[shclustering] stanza of the deployer's server.conf file. For example:
[shclustering]
pass4SymmKey = yoursecuritykey
c. Set the search head cluster label on the deployer.
The search head cluster label is useful for identifying the cluster in the monitoring
console. This parameter is optional, but if you configure it on one member, you
must configure it with the same value on all members, as well as on the deployer.
[shclustering]
shcluster_label = shcluster1
See "Set cluster labels" in Monitoring Splunk Enterprise.
3. Install the Splunk Enterprise instances
Install the Splunk Enterprise instances that will serve as cluster members. For
information on the minimum number of members necessary, see "Required
number of instances."
For information on how to install Splunk Enterprise, read the Installation Manual.
Important: You must change the admin password on each instance. The CLI
commands that you use to configure the cluster will not operate on instances with
the default password.
For each instance that you want to include in the cluster, run the splunk init
shcluster-config command and restart the instance:
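The general form of the command, with placeholder values for the parameters described in the notes below, is along these lines:
splunk init shcluster-config -auth <username>:<password> -mgmt_uri <URI>:<management_port> -replication_port <replication_port> -replication_factor <n> -conf_deploy_fetch_url <URL>:<management_port> -secret <security_key> -shcluster_label <label>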
splunk restart
Note the following:
• This command is only for cluster members. Do not run this command on
the deployer.
• You can only execute this command on an instance that is up and
running.
• The -auth parameter specifies your current login credentials for this
instance. This parameter is required.
• The -mgmt_uri parameter specifies the URI and management port for this
instance. You must use the fully qualified domain name. This parameter is
required.
• The -replication_port parameter specifies the port that the instance
uses to listen for search artifacts streamed from the other cluster
members. You can specify any available, unused port as the replication
port. Do not reuse the instance's management or receiving ports. This
parameter is required.
• The -replication_factor parameter determines the number of copies of
each search artifact that the cluster maintains. All cluster members must
use the same replication factor. This parameter is optional. If not explicitly
set, the replication factor defaults to 3.
• The -conf_deploy_fetch_url parameter specifies the URL and
management port for the deployer instance. This parameter is optional
during initialization, but you do need to set it before you can use the
deployer functionality. See "Use the deployer to distribute apps and
configuration updates."
• The -secret parameter specifies the security key that authenticates
communication between the cluster members and between each member
and the deployer. The key must be the same across all cluster members
and the deployer. See "Set a security key for the search head cluster."
For example:
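A concrete invocation, using hypothetical host names, ports, and key values, might look like this:
splunk init shcluster-config -auth admin:changed -mgmt_uri https://sh1.example.com:8089 -replication_port 34567 -replication_factor 3 -conf_deploy_fetch_url https://deployer.example.com:8089 -secret mysecuritykey -shcluster_label shcluster1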
splunk restart
Caution: To add more members after you bootstrap the captain in step 5, you
must follow the procedures in "Add a cluster member".
a. Select one of the initialized instances to be the first cluster captain. It does not
matter which instance you select for this role.
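The bootstrap step uses the splunk bootstrap shcluster-captain command. A sketch, with placeholder member URIs and credentials:
splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>,<URI>:<management_port>,..." -auth <username>:<password>
The -servers_list parameter takes the management URIs of all the initialized members, including the instance on which you run the command.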
6. Perform the post-deployment setup:
a. Connect the search head cluster to search peers. This step is required. It
varies according to whether the search peers reside in an indexer cluster. See
the topics later in this chapter on connecting the cluster to its search peers.
b. Add users. This step is required. See "Add users to the search head cluster".
c. Install a load balancer in front of the search heads. This step is optional.
See "Use a load balancer with search head clustering."
d. Distribute apps and other configuration updates to the members as needed.
See "Use the deployer to distribute apps and configuration updates."
To check the overall status of your search head cluster, run this command from
any member:
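The basic form of the status command is:
splunk show shcluster-status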
You can also use the monitoring console to get more information about the status
of the cluster. See Use the monitoring console to view search head cluster status
and troubleshoot issues.
In addition to checking the status of the search head cluster itself, it is also
advisable to check the status of the KV store running on the cluster. Run this
command from any member:
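One way to do this is with the kvstore-status command:
splunk show kvstore-status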
You can integrate search head clusters with either single-site or multisite indexer
clusters.
Integrate with a single-site indexer cluster
Configure each search head cluster member as a search head on the indexer
cluster. Use the CLI splunk edit cluster-config command. For example:
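A sketch of the command, using a hypothetical master node URI and a placeholder secret key:
splunk edit cluster-config -mode searchhead -master_uri https://master1.example.com:8089 -secret your_idxcluster_key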
splunk restart
You must run this CLI command on each member of the search head cluster.
The secret key that you set here is the indexer cluster secret key (which is stored
in pass4SymmKey under the [clustering] stanza of server.conf), not the search
head cluster secret key (which is stored in pass4SymmKey under the
[shclustering] stanza of server.conf).
For a search head cluster to serve as the search tier of an indexer cluster, you
must set both types of keys on each of the search head cluster members,
because the members are serving both as nodes of the indexer cluster and as
members of the search head cluster. Presumably, if you have already set up the
search head cluster, you have set the search head cluster key before you get to
this step.
Each key type must be identical on all nodes of its respective cluster. That is, the
indexer cluster key must be identical on all nodes of the indexer cluster, while the
search head cluster key must be identical on all search cluster members. It is
recommended, however, that the indexer cluster key be different from the search
head cluster key.
This is all you need for the basic configuration. The search heads now run their
searches against the peer nodes in the indexer cluster.
In a multisite indexer cluster, each search head and indexer has an assigned
site. Multisite indexer clustering promotes disaster recovery, because data is
allocated across multiple sites. For example, you might configure two sites, one
in Boston and another in New York. If one site fails, the data remains accessible
through the other site. See Multisite indexer clusters in Managing Indexers and
Clusters of Indexers.
Configure members
The only difference from a single-site indexer cluster is that you must also specify
the site for each member. This should ordinarily be "site0", so that all search
heads in the cluster perform their searches across the same set of indexers. For
example:
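A sketch, again with a hypothetical master node URI and placeholder key, this time including the site assignment:
splunk edit cluster-config -mode searchhead -site site0 -master_uri https://master1.example.com:8089 -secret your_idxcluster_key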
splunk restart
Migrate members from a single-site indexer cluster to a multisite indexer
cluster
If the search head cluster members are already integrated into a single-site
indexer cluster and you want to migrate that cluster to multisite, you must edit
each search head's configuration to identify its site.
On each search head, specify its master node and its site. For example:
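One possible form, assuming the splunk edit cluster-master command and placeholder values for the master node URI and site:
splunk edit cluster-master https://master1.example.com:8089 -site site0
splunk restart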
How the search heads find out about their search peers depends on whether the
search head cluster is part of an indexer cluster. There are two scenarios to
consider:
Search head cluster with indexer cluster
If the search head cluster is connected to an indexer cluster, the master node on
the indexer cluster provides the search heads with a list of peer nodes to search
against.
Once you configure the search head cluster members so that they participate in
the indexer cluster, you do not need to perform any further configuration for the
search heads to know their search peers. See Integrate the search head cluster
with an indexer cluster.
Even if you do not need the benefits of index replication, you can still take
advantage of this simple approach to configuring the set of search peers. Just
incorporate your set of indexers into an indexer cluster with a replication factor of
1. This topology also provides numerous other benefits from a management
perspective. See Use indexer clusters to scale indexing in the Managing
Indexers and Clusters of Indexers manual.
Before Splunk Enterprise 6.4, only the first method was available. You had to add
the search peers to each individual member. Starting with 6.4, you can add the
search peers to just a single member and let the cluster replicate the peer
configurations to the other members.
The main circumstance where you might prefer to add peers to individual
members is if you already have a cluster and you have automated the process of
adding search peers to each member.
You can switch to the replication method at any time. Peers already added
individually will remain in the configuration. If you add a new member later, it will
get the full set of peers, no matter how they were originally added to the cluster.
Note: The replication method does not use the configuration replication method
described in Configuration updates that the cluster replicates. Instead, it uses a
Raft state machine to replicate the changes to all active members. With this
method, all active members receive the add request at the same time, ensuring
that all members gain access to the same set of search peers.
1. On each cluster member, enable search peer replication in server.conf:
[raft_statemachine]
disabled = false
replicate_search_peers = true
2. Restart each search head cluster member.
3. Use the CLI to add the search peers to one member. It does not matter which
member you perform this on.
On one member, run the following command, one time for each search peer:
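In general form, with the placeholders described in the list below:
splunk add search-server <scheme>://<host>:<port> -auth <username>:<password> -remoteUsername <admin_user> -remotePassword <admin_password>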
• <scheme> is the URI scheme for accessing the search peer: "http" or
"https".
• <host> is the host name or IP address of the search peer's host machine.
• <port> is the management port of the search peer.
• -auth provides credentials for the member.
• -remoteUsername and -remotePassword provide credentials for the search
peer. The remote credentials must be for an admin-level user on the
search peer.
For example:
splunk add search-server https://192.168.1.1:8089 -auth admin:password
-remoteUsername admin -remotePassword passremote
When you add a search peer to one cluster member, the cluster quickly
replicates the operation to the other members. The members will then commit the
change together.
4. Repeat the splunk add search-server command for each search peer.
Note: You can also use replication to remove search peers from the cluster
members. See Remove a search peer via the CLI.
To add the search peers individually to each search head, use the CLI. On each
search head, invoke the splunk add search-server command, using the same
syntax shown earlier, for each search peer that you want to add.
Caution: All search heads must use the same set of search peers.
In addition to the CLI, you can add search peers through Splunk Web:
1. Unhide the hidden settings on the search head, as described in The Settings
menu.
If you have enabled search peer replication, you add the search peers to only
one of the cluster members. If you have not enabled search peer replication, you
must add them to each cluster member.
If you are not using search peer replication, you can add search peers by directly
editing distsearch.conf and distributing the configuration file via the deployer.
This method requires that you also manually distribute the key file from each
search head to each search peer. See Edit distsearch.conf.
Because of the need to manually distribute key files, this method is not
compatible with search peer replication.
It is considered a best practice to forward all search head internal data to the
search peer (indexer) layer. After you connect the search heads to the search
peers, follow the instructions in Best practice: Forward search head data to the
indexer layer.
To add users to the search head cluster, you can use any of the available
authentication methods: Splunk Enterprise built-in authentication, LDAP, SAML,
or scripted authentication. See the chapters on authentication in the Securing
Splunk Enterprise manual for details.
For Splunk Enterprise built-in authentication, you can use Splunk Web or the CLI
to add users and map roles. Perform the operation on any one of the cluster
members. The cluster then automatically distributes the changes to all members
by replicating the $SPLUNK_HOME/etc/passwd file.
Authentication restrictions
Search head clustering does have a few restrictions regarding how you configure
authentication:
• Even when you configure authentication through Splunk Web, the CLI, or
REST endpoints, the cluster only replicates the underlying configuration
files, plus the $SPLUNK_HOME/etc/passwd file in the case of built-in
authentication. If the authentication method that you are employing
requires any other associated, non-configuration files, you must use the
deployer to distribute them to the cluster members. For example:
◊ For SAML, you must use the deployer to push the certificates.
◊ For scripted authentication, you must use the deployer to push the
script. You must also use the deployer to push
authentication.conf, because you can only configure scripted
authentication by editing authentication.conf directly.
To push arbitrary groups of files, such as SAML certificates, from the deployer,
you create an app directory specifically to contain those files.
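As a sketch, with a hypothetical app name, the configuration bundle on the deployer might be laid out like this (place the certificate files wherever your authentication configuration expects to find them on the members):
$SPLUNK_HOME/etc/shcluster/apps/saml_certs_app/
$SPLUNK_HOME/etc/shcluster/apps/saml_certs_app/<subdirectory containing the certificate files>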
For details on how to use the deployer to push files, see "Use the deployer to
distribute apps and configuration updates."
There are a variety of third-party load balancers available that you can use for
this purpose. Select a load balancer that employs layer-7 (application-level)
processing.
Configure the load balancer so that user sessions are "sticky" or "persistent."
This ensures that the user remains on a single search head throughout their
session.
There are no restrictions on where your cluster members can reside. In cases of
high network latency between sites, however, you might notice some slowness in
UI responsiveness.
The amount of data that cluster members transfer to each other across the
network is difficult to quantify, being dependent on a variety of factors, such as
the number of users, the amount of user activity, the number and types of
searches being run, and so on.
You can integrate the search head cluster members into a multisite indexer
cluster. A multisite indexer cluster confers important advantages on your
deployment. Most importantly, it enhances the high availability and disaster
recoverability of your deployment. See "Multisite indexer clusters" in the
Managing Indexers and Clusters manual.
To integrate a search head cluster with a multisite indexer cluster, configure each
member as a search head in the multisite cluster. See "Integrate with a multisite
indexer cluster."
It is recommended that you set each search head's site attribute to "site0", to
disable search affinity. When search affinity is disabled, the search head runs
its searches across indexers spanning all sites. Barring any change in the set of
available indexers, the search head will run its searches across the same set of
primary bucket copies each time.
By setting all search heads to "site0", you ensure a seamless experience for end
users, because the same set of primary bucket copies is used by all search
heads. If, instead, you set different search heads to different sites, the end user
might notice lag time in getting some results, depending on which search head
happens to run a particular search.
If you have an overriding need for search affinity, you can assign the search
heads to specific sites.
Site awareness is less critical for a search head cluster than an indexer cluster. If
a search head cluster member is missing a replicated copy of a search artifact,
the cluster proxies it from another member, which could reside on the same site
or on another site. See "How the cluster handles search artifacts." Even in the
case of a site failure that results in the loss of all copies of some search artifacts,
this is a manageable situation that you can recover from by rerunning searches
and so on.
Note: There are ways that you can work around the lack of site awareness, if
necessary. For example, if your search head cluster consists of four search
heads divided evenly between two sites, you can set the replication factor to 3
and thus ensure that each site has at least one copy of each search artifact.
The choices you make when deploying a search head cluster across multiple
sites can have significant implications for these failure scenarios:
• Site failure
• Network interruptions
In particular, in the case of a two-site cluster, you should put the majority of your
members on the site that you consider primary.
Why the majority of members should be on the primary site
If you are deploying the cluster across two sites, put a majority of the cluster
members on the site that you consider primary. This ensures that the cluster can
continue to function as long as that site is running.
Under certain circumstances, such as when a member leaves or joins the cluster,
the cluster holds an election in which it chooses a new captain. The success of
this election process requires that a majority of all cluster members agree on the
new captain. Therefore, the proper functioning of the cluster requires that a
majority of members be running at all times. See "Captain election."
In the case of a cluster running across two sites, if one site fails, the remaining
site can elect a new captain only if it holds a majority of members. Similarly, if
there is a network disruption between the sites, only the site with a majority can
elect a new captain. By assigning the majority of members to your primary site,
you maximize its availability.
If the site with a majority of members fails, the remaining members on the
minority site cannot elect a new captain. Captain election requires the vote of a
majority of members, but only a minority of members are running. The cluster
does not function. See "Consequences of a non-functioning cluster."
To remediate this situation, you can temporarily deploy a static captain on the
minority site. Once the majority site returns, you should revert the minority site to
the dynamic captain. See "Use static captain to recover from loss of majority."
If the network between sites fails, the members on each site will attempt to elect
a captain. However, only a site that holds a majority of the total members will
succeed. That site can continue to function as the cluster indefinitely.
During this time, the members on the other sites can continue to function as
independent search heads. However, they will only be able to service ad hoc
searches. Scheduled reports and alerts will not run, because, in a cluster, only
the captain performs the scheduling function.
When the other sites reconnect to the majority site, their members will rejoin the
cluster. For details on what happens when a member rejoins the cluster, see
"When the member rejoins the cluster."
Clusters with more than two sites
If there are more than two sites, the cluster can function only if a majority of
members across the sites are still able to communicate and elect a captain. For
example, if you have site1 with five members, site2 with eight members, and
site3 with four members, the cluster can survive the loss of any one site, because
you will still have a majority of members (at least nine) among the remaining two
sites. However, if you have site1 with six members, site2 with two members, and
site3 with three members, the cluster can only function as long as site1 remains
alive, because you need at least six members to constitute a majority.
Migrate from a search head pool to a search head cluster
Whether you are migrating app configurations or user configurations, you copy
the relevant directories from the search head pool shared storage to the search
head cluster's deployer. You then use the deployer to propagate these directories
to the cluster.
The deployer pushes the configurations to the cluster, using a different method
for each type. Post-migration, the app configurations obey different rules from the
user configurations.
For information on where deployed settings reside on the cluster members, see
"Where deployed configurations live on the cluster members."
Custom app configurations
When it migrates an app's custom settings, the deployer places them in default
directories on the cluster members. This includes any runtime changes that were
made while the apps were running on the search head pool.
Because users cannot change settings in default directories, users cannot
perform certain runtime operations on these migrated entities, such as deleting or
moving them.
Cluster users can override existing attributes by editing entities in place. Runtime
changes get put in the local directories on the cluster members. Local directories
override default directories, so the changes override the default settings.
The deployer copies user configurations to the captain only. The captain then
replicates the settings to all the cluster members through its normal method for
replicating configurations, as described in "Configuration updates that the cluster
replicates."
Unlike custom app configurations, the user configurations reside in the normal
user locations on the cluster members and can later be deleted, moved, and so
on. They behave just like any runtime settings created by cluster users through
Splunk Web.
When you migrate user configurations to an existing search head cluster, the
deployer respects attributes that already exist on the cluster. It does not overwrite
any existing attributes within existing stanzas.
For example, suppose a cluster member already has a user-level
savedsearches.conf file with this stanza:
[my search]
search = index=_internal | head 1
and on the deployer, there's the file
$SPLUNK_HOME/etc/shcluster/users/admin/search/local/savedsearches.conf
with these stanzas:
[my search]
search = index=_internal | head 10
enableSched = 1
[my other search]
search = index=_internal | head 1
enableSched = 1
Post-migration, the member's [my search] stanza keeps its existing setting for the
search attribute but gains the migrated enableSched attribute, and the [my other
search] stanza gets added in full.
Note: Splunk does not support migration of per-user search history files.
When you migrate apps to the search head cluster, do not migrate any default
apps, that is, apps that ship with Splunk Enterprise, such as the search app. If
you push default apps to cluster members, you overwrite the version of those
apps residing on the members, and you do not want to do this.
You can, however, migrate custom settings from a default app by moving them to
a new app and exporting them globally.
Each of the migration procedures in this topic includes a step for migrating
default app custom settings.
To migrate settings from a search head pool to a new search head cluster:
1. Follow the procedure for deploying any new search head cluster. Specify the
deployer location at the time that you initialize the cluster members. See "Deploy
a search head cluster."
Caution: You must deploy new instances. You cannot reuse existing search
heads.
2. Copy the etc/apps and etc/users directories on the shared storage location in
the search head pool to the distribution directory on the deployer instance. The
distribution directory is located at $SPLUNK_HOME/etc/shcluster.
For details on the distribution directory file structure, see "Where to place the
configuration bundle on the deployer."
3. If you want to migrate custom settings from a default app, you can move them
to a new app and export them globally. For example, to migrate settings from the
search app, copy the settings into a new app and then export them globally by
creating a metadata/local.meta file in the new app with the following content:
[]
export=system
Note: If you point the cluster members at the same set of search peers
previously used by the search head pool, the cluster will need to rebuild any
report acceleration summaries or data model summaries resident on the search
peers. It does this automatically. It does not, however, automatically remove the
old set of summaries.
To migrate settings from a search head pool to an existing search head cluster:
1. Copy the etc/apps and etc/users directories from the shared storage location
in the search head pool to a temporary directory where you can edit them.
2. If you want to migrate custom settings from a default app, you can move them
to a new app and export them globally. For example, to migrate settings from the
search app, copy the settings into a new app and then export them globally by
creating a metadata/local.meta file in the new app with the following content:
[]
export=system
3. Delete from the copied directories any content that must not go to the cluster
members. In particular, delete:
• Any default apps, such as the search app. Do not push default apps to the
cluster members. If you do, they will overwrite the versions of those apps
already on the members.
4. Copy the remaining subdirectories from the temporary location to the
distribution directory on the deployer, located at $SPLUNK_HOME/etc/shcluster.
Leave any subdirectories already in the distribution directory unchanged.
For details on the distribution directory file structure, see "Where to place the
configuration bundle on the deployer."
Note: If you point the cluster members at the same set of search peers
previously used by the search head pool, the cluster will need to rebuild any
report acceleration summaries or data model summaries resident on the search
peers. It does this automatically. It does not, however, automatically remove the
old set of summaries.
You can migrate settings from an existing standalone search head to all
members in a search head cluster.
You cannot migrate the search head instance itself, only its settings. You can
only add clean, new Splunk Enterprise instances to a search head cluster.
Whether you are migrating app configurations or user configurations, you copy
the relevant directories from the search head to the
search head cluster's deployer. You then use the deployer to propagate these
directories to the cluster.
The deployer pushes the configurations to the cluster, using a different method
for each type. Post-migration, the app configurations obey different rules from the
user configurations.
For information on where deployed settings reside on the cluster members, see
"Where deployed configurations live on the cluster members."
When it migrates an app's custom settings, the deployer places them in default
directories on the cluster members. This includes any runtime changes that were
made while the apps were running on the standalone search head.
Because users cannot change settings in default directories, users cannot
perform certain runtime operations on these migrated entities, such as deleting or
moving them.
Cluster users can override existing attributes by editing entities in place. Runtime
changes get put in the local directories on the cluster members. Local directories
override default directories, so the changes override the default settings.
The deployer copies user configurations to the captain only. The captain then
replicates the settings to all the cluster members through its normal method for
replicating configurations, as described in "Configuration updates that the cluster
replicates."
Unlike custom app configurations, the user configurations reside in the normal
user locations on the cluster members and can later be deleted, moved, and so
on. They behave just like any runtime settings created by cluster users through
Splunk Web.
When you migrate user configurations to an existing search head cluster, the
deployer respects attributes that already exist on the cluster. It does not overwrite
any existing attributes within existing stanzas.
For example, suppose a cluster member already has a user-level
savedsearches.conf file with this stanza:
[my search]
search = index=_internal | head 1
and on the deployer, there's the file
$SPLUNK_HOME/etc/shcluster/users/admin/search/local/savedsearches.conf
with these stanzas:
[my search]
search = index=_internal | head 10
enableSched = 1
[my other search]
search = index=_internal | head 1
enableSched = 1
The [my search] stanza, which already existed on the members, keeps the
existing setting for its search attribute, but adds the migrated setting for the
enableSched attribute, because that attribute did not already exist in the stanza.
The [my other search] stanza, which did not already exist on the members, gets
added to the file, along with its search attribute.
Note: Splunk does not support migration of per-user search history files.
When you migrate apps to the search head cluster, do not migrate any default
apps, that is, apps that ship with Splunk Enterprise, such as the search app. If
you push default apps to cluster members, you overwrite the version of those
apps residing on the members, and you do not want to do this.
• You can migrate any private objects associated with default apps. Private
objects are located under the etc/users directory, not under etc/apps.
• You can migrate custom settings in the app itself by moving them to a new
app and exporting them globally. The migration procedure in this topic
includes a step for this.
Note: This procedure assumes that you have already deployed the search head
cluster. See "Deploy a search head cluster."
To migrate settings:
1. Copy the etc/apps and etc/users directories from the standalone search head
to a temporary directory where you can edit them.
2. If you want to migrate custom settings from a default app, you can move them
to a new app and export them globally. For example, to migrate settings from the
search app:
a. Copy the settings into a new app. For example, create a new app named
search_migration_app.
b. Export the settings globally to make them available to all apps,
including the search app. To do this, create a
.../search_migration_app/metadata/local.meta file and populate it with
the following content:
[]
export=system
3. Delete from the copied directories any content that must not go to the cluster
members. In particular, delete:
• Any default apps, such as the search app. Do not push default apps to the
cluster members. If you do, they will overwrite the versions of those apps
already on the members.
4. Copy all the remaining subdirectories from the temporary location to the
distribution directory on the deployer, located at $SPLUNK_HOME/etc/shcluster.
Leave any subdirectories already in the distribution directory unchanged.
For details on the distribution directory file structure, see "Where to place the
configuration bundle on the deployer."
5. If you need to add new cluster members, you must deploy clean instances.
You cannot reuse the existing search head. For information on adding cluster
members, see "Add a cluster member."
Note: If you point the cluster members at the same set of search peers
previously used by the standalone search head, the cluster will need to rebuild
any report acceleration summaries or data model summaries resident on the
search peers. It does this automatically. It does not, however, automatically
remove the old set of summaries.
Starting with version 6.5, you can perform a member-by-member upgrade. This
lets you perform a phased upgrade of cluster members that allows the cluster to
continue operating during the upgrade. To use the member-by-member upgrade
process, you must be upgrading from version 6.4 or later.
Starting with version 7.1, you can perform a rolling upgrade. Rolling upgrade lets
you perform a phased upgrade of cluster members with minimal interruption of
ongoing searches. To use rolling upgrade, you must be upgrading from version
7.1 or later. For more information, see Use rolling upgrade.
In a regular offline upgrade, all cluster members are down for the duration of the
upgrade process.
You must perform an offline upgrade when upgrading from version 6.3 or earlier.
• All cluster members must run the same version of Splunk Enterprise
(down to the maintenance level).
• You can run search head cluster members against 5.x or later
non-clustered search peers, so it is not necessary to upgrade standalone
indexers at the same time. See Splunk Enterprise version compatibility.
Steps
7. Wait one to two minutes for captain election to complete. The cluster will
then begin functioning.
For a search head cluster that integrates with an indexer cluster, perform a
member-by-member upgrade as part of the tiered upgrade procedure. See
Upgrade each tier separately in Managing Indexers and Clusters of Indexers.
3. Upgrade the deployer:
1. Stop the deployer.
2. Upgrade the deployer.
3. Start the deployer.
For detailed instructions on how to perform a rolling upgrade with minimal search
disruption, see Use rolling upgrade.
This restart takes place the first time, post-upgrade, that you run the splunk
apply shcluster-bundle command. The restart only occurs if you had used the
deployer to push user configurations in 6.2.6 or below.
The default behavior for handling user-based and role-based concurrent search
quotas has changed with version 6.5.
In versions 6.3 and 6.4, the default is to enforce the quotas across the set of
cluster members. Starting with 6.5, the default is to enforce the quotas on a
member-by-member basis.
You can change quota enforcement behavior, if necessary. See Job scheduling.
Use rolling upgrade
Splunk Enterprise version 7.1.0 and later supports rolling upgrade for search
head clusters. A rolling upgrade performs a phased upgrade of cluster members
with minimal interruption to your ongoing searches. You can use a rolling
upgrade to minimize search disruption when upgrading cluster members to a
new version of Splunk Enterprise.
Review the following requirements and considerations before you initiate a rolling
upgrade:
Hardware or network failures that prevent node shutdown or restart might require
manual intervention.
When you initiate a rolling upgrade, you select a cluster member and put that
member into manual detention. While in manual detention, the member cannot
accept new search jobs, and all in-progress searches try to complete within a
configurable timeout. When all searches are complete, you perform the software
upgrade and bring the member back online. You repeat this process for each
cluster member until the rolling upgrade is complete.
• The cluster member waits for in-progress searches to complete, up to a
maximum time set by the user. The default of 180 seconds is enough time
for the majority of searches to complete in most cases.
• Rolling upgrades apply to both historical and real-time searches.
To upgrade a search head cluster with minimal search interruption, perform the
following steps:
On any cluster member, run the splunk show shcluster-status command using
the verbose option to confirm that the cluster is in a healthy state before you
begin the upgrade:
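The command is:
splunk show shcluster-status --verbose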
Captain:
    decommission_search_jobs_wait_secs : 180
    dynamic_captain : 1
    elected_captain : Tue Mar 6 23:35:52 2018
    id : FEC6F789-8C30-4174-BF28-674CE4E4FAE2
    initialized_flag : 1
    label : sh3
    max_failures_to_keep_majority : 1
    mgmt_uri : https://sroback180306192122accme_sh3_1:8089
    min_peers_joined_flag : 1
    rolling_restart : restart
    rolling_restart_flag : 0
    rolling_upgrade_flag : 0
    service_ready_flag : 1
    stable_captain : 1

Cluster Master(s):
    https://sroback180306192122accme_master1_1:8089 splunk_version: 7.1.0

Members:
    sh3
        label : sh3
        manual_detention : off
        mgmt_uri : https://sroback180306192122accme_sh3_1:8089
        mgmt_uri_alias : https://10.0.181.9:8089
        out_of_sync_node : 0
        preferred_captain : 1
        restart_required : 0
        splunk_version : 7.1.0
        status : Up
    sh2
        label : sh2
        last_conf_replication : Wed Mar 7 05:30:09 2018
        manual_detention : off
        mgmt_uri : https://sroback180306192122accme_sh2_1:8089
        mgmt_uri_alias : https://10.0.181.4:8089
        out_of_sync_node : 0
        preferred_captain : 1
        restart_required : 0
        splunk_version : 7.1.0
        status : Up
    sh1
        label : sh1
        last_conf_replication : Wed Mar 7 05:30:09 2018
        manual_detention : off
        mgmt_uri : https://sroback180306192122accme_sh1_1:8089
        mgmt_uri_alias : https://10.0.181.2:8089
        out_of_sync_node : 0
        preferred_captain : 1
        restart_required : 0
        splunk_version : 7.1.0
        status : Up
The output shows a stable, dynamically elected captain, enough members to
support the replication factor, no out-of-sync nodes, and all members running a
compatible Splunk Enterprise version (7.1.0 or later). This indicates that the
cluster is in a healthy state to perform a rolling upgrade.
For information on health check criteria, see Health check output details.
Health checks do not cover all potential cluster health issues. Checks apply only
to the criteria listed.
Alternatively, send a GET request to the following endpoint:
/services/shcluster/status?advanced=1
For endpoint details, see shcluster/status in the REST API Reference Manual.
Based on the health check results, either fix any issues impacting cluster health
or proceed with caution and continue the upgrade.
To initialize the rolling upgrade, send a request to the following endpoint:
/services/shcluster/captain/control/control/upgrade-init
For endpoint details, see shcluster/captain/control/control/upgrade-init in the
REST API Reference Manual.
Select a search head cluster member other than the captain and put that
member into manual detention mode:
For example, using curl with placeholder credentials and member host:
curl -k -u <username>:<password> https://<member_host>:<management_port>/servicesNS/admin/search/shcluster/member/control/control/set_manual_detention \
-d manual_detention=on
For endpoint details, see shcluster/member/control/control/set_manual_detention
in the REST API Reference Manual.
For more information on manual detention mode, see Put a search head into
detention.
Run the following command to confirm that all searches are complete:
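One way to check, as a sketch, is to list the member's information from its CLI and look at the active search counts:
splunk list shcluster-member-info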
The following output indicates that all historical and real-time searches are
complete:
active_historical_search_count:0
active_realtime_search_count:0
Or send a GET request to the following endpoint:
/services/shcluster/member/info
For endpoint details, see shcluster/member/info in the REST API Reference
Manual.
Upgrade the search head following the standard Splunk Enterprise upgrade
procedure. See How to upgrade Splunk Enterprise in the Installation Manual.
1. Start the upgraded member:
splunk start
On restart, the first member upgraded is automatically elected as cluster
captain. This captaincy transfer occurs only once during a rolling upgrade.
2. Turn off manual detention mode:
For example, using curl with placeholder credentials and member host:
curl -k -u <username>:<password> https://<member_host>:<management_port>/servicesNS/admin/search/shcluster/member/control/control/set_manual_detention \
-d manual_detention=off
For endpoint details, see
shcluster/member/control/control/set_manual_detention in the REST API
Reference Manual.
After you bring the member back online, check that the cluster is in a healthy
state.
splunk show shcluster-status --verbose
Or, use this endpoint to monitor cluster health:
/services/shcluster/status?advanced=1
For endpoint details, see shcluster/status in the REST API Reference Manual.
For information on what determines a healthy search head cluster, see Health
check output details.
Repeat steps 3-7 above until you have upgraded all cluster members.
It is important to make sure that you upgrade the deployer at the same time that
you upgrade the cluster members. The deployer must run the same version as
the cluster members, down to the minor level. For example, if members are
running 7.1.1, the deployer must run 7.1.x.
To finalize the rolling upgrade, send a request to the following endpoint from any search head cluster member:
/services/shcluster/captain/control/control/upgrade-finalize
For endpoint details, see shcluster/captain/control/control/upgrade-finalize in the
REST API Reference Manual.
Example upgrade automation script
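The following bash sketch shows one way such automation might be structured, using the REST endpoints described in this topic. The host names, credentials, polling logic, and the upgrade step itself are placeholders and assumptions; adapt and test it in your own environment before relying on it.
#!/bin/bash
# Sketch of rolling-upgrade automation using the endpoints described in this topic.
# Credentials, host names, and the upgrade step are placeholders.

AUTH="admin:changeme"                              # placeholder credentials
CAPTAIN="https://sh-captain.example.com:8089"      # placeholder captain URI
MEMBERS="https://sh1.example.com:8089 https://sh2.example.com:8089"  # placeholder members

# Initialize the rolling upgrade.
curl -k -u "$AUTH" -X POST "$CAPTAIN/services/shcluster/captain/control/control/upgrade-init"

for MEMBER in $MEMBERS; do
    # Put the member into manual detention so it stops accepting new search jobs.
    curl -k -u "$AUTH" "$MEMBER/servicesNS/admin/search/shcluster/member/control/control/set_manual_detention" \
        -d manual_detention=on

    # Wait until the member reports no active historical or real-time searches.
    while curl -k -s -u "$AUTH" "$MEMBER/services/shcluster/member/info?output_mode=json" \
            | grep -Eq '"active_(historical|realtime)_search_count" *: *[1-9]'; do
        sleep 10
    done

    # Placeholder for your own upgrade procedure on this member's host
    # (stop Splunk, install the new version, run "splunk start").
    echo "Upgrade Splunk Enterprise on $MEMBER now, then press Enter." && read -r

    # Take the member out of detention so it rejoins normal operation.
    curl -k -u "$AUTH" "$MEMBER/servicesNS/admin/search/shcluster/member/control/control/set_manual_detention" \
        -d manual_detention=off
done

# Finalize the rolling upgrade.
curl -k -u "$AUTH" -X POST "$CAPTAIN/services/shcluster/captain/control/control/upgrade-finalize"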
Configure search head clustering
The members store their cluster configurations in their local server.conf files,
located under $SPLUNK_HOME/etc/system/local/. See the server.conf
specification file for details on all available configuration attributes.
Key information
Initialization-time configurations
You can set all essential configurations during the deployment process, when
you initialize each member. These are the key configuration attributes that you
can or must set for each cluster member during initialization:
• The cluster's label. See "Deploy a search head cluster".
Caution: It is strongly recommended that you set all these attributes during
initialization and do not later change them. See "Deploy a search head cluster".
The main configuration changes that you can safely perform on your own,
post-initialization, are the ad hoc search settings. There are two of these: one for
specifying whether a particular member should run ad hoc searches only, and
another for specifying whether the member currently functioning as the captain
should run ad hoc searches only. The captain will not assign scheduled searches
to ad hoc members. See "Configure a cluster member to run ad hoc searches
only".
You can also temporarily switch to a static captain, as a work around for
disaster recovery. See "Use static captain to recover from loss of majority."
Caution: Do not edit the id attribute in the [shclustering] stanza. The system
sets it automatically. This attribute must conform to the requirements for a valid
GUID.
You usually set the cluster label with the splunk init command when you
deploy the cluster. If you did not set it during deployment, you can later set it for
the cluster by running this command on any one member:
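A sketch of that command, with a placeholder label:
splunk edit shcluster-config -shcluster_label shcluster1 -auth <username>:<password>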
Note: If you set the label on a cluster member, you must also set it on the
deployer. See "Configure the deployer."
The server.conf attributes for search head clustering must have the same
values across all members, with these exceptions:
• mgmt_uri
• adhoc_searchhead
• [replication_port://<port>]
If any configuration values other than these ones vary from member to member,
then the behavior of the cluster will change depending on which member is
currently serving as captain. You do not want that to occur.
Configuration methods
Most of the configuration occurs during initial cluster deployment, through the CLI
splunk init command. To perform further configuration later, you have two
choices:
Caution: You must make the same configuration changes on all members and
then restart them all at approximately the same time. Because of the importance
of maintaining identical settings across all members, do not use the splunk
rolling-restart command to restart, except when changing the
captain_is_adhoc_searchhead attribute, as described in "Configure a cluster
member to run ad hoc searches only". Instead, run the splunk restart
command on each member.
You can use the CLI splunk edit shcluster-config command to make edits to
the [shclustering] stanza in server.conf. Specify each attribute and its
configured value as a key value pair.
• You can use this command to edit any attribute in the [shclustering]
stanza except the disabled attribute, which turns search head clustering
on and off.
• You can only use this command on a member that has already been
initialized. For initial configuration, use splunk init shcluster-config.
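For example, a sketch of setting the captain_is_adhoc_searchhead attribute mentioned earlier in this topic (credentials are placeholders):
splunk edit shcluster-config -captain_is_adhoc_searchhead true -auth <username>:<password>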
You can also change attributes by directly editing server.conf. The search head
clustering attributes are located in the [shclustering] stanza, with one
exception: To modify the replication port, use the [replication_port] stanza.
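For example, a sketch of the replication port stanza with a placeholder port number:
[replication_port://34567]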
The cluster can tolerate a failure of (replication factor - 1) members without losing
any search artifacts. For example, to ensure that your system can handle the
failure of two members without losing search artifacts, configure a replication
factor of 3. This configuration directs the cluster to store three copies of each
search artifact, with each copy on a different member. If two members go down,
the artifact is still available on a third member.
The default value for the replication factor is 3. This number is sufficient for most
purposes.
Even with a large cluster of, for example, 50 search heads, you do not need a
commensurately large replication factor. As long as you do not lose the
replication factor number of members, at least one copy of each search artifact
still exists somewhere on the cluster and is accessible to all cluster members.
Any search head in the cluster can access any search artifact by proxying from a
search head storing a copy of that artifact. The proxying operation is fast and
unlikely to impede access to search results from any search head.
Note: The replication factor determines only the number of copies of search
artifacts that the cluster maintains. It does not affect the replication of runtime
configuration changes, such as new saved searches. Those changes get
replicated to all cluster members by a different process. If you have 50 search
heads, each of those 50 gets a copy of such configuration changes. See
Configuration updates that the cluster replicates.
All cluster members must use the same replication factor. The server.conf
attribute that determines the replication factor is replication_factor.
You specify the replication factor during deployment of the cluster, as part of
member initialization. See Initialize cluster members.
For information on how the cluster replicates search artifacts, see How the
cluster handles search artifacts. That subtopic describes several key points about
artifact replication, among them:
• In some cases, the cluster might replicate more than the replication factor
number of a search artifact.
• Artifact proxying, along with additional replication, occurs if a member
without a copy of the artifact needs access to it.
• If a member goes down, the cluster replaces the artifact copies that were
being stored on that member.
See List search artifacts to learn how to view the set of artifacts in the cluster and
on individual members.
For an overview of search head clustering configuration, see "Configure the
search head cluster".
Set the security key for the search head cluster
The security key authenticates communication between the deployer and the
cluster members. You must set the key to the same value on all search head
cluster members and the deployer.
It is recommended that you set the security key during initial cluster deployment.
See "Deploy a search head cluster".
If you neglected to set the key during deployment, you can set it post-deployment
by configuring the pass4SymmKey attribute in server.conf on each cluster member
and the deployer. Put the attribute under the [shclustering] stanza. For
example:
[shclustering]
pass4SymmKey = yoursecuritykey
You must restart each instance for the key to take effect. For more information on
post-deployment configuration, see "Configuration methods."
You should save a copy of the key in a safe place. Once an instance starts
running, the security key changes from clear text to encrypted form, and it is no
longer recoverable from server.conf. If you later want to add a new member,
you will need to use the clear text version to set the key.
Set the security key for a combined search head cluster and
indexer cluster
For information on setting the security key for a combined search head cluster
and indexer cluster, see Integrate the search head cluster with an indexer cluster
in Distributed Search.
Update search head cluster members
For a search head cluster to function properly, its members must all use the
same set of search-related configurations. For example, all search heads in the
cluster need access to the same set of saved searches. They must therefore use
the same savedsearches.conf settings.
Members should also use the same set of user-related settings. See "Add users
to the search head cluster."
Apps must also be identical across all search heads in a cluster. An app is
essentially just a set of configurations.
A search head cluster uses two means to ensure that configurations are identical
across its members: automatic replication and the deployer.
Replicated changes
The cluster automatically replicates certain configuration changes, but only those
made at runtime through Splunk Web, the CLI, or the REST API, and primarily
those involving knowledge objects. In addition, the cluster replicates a few other
runtime changes as well, such as changes to users and roles.
Deployed changes
The cluster does not replicate all configuration changes. For other configuration
changes and additions, you must explicitly push the changes to all cluster
members. You do this through a special Splunk Enterprise instance called the
deployer.
Examples of changes that require use of the deployer include any configuration
files that you edit directly. For example, if you make a change in limits.conf,
you must push the change through the deployer. Similarly, if you directly edit a
knowledge object configuration file, like savedsearches.conf, you must use the
deployer to distribute it to cluster members. In addition, you must use the
deployer to push new or upgraded apps to the cluster members.
You also use the deployer to migrate app and user settings from an existing
search head pool or standalone search head to the search head cluster.
Adding non-clustered search peers (that is, indexers that are not part of an
indexer cluster) to the search head cluster is an example of the type of
configuration change that the cluster does not replicate automatically. At the
same time, however, it might not be convenient to add search peers by using the
deployer to push an updated distsearch.conf, because the deployer will then
initiate a rolling restart of all cluster members.
To avoid a restart of cluster members, you can use the CLI splunk add
search-server command to add peers to each cluster member individually. For
details, see "Connect the search heads in clusters to search peers."
Caution: Complete this operation across all cluster members quickly, so that all
members maintain the same set of search peers.
The Settings menu in Splunk Web organizes settings into several groups,
including one called Knowledge, which contains the knowledge object settings.
Search head clustering hides most non-Knowledge groups in each member's
Settings menu by default. For example, it hides settings for data inputs and the
distributed environment. You can unhide the hidden groups, if necessary.
The reason for hiding non-Knowledge settings is that the cluster only replicates
certain setting changes, mainly those in the Knowledge category. If you make a
change on one member to a setting in a non-Knowledge category, the cluster,
with a few exceptions, does not automatically replicate that change to the other
members. This can lead to the members being out of sync with each other.
If you need to access a hidden setting on a member, you can unhide those
settings:
1. Click Settings in the upper right corner of Splunk Web. A list of settings,
mainly limited to the Knowledge group, appears.
2. Click the Show All Settings button at the end of the list. A dialog box reminds
you that hidden settings will not be replicated.
3. To continue, click Show in the dialog box. The full list of settings, dependent
on your role permissions, appears.
The settings are now unhidden for all users with permission to view them;
typically, all admin users. To rehide the settings, you must restart the instance.
CLI commands and cluster members
Most general and search-related CLI commands are available for use on cluster
members. If you run the command on one member, the cluster replicates the
resulting configuration changes to the other members.
However, do not run the splunk clean command, in any of its variants, on an
active cluster member. For example, the splunk clean all command should
only be run after a member is removed from the cluster, as that command
deletes the _raft folder, /etc/passwd, and so on. Similarly, if you run splunk
clean userdata on one member, the user data will be cleaned on that member
only. The change will not replicate to the other members, causing user/role
information to differ between members.
For more information on replicated changes, see "Configuration updates that the
cluster replicates."
Note: The cluster replicates configuration changes to all cluster members. The
cluster's replication factor applies only to search artifact replication. See Choose
the replication factor for the search head cluster.
These are the main characteristics of the configuration changes that the cluster
replicates:
• A whitelist determines the specific types of changes that the cluster
replicates.
• The cluster replicates only changes made through Splunk Web, the
Splunk CLI, or the REST API.
The cluster does not replicate any configuration changes that you make
manually, such as direct edits to configuration files.
The cluster uses a whitelist to determine what changes to replicate. This whitelist
is configured through the set of conf_replication_include attributes in the
default version of server.conf, located in $SPLUNK_HOME/etc/system/default.
You can add or remove items from that list by editing the members' server.conf
files under $SPLUNK_HOME/etc/system/local. If you change the whitelist, you
must make the same changes on all cluster members.
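For example, to stop replicating changes for one of the whitelisted items, you might add a line like the following to the [shclustering] stanza of each member's $SPLUNK_HOME/etc/system/local/server.conf. The item chosen here is only illustrative; the attribute naming follows the conf_replication_include.<item> pattern used in the default file:
[shclustering]
conf_replication_include.viewstates = false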
For a comprehensive list of items in the whitelist, consult the default version of
server.conf. This is the approximate set of whitelisted items:
alert_actions
authentication
authorize
datamodels
event_renderers
eventtypes
fields
html
literals
lookups
macros
manager
models
multikv
nav
panels
passwd
passwords
props
quickstart
savedsearches
searchbnf
searchscripts
segmenters
tags
times
transforms
transactiontypes
ui-prefs
user-prefs
views
viewstates
workflow_actions
The cluster replicates changes to all files underlying the whitelist items. In
addition to configuration files themselves, this includes dashboard and nav XML,
lookup table files, data model JSON files, and so on. The cluster also replicates
permissions stored in *.meta files.
For example, for the views item the cluster replicates dashboard XML files, for
the lookups item it replicates lookup table files, and for the datamodels item it
replicates data model JSON files.
Note: The cluster does not replicate user search history. This is reflected in the
default server.conf file, which includes the line,
conf_replication_include.history = false. Changing that value to "true" has
no effect and does not cause the cluster to replicate search history.
The changes that the cluster ignores
The cluster ignores configuration changes for any items that are not on the
whitelist. Examples include index-time settings, such as those that define data
inputs or indexes.
In addition, the cluster only replicates changes that are made through Splunk
Web, the Splunk CLI, or the REST API. If you directly edit a configuration file, the
cluster does not replicate it. Instead, you must use the deployer to distribute the
file to all cluster members.
The cluster also does not replicate newly installed or upgraded apps.
Note: The deployer works in concert with cluster replication to migrate user (not
app) configurations to the cluster members. The typical use case for this is to
migrate user settings on an existing search head pool or standalone search head
to the search head cluster. You put the user configurations that you want to
migrate on the deployer. The deployer pushes them to the captain, which then
replicates them to the other cluster members. For details, see User
configurations.
For example, assume a user on one cluster member uses Splunk Web to create
a new field extraction. Splunk Web saves the field extraction in local files on that
member. The member then sends the file changes to the captain. When each
cluster member next contacts the captain, it pulls the changes, along with any
other recent changes, and applies them locally. Within a few seconds, all cluster
members have the new field extraction.
Note: Files replicated and updated this way are semantically and functionally
equivalent across the set of cluster members. The files might not be identical on
all members, however. For example, depending on circumstances such as the
order in which changes reach the captain, it is possible that an updated setting in
props.conf could appear in different locations within the file on different
members.
• Each active cluster member contacts the captain every five seconds and
pulls any changes that have arrived since the last time it pulled changes.
• When a new member joins the cluster, it contacts the captain and
downloads a tarball containing the current set of replicated configurations,
including all changes that have been made over the life of the cluster. It
applies the tarball locally.
• When a member rejoins the cluster: first, follow the procedure outlined
in Add a member that was previously removed from the cluster, cleaning
the instance before you re-add it to the cluster. The member then contacts
the captain and downloads the tarball, the same way that a new member
does.
To see when the members last pulled a set of configuration changes from the
captain, run the splunk show shcluster-status command from any member:
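A typical invocation looks like this; the credentials shown are placeholders:
splunk show shcluster-status -auth admin:yourpassword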
The output from this command includes, for each member, the field
last_conf_replication. It indicates the last time that the member successfully
pulled an updated set of configurations from the captain.
Certain conditions can cause a member's baseline to get out-of-sync with the
captain's baseline, and thus with the baselines of the other members. In
particular, a member can be out-of-sync when recovering from a loss of
connectivity with the cluster. To remediate this situation, the member must
resync with the cluster.
When a member rejoins the cluster, it must resync its baseline with the captain's
baseline. Until the process is complete, the member is considered to be
out-of-sync with the cluster.
To resync its baseline, the member contacts the captain to request the set of
intervening replicated changes. What happens next depends on whether the
member and the captain still share a common commit in their replication change
histories:
• If the captain and the member share a common commit, the member
automatically downloads the intervening changes from the captain and
applies them to its pre-offline configuration. The member also pushes its
intervening changes, if any, to the captain, which replicates them to the
other members. In this way, the member resyncs its baseline with the
captain's baseline.
• If the captain and the member do not share a common commit, they
cannot properly sync without manual intervention. To update the
member's configuration, you must instruct the member to download the
entire configuration tarball from the captain, as described in Perform a
manual resync. The tarball overwrites the member's existing set of
configurations, causing it to lose any local changes that occurred during
the time that it was disconnected from the cluster.
Why a recovering member might need to resync manually
If the captain and the member do not share a common commit in their set of
configuration changes, they cannot sync without manual intervention.
If the recovering member has been disconnected from the cluster for so long that
the cluster has purged some intervening change history, the recovering member
will not share a common commit with the captain and therefore cannot apply the
full set of intervening changes. Instead, the member must undergo a manual
resync.
At the end of the manual resync process, the member once again shares a
common baseline with the other members. In the process, the member loses any
local changes made during the time that it was disconnected from the cluster. For
this reason, a manual resync is also known as a "destructive resync."
A similar situation can occur if the entire cluster stops functioning for a while, and
the members operate during that time as independent search heads. See
Recovery from a non-functioning cluster.
Upon rejoining the cluster, the member attempts to apply the set of intervening
replicated changes from the captain. If the set exceeds the purge limits and the
member and captain no longer share a common commit, a banner message
appears on the member's UI, with text similar to the following:
If this message appears, it means that the member is unable to update its
configuration through the configuration change delta and must apply the entire
configuration tarball. It does not do this automatically. Instead, it waits for your
intervention.
You must then initiate the process of downloading and applying the tarball by
running this CLI command on the member:
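The command in question is the resync command discussed later in "Handle failure of a cluster member"; a sketch of the invocation:
splunk resync shcluster-replicated-config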
The cluster retains only a limited amount of change history, governed by a pair of
configurable purge limits. When both limits have been exceeded on a member,
the member begins to purge the change history, starting with the oldest changes.
For more information on the purge limit attributes, see the server.conf
specification file.
Use the deployer to distribute apps and
configuration updates
The deployer is a Splunk Enterprise instance that you use to distribute apps and
certain other configuration updates to search head cluster members. The set of
updates that the deployer distributes is called the configuration bundle.
Caution: You must use the deployer, not the deployment server, to distribute
apps to cluster members. Use of the deployer eliminates the possibility of conflict
with the run-time updates that the cluster replicates automatically by means of
the mechanism described in Configuration updates that the cluster replicates.
For details of your cluster's app deployment process, view the Search Head
Clustering: App Deployment dashboard in the monitoring console. See Use the
monitoring console to view search head cluster status.
The deployer serves these functions:
• It handles migration of app and user configurations into the search head
cluster from non-cluster instances and search head pools.
• It deploys baseline app configurations to search head cluster members.
• It provides the means to distribute non-replicated, non-runtime
configuration updates to all search head cluster members.
Configurations move in one direction only: from the deployer to the members.
The members never upload configurations to the deployer. It is also unlikely that
you will ever need to force such behavior by manually copying files from the
cluster members to the deployer, because the members continually replicate all
runtime configurations among themselves.
Types of updates that the deployer handles
These are the specific types of updates that require the deployer: new or
upgraded apps, configuration files that you edit directly, and app or user settings
migrated from a search head pool or standalone search head.
Note: You use the deployer to deploy configuration updates only. You cannot
use it for initial configuration of the search head cluster or for version upgrades to
the Splunk Enterprise instances that the members run on.
You do not use the deployer to distribute certain runtime changes from one
cluster member to the other members. These changes are handled automatically
by configuration replication. See How configuration changes propagate across
the search head cluster.
• The deployer does not represent a "single source of truth" for all
configurations in the cluster.
• You cannot use the deployer, by itself, to restore the latest state to cluster
members.
Because of how configuration file precedence works, changes that users make to
apps at runtime get maintained in the apps through subsequent upgrades.
Say, for example, that you deploy the 1.0 version of some app, and then a user
modifies the app's dashboards. When you later deploy the 1.1 version of the app,
the user modifications will persist in the 1.1 version of the app.
In some cases, however, you might prefer that such user changes not persist, so
that the members revert to the configurations in the unmodified apps distributed
by the deployer. To understand this issue in detail, read the rest of this topic, as
well as the topic Configuration file precedence in the Admin Manual.
The mechanism for deploying an upgraded version of an app does not recognize
any deleted files or directories except for those residing under the default and
local subdirectories. Therefore, if your custom app contains an additional
directory at the level of default and local, that directory and all its files will persist
from upgrade to upgrade, even if some of the files, or the directory itself, are no
longer present in an upgraded version of the app.
To delete such files or directories, you must delete them manually, directly on the
cluster members.
Once you delete the files or directories from the cluster members, they will not
reappear the next time you deploy an upgrade of the app, assuming that they are
not present in the upgraded app.
The deployer distributes app configurations to the cluster members under these
circumstances:
When you make a change to the set of apps on the deployer and invoke the
splunk apply shcluster-bundle command, the deployer creates new tarballs for
each changed app and then pushes those tarballs to the current members. When
a new member joins or rejoins the cluster, it receives the current set of tarballs.
This method ensures that all members, whether new or current, maintain
identical sets of configurations. For example, if you change an app but do not run
splunk apply shcluster-bundle to push the change to the current set of
members, any joining member also does not receive that change.
For more information on how the deployer creates the app tarballs, see What
exactly does the deployer send to the cluster?
The deployer distributes user configurations to the captain only when you invoke
the splunk apply shcluster-bundle command. The captain then replicates
those configurations to the members.
Note: The actions in this subsection are integrated into the procedure for
deploying the search head cluster, described in the topic Deploy a search head
cluster. If you already set up the deployer during initial deployment of the search
head cluster, you can skip this section.
Each search head cluster needs one deployer. The deployer must run on a
Splunk Enterprise instance outside the search head cluster.
The deployer sends the same configuration bundle to all cluster members that it
services. Therefore, if you have multiple search head clusters, you can use the
same deployer for all the clusters only if the clusters employ exactly the same
configurations, apps, and so on.
If you anticipate that your clusters might need different configurations over time,
set up a separate deployer for each cluster.
You must configure the secret key on the deployer and all search head cluster
members. The deployer uses this key to authenticate communication with the
cluster members. To set the key, specify the pass4SymmKey attribute in either the
[general] or the [shclustering] stanza of the deployer's server.conf file. For
example:
[shclustering]
pass4SymmKey = yoursecretkey
The key must be the same for all cluster members and the deployer. You can set
the key on the cluster members during initialization.
You must restart the deployer instance for the key to take effect.
The search head cluster label is useful for identifying the cluster in the monitoring
console. This parameter is optional, but if you configure it on one member, you
must configure it with the same value on all members, as well as on the deployer.
[shclustering]
shcluster_label = shcluster1
See Set cluster labels in Monitoring Splunk Enterprise.
Each cluster member needs to know the location of the deployer. Splunk
recommends that you specify the deployer location during member initialization.
See Deploy a search head cluster.
If you do not set the deployer location at initialization time, you must add the
location to each member's server.conf file before using the deployer:
[shclustering]
conf_deploy_fetch_url = <URL>:<management_port>
The conf_deploy_fetch_url attribute specifies the URL and management port
for the deployer instance.
If you later add a new member to the cluster, you must set
conf_deploy_fetch_url on the member before adding it to the cluster, so it can
immediately contact the deployer for the current configuration bundle, if any.
The configuration bundle is the set of files that the deployer distributes to the
cluster. It consists of two types of configurations:
• App configurations.
• User configurations.
You determine the contents of the configuration bundle by copying the apps or
other configurations to a location on the deployer.
The deployer pushes the configuration bundle to the cluster, using a different
method depending on whether the configurations are for apps or for users. On
the cluster members, the app configurations obey different rules from the user
configurations. See Where deployed configurations live on the cluster members.
The deployer pushes the configuration bundle to the cluster as a set of tarballs,
one for each app and one for the entire user directory. On the deployer, the
bundle contents reside under $SPLUNK_HOME/etc/shcluster/, in this directory
structure:
$SPLUNK_HOME/etc/shcluster/
apps/
<app-name>/
<app-name>/
...
users/
Note the following general points:
• The configuration bundle must contain at least one subdirectory under
either /apps or /users. The deployer will error out if you attempt to push a
configuration bundle that contains no app or user subdirectories.
• The deployer only pushes the contents of subdirectories under shcluster.
It does not push any standalone files directly under shcluster. For
example, it will not push the file /shcluster/file1. To deploy standalone
files, create a new app directory under /apps and put the files in the local
subdirectory. For example, put file1 under
$SPLUNK_HOME/etc/shcluster/apps/newapp/local.
• The shcluster location is only for files that you want to distribute to cluster
members. The deployer does not use the files in that directory for its own
configuration needs.
• Caution: Do not use the deployer to push default apps, such as the
search app, to the cluster members. In addition, make sure that no app in
the configuration bundle has the same name as a default app. Otherwise,
it will overwrite that app on the cluster members. For example, if you
create an app called "search" in the configuration bundle, it will overwrite
the default search app when you push it to the cluster members.
• Put each app in its own subdirectory under /apps. You must untar the app.
• For app directories only, all files placed under both default and local
subdirectories get merged into default subdirectories on the members,
post-deployment. See App configurations.
• The configuration bundle must contain all previously pushed apps, as well
as any new ones. If you delete an app from the bundle, the next time you
push the bundle, the app will get deleted from the cluster members.
• To update an app on the cluster members, put the updated version in the
configuration bundle. Simply overwrite the existing version of the app.
• To delete an app that you previously pushed, remove it from the
configuration bundle. When you next push the bundle, each member will
delete it from its own file system. Note: If you need to remove an app,
inspect its app.conf file to make sure that state = enabled. If state =
disabled, the deployer will not remove the app even if you remove it from
the configuration bundle.
• When the deployer pushes the bundle, it pushes the full contents of all
apps that have changed since the last push. Even if the only change to an
app is a single file, it pushes the entire app. If an app has not changed, the
deployer does not push it again.
• To push user-specific files, put the files under the /users subdirectories
where you want them to reside on the members.
• The deployer will push the content under /shcluster/users only if the
content includes at least one configuration file. For example, if you place a
private lookup table or view under some user subdirectory, the deployer
will push it only if there is also at least one configuration file somewhere
under /shcluster/users.
• You cannot subsequently delete user settings by deleting the files from the
deployer and then pushing the bundle again. In this respect, user settings
behave differently from app settings.
On the cluster members, the deployed apps and user configurations reside under
$SPLUNK_HOME/etc/apps and $SPLUNK_HOME/etc/users, respectively.
App configurations
When it deploys apps, the deployer places the app configurations in default
directories on the cluster members.
The deployer never deploys files to the members' local app directories,
$SPLUNK_HOME/etc/apps/<app_name>/local. Instead, it deploys both local and
default settings from the configuration bundle to the members' default app
directories, $SPLUNK_HOME/etc/apps/<app_name>/default. This ensures that
deployed settings never overwrite local or replicated runtime settings on the
members. Otherwise, for example, app upgrades would wipe out runtime
changes.
During the staging process that occurs prior to pushing the configuration bundle,
the deployer copies the configuration bundle to a staging area on its file system,
where it merges all settings from files in /shcluster/apps/<appname>/local into
corresponding files in /shcluster/apps/<appname>/default. The deployer then
pushes only the merged default files.
During the merging process, settings from the local directory take precedence
over any corresponding default settings. For example, if you have a
/newapp/local/inputs.conf file, the deployer takes the settings from that file and
merges them with any settings in /newapp/default/inputs.conf. If a particular
attribute is defined in both places, the merged file retains the definition from the
local directory.
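As a hypothetical illustration of the merge, suppose the bundle contains these two files for an app named newapp (the stanza and attribute values are made up for the example):
# /shcluster/apps/newapp/default/inputs.conf
[monitor:///var/log/example.log]
index = main
disabled = true
# /shcluster/apps/newapp/local/inputs.conf
[monitor:///var/log/example.log]
disabled = false
The merged file that the deployer pushes to the members as newapp/default/inputs.conf would then contain:
[monitor:///var/log/example.log]
index = main
disabled = false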
User configurations
The deployer copies user configurations to the captain only. The captain then
replicates the settings to all the cluster members through its normal method for
replicating configurations, as described in Configuration updates that the cluster
replicates.
Unlike app configurations, the user configurations reside in the normal user
locations on the cluster members, and are not merged into default directories.
They behave just like any runtime settings created by cluster users through
Splunk Web.
When you migrate user configurations to an existing search head cluster, the
deployer respects attributes that already exist on the cluster. It does not overwrite
any existing attributes within existing stanzas.
For example, suppose that the members have an existing user-level
savedsearches.conf file with this stanza:
[my search]
search = index=_internal | head 1
and on the deployer, there's the file
$SPLUNK_HOME/etc/shcluster/users/admin/search/local/savedsearches.conf
with these stanzas:
[my search]
search = index=_internal | head 10
enableSched = 1
[my other search]
search = FOOBAR
After migration, the file on the members contains these stanzas:
[my search]
search = index=_internal | head 1
enableSched = 1
[my other search]
search = FOOBAR
The [my search] stanza, which already existed on the members, keeps the
existing setting for its search attribute, but adds the migrated setting for the
enableSched attribute, because that attribute did not already exist in the stanza.
The [my other search] stanza, which did not already exist on the members, gets
added to the file, along with its search attribute.
After you deploy an app to the members, you cannot subsequently delete the
app's baseline knowledge objects through Splunk Web, the CLI, or the REST
API. You also cannot move, share, or unshare those knowledge objects.
This limitation applies only to the app's baseline knowledge objects - those that
were distributed from the deployer to the members. It does not apply to the app's
runtime knowledge objects, if any. For example, if you deploy an app and then
subsequently use Splunk Web to create a new knowledge object in the app, you
can manage that object with Splunk Web or any other of the usual methods.
Note: This condition does not apply to user-level knowledge objects pushed by
the deployer. User-level objects can be managed by all the usual methods.
The limitation on managing baseline knowledge objects is due to the fact that the
deployer moves all local app configurations to the default directories before it
pushes the app to the members. Default configurations cannot be moved or
otherwise managed. On the other hand, any runtime knowledge objects reside in
the app's local directory and therefore can be managed in the normal way. For
more information on where deployed configurations reside, see App
configurations.
On the initial push to a set of new members, the deployer distributes the entire
set of app tarballs to each member. On subsequent pushes, it distributes only
new apps or any apps that have changed since the last push. If even a single file
has changed in an app, the deployer redistributes the entire app. It does not
redistribute unchanged apps.
If you change a single file in the users directory, the deployer redeploys the
entire users tarball to the captain. This is because the users directory is typically
modified and redeployed only during upgrade or migration, unlike the apps
directory, which might see regular updates during the lifetime of the cluster.
Caution: If you attempt to push a very large tarball (>200 MB), the operation
might fail due to various timeouts. Delete some of the contents from the tarball's
app, if possible, and try again.
To deploy a configuration bundle, you push the bundle from the deployer to the
cluster members.
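The command is splunk apply shcluster-bundle, run on the deployer. A typical invocation looks like this sketch, with the parameters described in the list that follows (the URI and credentials are placeholders):
splunk apply shcluster-bundle -target https://10.0.1.14:8089 -auth admin:yourpassword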
• The -target parameter specifies the URI and management port for any
member of the cluster, for example, https://10.0.1.14:8089. You specify
only one cluster member but the deployer pushes to all members. This
parameter is required.
• The -auth parameter specifies credentials for the deployer instance.
In response to splunk apply shcluster-bundle, the deployer displays this
message:
Note: You can eliminate the message by appending the flag --answer-yes to the
splunk apply shcluster-bundle command:
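A sketch of the same command with the flag appended (placeholders as above):
splunk apply shcluster-bundle --answer-yes -target https://10.0.1.14:8089 -auth admin:yourpassword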
The deployer and the cluster members execute the command as follows:
1. The deployer stages the configuration bundle in a separate location on its file
system ($SPLUNK_HOME/var/run/splunk/deploy) and then pushes the app
directories to each cluster member. The configuration bundle typically consists of
several tarballs, one for each app. The deployer pushes only the new or changed
apps.
2. The deployer separately pushes the users tarball to the captain, if any user
configurations have changed since the last push.
3. The captain replicates any changed user configurations to the other cluster
members.
4. Each cluster member applies the app tarballs locally. If a rolling restart is
deemed necessary, approximately 10% of the members restart at a time, until all
have restarted.
During a rolling restart, all members, including the current captain, restart.
Restart of the captain triggers the election process, which can result in a new
captain. After the final member restarts, it requires approximately 60 seconds for
the cluster to stabilize. During this interval, error messages might appear. You
can ignore these messages. They should desist after 60 seconds. For more
information on the rolling restart process, see Restart the search head cluster.
You should usually let the cluster automatically trigger any rolling restart, as
necessary. However, if you need to maintain control over the restart process, you
can run a version of splunk apply shcluster-bundle that stops short of the
restart. If you do so, you must later initiate the restart yourself. The configuration
bundle changes will not take effect until the members restart.
If you run the splunk apply shcluster-bundle command when the shcluster
directory contains no apps, the push fails with an error similar to this:
Error while deploying apps to first member: Found zero deployable apps
to send; /opt/splunk/etc/shcluster is likely empty; ensure that the
command is being run on the deployer. If intentionally attempting to
remove all apps from the search head cluster use the "force" option.
WARNING: using this option with an empty shcluster directory will delete
all apps previously deployed to the search head cluster; use with
extreme caution!
You can override this behavior with the -force true flag:
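A sketch of the command with the flag set (placeholders as above):
splunk apply shcluster-bundle -force true -target https://10.0.1.14:8089 -auth admin:yourpassword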
If you need to remove an app, inspect its app.conf file to make sure that state =
enabled. If state = disabled, the deployer will not remove the app even if you
remove it from the configuration bundle.
By default, only admin users (that is, those with the admin_all_objects
capability) can push the configuration bundle to the cluster members. Depending
on how you manage your deployment, you might want to allow users without full
admin privileges to push apps or other configurations to the cluster members.
You can do so by overriding the controlling stanza in the default restmap.conf
file.
The default restmap.conf file includes a stanza that controls the bundle push
process:
[apps-deploy:apps-deploy]
match=/apps/deploy
capability.post=admin_all_objects
authKeyStanza=shclustering
You can change the capability in this stanza to a different one, either an existing
capability or one that you define specifically for the purpose. You can then assign
that capability to a new role, so that users with that role can push the
configuration bundle.
To create a new special-purpose capability and then assign that capability to the
bundle push process:
1. Define the new capability and a role that holds it in authorize.conf. For
example:
[capability::conf_bundle_push]
[role_deployer_push]
conf_bundle_push=enabled
2. On the deployer, create a new restmap.conf file under
$SPLUNK_HOME/etc/system/local, or edit the file if it already exists at that
location. Change the value of the capability.post setting to the
conf_bundle_push capability. For example:
[apps-deploy:apps-deploy]
match=/apps/deploy
capability.post=conf_bundle_push
authKeyStanza=shclustering
You can now assign the role_deployer_push role to any users that need to push
the bundle.
For more information on capabilities, see the chapter Users and role-based
access control in Securing Splunk Enterprise.
Any app that uses lookup tables typically ships with stubs for the table files. Once
the app is in use on the search head, the tables get populated as an effect of
runtime processes, such as searches. When you later upgrade the app, by
default the populated lookup tables get overwritten by the stub files from the
latest version of the app, causing you to lose the data in the tables.
To avoid this problem, you can stipulate that the stub files in upgraded apps not
overwrite any table files of the same name already on the cluster members. Run
the splunk apply shcluster-bundle command on the deployer, setting the
-preserve-lookups flag to "true":
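A sketch of the command with the flag set (placeholders as above):
splunk apply shcluster-bundle -preserve-lookups true -target https://10.0.1.14:8089 -auth admin:yourpassword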
Note: To ensure that a stub persists on members only if there is no existing table
file of the same name already on the members, this feature can temporarily
rename a table file with a .default extension. (So, for example, lookup1.csv
becomes lookup1.csv.default.) Therefore, if you have been manually renaming
table files with a .default extension, you might run into problems when using
this feature. You should contact Support before proceeding.
The deployer distributes the configuration bundle to the cluster members under
two circumstances: when you run the splunk apply shcluster-bundle command,
and when a member joins or rejoins the cluster.
The implications of the deployer being down depend, therefore, on the state of
the cluster members. These are the main cases to consider:
• The deployer is down but the set of cluster members remains stable.
• The deployer is down and a member attempts to join or rejoin the cluster.
The deployer is down but the set of cluster members remains stable
If no member joins or rejoins the cluster while the deployer is down, there are no
important consequences to the functioning of the cluster. All member
configurations remain in sync and the cluster continues to operate normally. The
only consequence is the obvious one, that you cannot push new configurations to
the members during this time.
The deployer is down and a member attempts to join or rejoin the cluster
In the case of a member attempting to join or rejoin the cluster while the deployer
is down, there is the possibility that the apps configuration on that member will be
out-of-sync with the apps configuration on the other cluster members:
• A new member will not be able to pull the current set of apps tarballs.
• A member that left the cluster before the deployer failed and rejoined the
cluster after the deployer failed will not be able to pull any updates made
to the apps portion of the bundle during the time that the member was
down and the deployer was still running.
Remediation is two-fold:
1. Prevent any member from joining or rejoining the cluster during deployer
failure, unless you can be certain that the set of configurations on the joining
member is identical to that on the other members (for example, if the rejoining
member went down subsequent to the deployer failure).
2. Once the deployer is back up, push the restored bundle contents to all
members by running the splunk apply shcluster-bundle command.
Manage search head clustering
Add a cluster member
You can add a member to an existing search head cluster. The instance that you
add falls into one of these categories:
• A new member. In this case, you want to expand the cluster by adding a
new member.
• A member that was previously removed from the cluster. In this case,
you removed the member with the splunk remove command and now
want to add it back.
• A member that left the cluster without being removed from it. This
can happen if, for example, the instance shut down unexpectedly.
This topic treats each of these categories separately through a set of high-level
procedures, each of which references one or more detailed steps.
These procedures are for Splunk Enterprise instances that have not previously
been part of this cluster.
To add a newly installed Splunk Enterprise instance, which has not previously
functioned as a search head:
To add an existing Splunk Enterprise instance, you must first remove any
non-default settings:
1. If the instance was formerly a member of another search head cluster, remove
and disable the member from that cluster before adding it to this cluster. See
"Remove a cluster member."
2. Clean the instance to remove any existing configurations that could interfere
with the cluster. See "Clean the instance."
These procedures are for Splunk Enterprise instances that were previously
members of this cluster but were removed from it with the splunk remove
shcluster-member command. See "Remove a cluster member."
1. Clean the instance to remove any existing configurations that could interfere
with the cluster. See "Clean the instance."
Add a member that left the cluster without being removed from
it
A typical reason for a member falling into this category is a temporary failure of
the cluster member.
For members that left the cluster without being explicitly removed from it:
1. Bring the member back up, if it is not already running.
2. Depending on how long the member has been down, you might need to run
the splunk resync shcluster-replicated-config command to download the
current set of configurations.
See "Handle failure of a cluster member" for information on the splunk resync
shcluster-replicated-config command, along with a discussion of other issues
related to dealing with a failed member.
Detailed steps
The high-level procedures for adding a cluster member use the detailed steps in
this section. Depending on the particular situation that you are handling, you
might need to use only a subset of these steps. See the high-level procedures,
earlier in this topic, to determine which of these steps your situation requires.
Note: This step is not necessary if you are adding a new instance that contains
only the default set of configurations.
If you are adding an existing instance to the cluster, you must first stop the
instance and run the splunk clean all command:
splunk stop
splunk start
The splunk clean all command deletes configuration updates that could
interfere with the goal of maintaining the necessary identical configurations and
apps across all cluster members. It does not delete any existing settings under
the [shclustering] stanza in server.conf.
Caution: This step deletes most previously configured settings on the instance.
For a discussion of configurations that must be shared by all members, see "How
configuration changes propagate across the search head cluster."
For more information on the splunk clean command, access the online CLI help:
Initialize the instance
If the member is new to the cluster, you must initialize it before adding it to the
cluster:
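The initialization command takes a form like the following sketch; the exact parameters and values depend on your deployment and are documented in "Deploy a search head cluster" (the URIs, ports, key, and credentials shown are placeholders):
splunk init shcluster-config -auth admin:yourpassword -mgmt_uri https://sh5.example.com:8089 -replication_port 34567 -replication_factor 3 -conf_deploy_fetch_url https://deployer.example.com:8089 -secret yoursecuritykey -shcluster_label shcluster1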
splunk restart
Note the following:
• See "Deploy a search head cluster" for details on the splunk init
shcluster-config command, including the meaning of the various
parameters.
• The conf_deploy_fetch_url parameter specifies the URL and
management port for the deployer instance. You must set it when adding
a new member to an existing cluster, so that the member can immediately
contact the deployer for the latest configuration bundle, if any. See "Use
the deployer to distribute apps and configuration updates."
This step is for new members only. Do not run it on members rejoining the
cluster.
The final step is to add the instance to the cluster. You can run the splunk add
shcluster-member command either on the new member or from any current
member of the cluster. The command requires different parameters depending
on where you run it from.
When running the splunk add command on the new member itself, use this
version of the command:
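When run on the new member, the command points at any existing cluster member. A sketch, assuming the standard -current_member_uri parameter form (the URI is a placeholder):
splunk add shcluster-member -current_member_uri https://sh2.example.com:8089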
When running the splunk add command from a current cluster member,
use this version of the command:
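When run from an existing member, the command identifies the new member with the -new_member_uri parameter described below. A sketch (the URI is a placeholder):
splunk add shcluster-member -new_member_uri https://sh5.example.com:8089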
• new_member_uri is the management URI and port of the new member that
you are adding to the cluster. This parameter must be identical to the
-mgmt_uri value you specified when you initialized this member.
Post-add activity
After the member joins or rejoins the cluster, it applies all replicated and
deployed configuration updates: the replicated changes, which it gets from the
captain, and the deployed changes, which it gets from the deployer.
See "How configuration changes propagate across the search head cluster."
Important: You must use the procedure documented here to remove a member
from the cluster. Do not just stop the member.
To disable a member so that you can then re-use the instance, you must also run
the splunk disable shcluster-config command.
To rejoin the member to the cluster later, see Add a member that was previously
removed from the cluster. The exact procedure depends on whether you merely
removed the member from the cluster or both removed and disabled the
member.
Caution: Do not stop the member before removing it from the cluster.
1. Remove the member.
To run the splunk remove command on the member that you are removing, use
this version:
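A sketch of the removal command, assuming the standard forms: run on the member being removed, the command can be as simple as splunk remove shcluster-member; run from another member, it includes the -mgmt_uri parameter described below to identify the member being removed (the URI is a placeholder):
splunk remove shcluster-member -mgmt_uri https://sh5.example.com:8089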
• mgmt_uri is the management URI of the member being removed from the
cluster.
After removing the member, wait about two minutes for configurations to be
updated across the cluster, and then stop the instance:
splunk stop
By stopping the instance, you prevent error messages about the removed
member from appearing on the captain.
By removing the instance from the search head cluster, you automatically
remove it from the KV store. To confirm that this instance has been removed
from the KV store, run splunk show kvstore-status on any remaining cluster
member. The instance should not appear in the set of results. If it does appear,
there might be problems with the health of your search head cluster.
2. If you intend to keep the instance alive for use in some other capacity,
disable search head clustering on it after you remove it:
splunk disable shcluster-config
3. Clean the KV store:
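The cleanup is a variant of the splunk clean command; the flag shown here is an assumption of the standard form, so check the CLI help on your version before running it:
splunk clean kvstore --cluster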
Configure a cluster member to run ad hoc searches only
You can prevent a member from running scheduled searches, so that it runs ad
hoc searches only. There are two ways to do this:
• You can specify that a particular member run only ad hoc searches at all
times.
• You can specify that a member run only ad hoc searches while it is the
captain.
Note: Although you can specify that a member run only ad hoc searches, you
cannot specify that it run only scheduled searches. Any cluster member can
always run an ad hoc search. You can, of course, prevent user access to a
search head through any number of means.
To designate a member as an ad hoc search head, set the adhoc_searchhead
attribute in the [shclustering] stanza of that member's server.conf file:
[shclustering]
adhoc_searchhead = true
You must restart the instance for the change to take effect.
Configure the captain to run ad hoc searches only
You can designate the captain member as an ad hoc search head. This prevents
members from running scheduled searches while they are serving as captain, so
that the captain can dedicate its resources to controlling the activities of the
cluster. When the captain role moves to another member, then the previous
captain will resume running scheduled searches and the new captain will now
run ad hoc searches only.
Important: Make this change on all cluster members, so that the behavior is the
same no matter which member is functioning as captain.
[shclustering]
captain_is_adhoc_searchhead = true
You must restart each member for the change to take effect. Unlike most
configuration changes related to search head clustering, you can use the splunk
rolling-restart command to restart all members. See Restart the search head
cluster.
Control captaincy
You have considerable control over which members become captain, through
two methods: captaincy preference, set with the preferred_captain attribute,
and captaincy transfer.
See Search head cluster captain for details on the captain's role in a search head
cluster.
Use cases
• You have one member that you want to always use as captain. Or
conversely, you have one member that you never want to be captain.
• You do not want the captain to perform any user-initiated ad hoc jobs. You
can achieve this by designating one specific member as captain and
keeping your third-party load balancer ignorant of that member.
• You want to repair the state of the cluster. A quick way to do this is to
switch to a new captain, because members join a new captain in a clean
state.
The twin tools of preferred captaincy and captaincy transfer give you flexibility
when you need to control captaincy. Although neither one can guarantee that you
always maintain complete control over the location of your captain, they do limit
the likelihood that the captain will reside on a member that is not optimal for your
needs. And captaincy transfer offers the ability to transfer the captain to a new
member as needed.
To indicate whether a member is a preferred captain, set the preferred_captain
attribute in the [shclustering] stanza of the member's server.conf file:
preferred_captain = true|false
This attribute defaults to true, which means that, by default, all members are
preferred captains.
To limit the likelihood that the cluster will assign captaincy to a particular
member, set that member's preferred_captain attribute to false:
preferred_captain = false
The cluster attempts to respect the captaincy preference.
An out-of-sync member is a member that cannot sync its own set of replicated
configurations with the common baseline set of replicated configurations
maintained by the current or most recent captain. You do not want an out-of-sync
member to become captain.
The captain maintains the baseline set of configurations for all members. When a
configuration change occurs on one member, the member sends the change to
the captain, which then replicates the change to all the other members.
Therefore, it is essential that the baseline set of configurations on the captain be
up-to-date.
If a member's set of configurations differs from the captain's baseline set, the
member is considered to be out-of-sync. This can occur, for example, if the
member lost network connectivity with the cluster for an extended period of time.
When the member returns to the cluster, it needs to resync with the baseline set
of configurations. If a large number of configuration changes occurred while the
member was not in contact with the cluster, the resync can require manual
intervention.
If an out-of-sync member were to become captain, its stale set of configurations
would become the baseline that gets replicated to all other members. This
situation would result in the loss of configuration changes made on other
members.
By default, the attribute that controls this behavior is set to true. That is, the
cluster attempts to prevent the member from becoming captain if it is
out-of-sync. It is extremely unlikely that you will need to change this default
behavior.
When electing a captain, the cluster considers the out-of-sync state to be more
important than the preferred-captain state. That is, if all preferred-captain
members are out-of-sync, the cluster attempts to elect as captain an in-sync
non-preferred-captain member, rather than a preferred-captain member that is
out-of-sync. Briefly, in-sync members take precedence over out-of-sync
members, and within each of those groups, preferred-captain members take
precedence over non-preferred members.
Transfer captaincy
The use of captaincy transfer does not interfere with the normal captain election
process, which always proceeds in response to the circumstances described in
Captain election. If an election occurs and results in the captain moving to a
member other than the one you want it to reside on, you can then invoke
captaincy transfer to relocate the captain.
Change the captain
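A sketch of the transfer command, using the -mgmt_uri parameter described below; the subcommand form is an assumption of the standard CLI, and the URI and credentials are placeholders:
splunk transfer shcluster-captain -mgmt_uri https://sh2.example.com:8089 -auth admin:yourpassword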
• The -mgmt_uri parameter specifies the URI and management port for the
member that you want to transfer captaincy to. You must use the fully
qualified domain name.
• You can run this command from any member. You are not limited to
running it from the current captain or the intended captain.
• You do not need to restart any member after running the command.
To confirm that the captaincy transfer was successful, run the splunk show
shcluster-status command from any member and check which member the
output identifies as captain.
You can also transfer captaincy through the search head clustering dashboard in
Settings. See Use the search head clustering dashboard.
Captaincy transfer and rolling-restarts
Handle failure of a cluster member
When a failed member restarts and rejoins the cluster, the cluster can frequently
complete the process automatically. In some cases, however, your intervention is
necessary.
If a search head cluster member fails for any reason and leaves the cluster
unexpectedly, the cluster can usually continue to function without interruption:
• The cluster's high availability features ensure that the cluster can continue
to function as long as a majority (at least 51%) of the members are still
running. For example, if you have a cluster configured with seven
members, the cluster will function as long as four or more members
remain up. If a majority of members fail, the cluster cannot successfully
elect a new captain, which results in failure of the entire cluster. See
Search head cluster captain.
• All search artifacts resident on the failed member remain available through
other search heads, as long as the number of machines that fail is less
than the replication factor. If the number of failed members equals or
exceeds the replication factor, it is likely that some search artifacts will no
longer be available to the remaining members.
• If the failed member was serving as captain, the remaining nodes elect
another member as captain. Since members share configurations, the
new captain is immediately fully functional.
• If you are employing a load balancer in front of the search heads, the load
balancer should automatically reroute users on the failed member to an
available search head.
When the failed member rejoins the cluster, it must catch up on two types of
configuration updates:
• The replicated changes, which it gets from the captain. See Updating the
replicated changes.
• The deployed changes, which it gets from the deployer. See Updating the
deployed changes.
See How configuration changes propagate across the search head cluster for
information on how configurations are shared among cluster members.
When the member rejoins the cluster, it contacts the captain to request the set of
intervening replicated changes. In some cases, the recovering member can
automatically resync with the captain. However, if the member has been
disconnected from the cluster for a long time, the resync process might require
manual intervention.
When the member rejoins the cluster, it automatically contacts the deployer for
the latest configuration bundle. The member then applies any changes or
additions that have been made since it last downloaded the bundle.
Use static captain to recover from loss of majority
A cluster normally uses a dynamic captain, which can change over time. The
dynamic captain is chosen by periodic elections, in which a majority of all cluster
members must agree on the captain. See "Captain election."
If a cluster loses the majority of its members, therefore, it cannot elect a captain
and cannot continue to function. You can work around this situation by
reconfiguring the cluster to use a static captain in place of the dynamic captain.
A static captain does not change over time. Unlike a dynamic captain, the cluster
does not conduct an election to select the static captain. Instead, you designate a
member as the static captain, and that member remains the captain until you
designate another member as captain.
The static captain has one fundamental shortcoming: It becomes a single point of
failure for the cluster. If the captain fails, the cluster fails. The cluster cannot, on
its own, replace a static captain. Rather, manual intervention is necessary.
Because of this shortcoming, Splunk recommends that you use the static captain
capability only for disaster recovery. Specifically, you can employ the static
captain to recover from a loss of majority, which renders the cluster incapable of
electing a dynamic captain.
In addition, the static captain does not check whether enough members are
running to meet the replication factor. This means that, under some conditions,
you might not have a full complement of search artifact copies.
Note: You should only employ static captain when absolutely necessary. While
the process of converting to static captain is usually simple and fast, the process
of later reverting back to a dynamic captain is somewhat more involved.
Here are some situations where it makes sense to switch to a static captain:
• A single-site cluster loses the majority of its members. You can revive the
cluster by designating one of its members as a static captain.
• The cluster is deployed across two sites. The majority site fails. Without a
majority, the members in the second, minority site cannot elect a captain.
You can revive the cluster by designating one of the members on the
minority site as a static captain.
In all cases, once the precipitating issue has been resolved, you should revert
the cluster to use a dynamic captain.
Caution: Do not use the static captain to handle a network interruption that stops
communication between two sites. During a network interruption, the site with a
majority of members continues to function as usual, because it can elect a
dynamic captain as necessary. However, the site with a minority of members
cannot elect a captain and therefore will not function as a cluster. If you attempt
to revive the minority site by configuring its members to use a static captain, you
will then have two clusters, one with a dynamic captain and the other with a static
captain. When the network heals, you will not be able to reconcile the
configuration changes between the sites.
1. On the member that you want to designate as captain, run this CLI command:
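A sketch of the designation command, assuming the standard -mode, -captain_uri, and -election parameters of splunk edit shcluster-config (the URI and credentials are placeholders):
splunk edit shcluster-config -mode captain -captain_uri https://sh1.example.com:8089 -election false -auth admin:yourpassword
On each of the other members, a corresponding command with -mode member and the same -captain_uri value points that member at the static captain.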
You do not need to restart the captain or any other members after running these
commands. The captain immediately takes control of the cluster.
To confirm that the cluster is now operating with a static captain, run this CLI
command from any member:
When the precipitating situation has resolved, you should revert the cluster to
control by a single, dynamic captain. To switch to dynamic captain, you
reconfigure all the members that you previously configured for static captain.
How exactly you do this depends on the type of scenario you are recovering
from.
This topic provides reversion procedures for the two main scenarios:
• Single-site cluster, where the cluster lost the majority of its members and
you converted one of the remaining members to static captain. Once the
cluster regains its majority, you should convert all members to dynamic.
• Two-site cluster, where the majority site went down and you converted the
members on the minority site to use static captain. Once the majority site
returns, you should convert all members to dynamic.
In the scenario of a single-site cluster with loss of majority, you should revert to
dynamic mode once the cluster regains its majority:
1. As members come back online, convert them one-by-one to point to the static
captain:
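The conversion command takes roughly this form, where the URI and management port are those of the static captain:
splunk edit shcluster-config -mode member -captain_uri <URI>:<management_port> -election false -auth <username>:<password>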
You do not need to restart the member after running this command.
As you point each rejoining member to the static captain, it attempts to download
the replication delta. If the purge limit has been exceeded, the system will prompt
you to perform a manual resync, as explained in "How the update proceeds."
Caution: During the time that it takes for the remaining steps of this procedure to
complete, your users should not make any configuration changes.
2. Once the cluster has regained its majority, convert all members back to
dynamic captain use. Convert the current, static captain last. To accomplish this,
run this command on each member:
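The command takes roughly this form (its two parameters are described below):
splunk edit shcluster-config -election true -mgmt_uri <URI>:<management_port> -auth <username>:<password>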
• The -election parameter indicates the type of captain that this cluster
uses. By setting -election to "true", you indicate that the cluster uses a
dynamic captain.
• The -mgmt_uri parameter specifies the URI and management port for this
member instance. You must use the fully qualified domain name. This is
the same value that you specified when you first deployed the member
with the splunk init command.
You do not need to restart the member after running this command.
3. Bootstrap one of the members. This member then becomes the first dynamic
captain. It is recommended that you bootstrap the member that was previously
serving as the static captain.
In the scenario of a two-site cluster with loss of the majority site, you should
revert to dynamic mode once the majority site comes back online:
1. When the majority site comes back online, convert its members to use the
static captain. Point each majority site member to the static captain:
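This is the same conversion command used in the single-site procedure, for example:
splunk edit shcluster-config -mode member -captain_uri <URI>:<management_port> -election false -auth <username>:<password>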
You do not need to restart the member after running this command.
As you point each rejoining member to the static captain, it attempts to download
the replication delta. If the purge limit has been exceeded, the system will prompt
you to perform a manual resync, as explained in "How the update proceeds."
2. Wait for all the majority-site members to get the replicated configs from the
static captain. This typically takes a few minutes.
Caution: During the time that it takes for the remaining steps of this procedure to
complete, your users should not make any configuration changes.
3. Convert all members back to dynamic captain use. Convert the current, static
captain last. To accomplish this, run this command on each member:
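As in the single-site procedure, the command takes roughly this form:
splunk edit shcluster-config -election true -mgmt_uri <URI>:<management_port> -auth <username>:<password>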
• The -election parameter indicates the type of captain that this cluster
uses. By setting -election to "true", you indicate that the cluster uses a
dynamic captain.
• The -mgmt_uri parameter specifies the URI and management port for this
member instance. You must use the fully qualified domain name. This is
the same value that you specified when you first deployed the member
with the splunk init command.
You do not need to restart the member after running this command.
4. Bootstrap one of the members. This member then becomes the first dynamic
captain. It is recommended that you bootstrap the member that was previously
serving as the static captain.
splunk bootstrap shcluster-captain -servers_list
"<URI>:<management_port>,<URI>:<management_port>,..." -auth
<username>:<password>
For information on these parameters, see "Bring up the cluster captain."
You can put a search head cluster member in detention via the CLI, REST
endpoint, or via the server.conf file.
When you manually put a search head cluster member into the detention state, it
remains in detention until you remove it from detention, and the detention state
persists through a restart.
Use cases
Manual detention is useful for cases where you need a search head to be a
functional member of a cluster, but you need to perform maintenance of some
kind on the search head:
• Search head maintenance. You can put a member into detention and allow its in-progress searches to run to completion. Once the searches are completed, the member can be removed from the search head cluster for maintenance operations like hardware replacement or OS upgrade.
• Search head diagnostics. You can use manual detention to prevent
searches from being sent to a poorly performing search head while you
run diagnostics.
• Searchable rolling restarts. Manual detention is used by default in
searchable rolling restarts. No action is required.
For information on searchable rolling restarts, see Restart the search head
cluster. For information on rolling upgrades, see Use rolling upgrade.
You can run the following CLI command to confirm that all searches are
complete:
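One command that reports these counts is splunk list shcluster-member-info. Run it on the member in question and confirm that both search counts are zero:
splunk list shcluster-member-info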
active_historical_search_count:0
active_realtime_search_count:0
Or send a GET request against:
/services/shcluster/member/info
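For example, using curl against the member's management port (the credentials and host shown are placeholders):
curl -k -u admin:changeme https://<member>:8089/services/shcluster/member/info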
For information about the decommission_search_jobs_wait_secs attribute in server.conf, see the search head clustering configuration documentation. For information about searchable rolling restarts, see How searchable rolling restart works.
To put a search head cluster member into detention from Splunk Web, complete
the following steps:
Put a search head cluster member into detention via the CLI
To put a search head cluster member into detention, run the CLI command
splunk edit shcluster-config with the -manual_detention parameter.
You can set the -manual_detention parameter to one of the following values:
• on. The search head cluster member enters detention and does not accept
any new searches. It also does not receive replicated search artifacts from
other members of the cluster. The search head continues to perform other
duties associated with search head clustering, such as voting for a
captain.
• off. The search head cluster member accepts new searches, replicates
search artifacts, and performs duties associated with search head
clustering. This is the default setting.
For example:
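To put a member into detention, the command presumably takes this form, using the -manual_detention values described above:
splunk edit shcluster-config -manual_detention on
To take the member back out of detention: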
splunk edit shcluster-config -manual_detention off
The search head must be in the "up" state before you put it in detention. Verify
the state of the search head before you attempt to put it in manual detention.
To put a search head cluster member in detention from any other node, run the
following command by specifying the 'target_uri' as an additional parameter to
the CLI. The 'target_uri' is the 'mgmt_uri' of the target node to be put in manual
detention.
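A sketch of the command, assuming -target_uri is passed alongside -manual_detention:
splunk edit shcluster-config -manual_detention on -target_uri <mgmt_uri_of_target_member>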
Put a search head cluster member into detention via server.conf
To put a search head into manual detention, you can modify the
manual_detention attribute in the [shclustering] stanza of the search head's
server.conf file. You set the value to on. For example:
[shclustering]
disabled = 0
mgmt_uri = https://tsen-centos62x64-5:8089
id = C09EC4A9-8426-46F3-8385-693998B1EA5E
manual_detention = on
In order for changes to take effect, you must restart the search head cluster
member when you use the server.conf file to put it into detention.
For information about the cluster configuration settings in server.conf, see the search head clustering configuration documentation.
The deployer also automatically initiates a rolling restart, when necessary, after
distributing a configuration bundle to the members. For details on this process,
see "Push the configuration bundle".
When you initiate a rolling restart, the captain issues a restart message to
approximately 10% (by default) of the members at a time. Once those members
restart and contact the captain, the captain then issues a restart message to
another 10% of the members, and so on, until all the members, including the
captain, have restarted.
If there are fewer than 10 members in the cluster, the captain issues the restart
to one member at a time.
The captain is the final member to restart. After the captain member restarts, it
continues to function as the captain.
After all members have restarted, the cluster requires approximately 60 seconds to stabilize. During this interval, error messages might appear. You can safely ignore these messages; they stop once the cluster stabilizes.
During a rolling restart, there is no guarantee that all knowledge objects will be
available to all members.
You can initiate a rolling restart from Splunk Web or from the command line.
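From the command line, the restart is initiated with this command:
splunk rolling-restart shcluster-members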
By default, the captain issues the restart command to 10% of the members at a
time. The restart percentage is configurable through the
percent_peers_to_restart attribute in the [shclustering] stanza of
server.conf. For convenience, you can configure this attribute with the CLI
splunk edit shcluster-config command. For example, to change the restart
behavior so that the captain restarts 20% of the peers at a time, use this
command:
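A typical form of that command:
splunk edit shcluster-config -percent_peers_to_restart 20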
After changing the percent_peers_to_restart attribute, you still need to run the
splunk rolling-restart command to initiate the actual restart process.
Splunk Enterprise 7.1 and later provides a searchable option for rolling restarts.
The searchable option lets you perform a rolling restart of search head cluster
members with minimal interruption of ongoing searches. You can use searchable
rolling restart to minimize search disruption, when a rolling restart is required due
to regular maintenance or a configuration bundle push.
When you initiate a searchable rolling restart, health checks automatically run to
confirm that the cluster is in a healthy state. If the health checks succeed, the
captain selects a cluster member and puts that member into manual detention.
While in detention, the member stops accepting new search jobs, and waits for
in-progress searches to complete. New searches continue to run on remaining
members in the search head cluster. For more information, see Put a search
head in detention mode.
• Health checks automatically run to confirm that the cluster is in a healthy
state before the rolling restart begins.
• While in manual detention, a member:
♦ cannot receive new searches (new scheduled searches are
executed on other members).
♦ cannot execute ad hoc searches.
♦ cannot receive new search artifacts from other members.
♦ continues to participate in cluster operations.
• The member waits for any ongoing searches to complete, up to a
maximum time, as determined by the
decommission_search_jobs_wait_secs attribute in server.conf. The
default setting of 180 seconds covers the majority of searches in most cases.
You can adjust this setting based on the average search runtime.
• Searchable rolling restart applies to both historical and real-time searches.
You can initiate a searchable rolling restart from Splunk Web or from the command
line.
5. (Optional) The searchable option automatically runs cluster health checks.
To override health check failures and proceed with the searchable rolling
restart, select the Force option.
Use the Force option with caution. This option can impact searches.
6. Click Restart.
This initiates the searchable rolling restart.
You can use the splunk show shcluster-status command with the verbose
option to view information about the health of the search head cluster. This can
help you determine if the cluster is in an appropriately healthy state to initiate a
searchable rolling restart.
It is not mandatory to run a health check before you initiate a searchable rolling
restart. Searchable rolling restart automatically runs a health check when
initiated.
To view information about the health of the cluster, run the following command
on any cluster member:
splunk show shcluster-status --verbose
Here is an example of the output from the above command:
Captain:
decommission_search_jobs_wait_secs : 180
dynamic_captain : 1
elected_captain : Tue Mar 6 23:35:52
2018
id :
FEC6F789-8C30-4174-BF28-674CE4E4FAE2
initialized_flag : 1
label : sh3
max_failures_to_keep_majority : 1
mgmt_uri :
https://sroback180306192122accme_sh3_1:8089
min_peers_joined_flag : 1
rolling_restart : restart
rolling_restart_flag : 0
rolling_upgrade_flag : 0
service_ready_flag : 1
stable_captain : 1
Cluster Master(s):
https://sroback180306192122accme_master1_1:8089 splunk_version:
7.1.0
Members:
sh3
label : sh3
manual_detention : off
mgmt_uri :
https://sroback180306192122accme_sh3_1:8089
mgmt_uri_alias :
https://10.0.181.9:8089
out_of_sync_node : 0
preferred_captain : 1
restart_required : 0
splunk_version : 7.1.0
status : Up
sh2
label : sh2
last_conf_replication : Wed Mar 7 05:30:09
2018
manual_detention : off
mgmt_uri :
https://sroback180306192122accme_sh2_1:8089
mgmt_uri_alias :
https://10.0.181.4:8089
out_of_sync_node : 0
preferred_captain : 1
restart_required : 0
splunk_version : 7.1.0
status : Up
sh1
label : sh1
last_conf_replication : Wed Mar 7 05:30:09
2018
manual_detention : off
mgmt_uri :
https://sroback180306192122accme_sh1_1:8089
mgmt_uri_alias :
https://10.0.181.2:8089
out_of_sync_node : 0
preferred_captain : 1
restart_required : 0
splunk_version : 7.1.0
status : Up
The output shows a stable, dynamically elected captain, enough members to
support the replication factor, no out-of-sync nodes, and all members running a
compatible Splunk Enterprise version (7.1.0 or later). This indicates the cluster is
in a healthy state to perform a searchable rolling restart.
The table shows output values for the criteria used to determine the health of the
search head cluster.
Health Check          Output Value     Description
dynamic_captain       1                The cluster has a dynamically elected captain.
stable_captain        1                The current captain has maintained captaincy for at least 10 heartbeats, based on the elected_captain timestamp (approximately 50 seconds, but this can vary depending on heartbeat_period).
service_ready_flag    1                The cluster has enough members to support the replication factor.
out_of_sync_node      0                No cluster member nodes are out of sync.
splunk_version        7.1.0 or later   All cluster members and the indexer cluster master are running a compatible Splunk Enterprise version.
Health checks are not all inclusive. Checks apply only to the criteria listed.
2. Initiate a searchable rolling restart
You can use the CLI or REST API to set the rolling_restart attribute in the
shclustering stanza of local/server.conf.
When using the CLI or REST API to set rolling restart attributes, a cluster restart
is not required.
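For example, to initiate a searchable rolling restart from the CLI, the command takes this form:
splunk rolling-restart shcluster-members -searchable true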
Set searchable rolling restart as the default mode for deployer bundle push
Deployer bundle pushes that require a restart use the default rolling_restart
value in server.conf. You can set the rolling_restart value to searchable to
make searchable rolling restart the default mode for all rolling restarts triggered
by a deployer bundle push.
To set searchable rolling restart as the default mode for deployer bundle push,
use the following attributes in the [shclustering] stanza of server.conf:
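A minimal sketch of those settings (the 180-second wait is the default mentioned earlier; adjust it for your environment):
[shclustering]
rolling_restart = searchable
decommission_search_jobs_wait_secs = 180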
For more information on deployer bundle push, see Use the deployer to distribute
apps and configuration updates.
Monitor the restart process
To check the progress of the rolling restart, run this variant of the splunk rolling-restart command on any cluster member:
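The status variant most likely takes this form:
splunk rolling-restart shcluster-members -status 1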
However, to deal with catastrophic failure of a search head cluster, such as the
failure of a data center, you can periodically back up the cluster state, so that you
can later restore that state to a new or standby cluster, if necessary.
In addition, to deal with failure of the deployer, you can backup and restore the
deployer's configuration bundle.
As with any backup-and-recovery scheme, test these procedures before you actually need them.
You can restore the settings to either a new or an existing, standby cluster. The
procedure documented here assumes that you are restoring to a standby cluster,
but you can apply the main points of the procedure to a new cluster.
• The search head cluster state
All members of both the old and new clusters, along with their deployers, must be
running the same version of Splunk Enterprise, down to the maintenance level.
This procedure assumes that you are restoring to a new deployer. If the old
deployer is intact, you can reuse it by just pointing the new cluster members to it.
A deployer can only service a single cluster. The old cluster must be permanently
inactive before you can use the existing deployer with the new cluster.
1. Confirm that all members of the standby search head cluster are still
stopped.
2. Untar the set of backups to a temporary location.
3. On each standby cluster member:
1. Restore the replicated configurations:
1. Move the replicated bundle $LATEST_TIME-$CHECKSUM.bundle
from the temporary location to $SPLUNK_HOME/etc.
2. Untar $LATEST_TIME-$CHECKSUM.bundle.
4. Restore the KV store configurations. Follow the instructions in Restore the
KV store data in the Admin Manual.
5. Restore the search head cluster id field. Edit
$SPLUNK_HOME/etc/system/local/server.conf and change the id setting
in the shclustering stanza to use the value from the backup.
Troubleshoot search head clustering
The dashboard provides basic information about the cluster and supports these management actions:
• Begin rolling restart. This action initiates a rolling restart of the cluster
members. See Restart the search head cluster.
• Transfer captain. This action is available for each member not currently
the captain. It transfers captaincy to that member. See Transfer captaincy.
Note: To transfer captaincy or perform rolling restart from the dashboard, all
search head cluster members must be at release 6.6 or later.
You can also use the monitoring console to get more information about the
cluster. See "Use the monitoring console to view search head cluster status and
troubleshoot issues."
Show cluster status
To check the overall status of your search head cluster, run this command from
any member:
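That command is splunk show shcluster-status:
splunk show shcluster-status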
To view the cluster configuration from the perspective of a specific member, run this command:
splunk list shcluster-config -uri <URI>:<management_port> -auth
<username>:<password>
Note the following:
• The -uri parameter specifies the URI and management port for the
member whose configuration you want to check.
To get a list of all cluster members, run this command from any member:
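That command is:
splunk list shcluster-members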
Note: The command continues to list members that have left the cluster until
captaincy transfers.
To list information about a member, run this command on the member itself:
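The command most likely takes this form (the optional -uri parameter is described below):
splunk list shcluster-member-info [-uri <URI>:<management_port>]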
• The -uri parameter specifies the URI and management port for the
member whose configuration you want to know.
To list the set of artifacts stored on the cluster, run this command on the captain:
splunk list shcluster-member-artifacts
List scheduler jobs
To list the set of scheduler jobs, run this command on the captain:
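The command is most likely:
splunk list shcluster-scheduler-jobs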
There are several search head clustering dashboards under the Search menu.
View the dashboards themselves for more information. In addition, see Search
head clustering dashboards in Monitoring Splunk Enterprise.
Note: You can also use the CLI to get basic information about the cluster. See
Use the CLI to view information about a search head cluster.
As part of its continuous monitoring of the search head cluster, the monitoring
console provides a variety of information useful for troubleshooting. For example:
• The time that the member last sent a heartbeat to the captain
• The time that the captain last received a heartbeat from the member
These times should be the same or nearly the same. Significant differences in
the sent and received times indicate likely problems.
You can also access heartbeat information through the REST API. See the
REST API documentation for shcluster/captain/members/{name}.
The role of the heartbeat
Each cluster member sends a periodic heartbeat to the captain. The heartbeat carries information such as:
• Search artifacts
• Dispatched searches
• Alerts and suppressions
• Completed summarization jobs
• Member load information
When the captain receives the heartbeat, it notes that the member is in the "up"
state.
After the captain receives a heartbeat from every node, it consolidates all the
transmitted information and, in turn, sends consolidated cluster information back to the members.
The captain expects to get a heartbeat from each member on a regular basis, as
specified in the heartbeat_timeout attribute in the [shclustering] stanza of
server.conf.
The captain only knows about the existence of a member through its heartbeat. If
it never receives a heartbeat, it will not know that the member exists.
If, within the specified timeout period, the captain does not get a heartbeat from a
member that has previously sent a heartbeat, the captain marks the member as
"down". The captain does not dispatch new searches to members in the "down"
state.
If the captain does not receive a heartbeat from a member, it usually indicates that the member is down or that a network problem is preventing the member from contacting the captain.
To find this information, go to the Snapshots section of the dashboard and view
the Status table. There is one row for each member. The table includes two
columns that pertain to baseline consistency. Use the Status table to identify the member that is not in sync with a majority of the other
members. To restore consistency, perform a manual resync on the member,
using the splunk resync shcluster-replicated-config command. See Perform
a manual resync.
Deployment issues
Crash when adding new member
It is recommended that you always use new instances when adding members to
a cluster, but if you choose to re-use an instance, you must follow the instructions
in "Add a new member."
Runtime considerations
Delays due to coordination between cluster members
Coordination between the captain and other cluster members sometimes creates
latency of up to 1.5 minutes. For example, when you save a search job, Splunk
Web might not update the job's state for a short period of time. Similarly, it can
take a minute or more for the captain to orchestrate the complete deletion of
jobs.
In addition, when an event triggers the election of a new captain, there will be an
interval of one to two minutes while the election completes. During this time,
search heads can service only ad hoc job requests.
The search head cluster can handle approximately 5000 active, unexpired alerts.
To stay within this boundary, use alert throttling or limit alert retention time. See
the Alerting Manual.
Site failure can prevent captain election
If the cluster is deployed across two sites and the site with a majority of members
goes down or is otherwise inaccessible, the cluster cannot elect a new captain.
To remediate this situation, you can temporarily deploy a static captain. See
"Use static captain to recover from loss of majority."
If the cluster is unable to elect a captain and maintain a healthy state due to Raft
issues, you can clean the Raft folder on all members and then bootstrap the
cluster. See Fix the entire cluster.
The primary symptom of a Raft issue is that the member's status appears as
"down" when you run splunk show shcluster-status on the captain. To confirm
the Raft issue, look in the member's splunkd.log file for an error message that
starts with the string "ERROR SHCRaftConsensus".
File corruption in a member's _raft folder is a common cause of Raft issues. You
can fix the problem by cleaning the folder on the member. The folder then
repopulates from the captain.
To fix a Raft issue, clean the member's _raft folder. Run the splunk clean raft
command on the member:
1. Stop the member:
splunk stop
2. Clean the member's _raft folder:
splunk clean raft
3. Start the member:
splunk start
The _raft folder will be repopulated from the captain.
If captain election fails even though a majority of members are available, raft
metadata corruption is a likely cause. To confirm, you can examine the members'
splunkd.log files for errors that start with the string "ERROR
SHCRaftConsensus".
You can resolve the issue by cleaning the folder on all members and then
bootstrapping the cluster:
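In outline, and reusing the commands shown above, the procedure looks roughly like this:
1. Stop all cluster members (splunk stop).
2. On each member, clean the Raft folder (splunk clean raft).
3. Start all members (splunk start).
4. Bootstrap the captain:
splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>,<URI>:<management_port>,..." -auth <username>:<password>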
Search head pooling
Search head pooling is deprecated. As an alternative, you can deploy search head clustering. See "About search
head clustering".
For a list of all deprecated features, see the topic "Deprecated features" in the
Release Notes.
Important: Search head pooling is an advanced feature. It's recommended that
you contact the Splunk sales team to discuss your deployment before
attempting to implement it.
You can set up multiple search heads so that they share configuration and user
data. This is known as search head pooling. The main reason for having
multiple search heads is to facilitate horizontal scaling when you have large
numbers of users searching across the same data. Search head pooling can also
reduce the impact if a search head becomes unavailable. This diagram provides
an overview of a typical deployment with search head pooling:
You enable search head pooling on each search head that you want to be
included in the pool, so that they can share configuration and user data. Once
search head pooling has been enabled, these categories of objects will be
available as common resources across all search heads in the pool:
For example, if you create and save a search on one search head, all the other
search heads in the pool will automatically have access to it.
• Most shared storage solutions don't perform well across a WAN. Since
search head pooling requires low-latency shared storage capable of
serving a high number of operations per second, implementing search
head pooling across a WAN is not supported.
• All search heads in a pool must be running the same version of Splunk
Enterprise. Be sure to upgrade all of them at once. See "Upgrade your
distributed Splunk Enterprise deployment" in the Installation Manual.
The set of data that a search head distributes to its search peers is known as the
knowledge bundle. For details, see What search heads send to search peers.
By default, only one search head in a search head pool sends the knowledge
bundle to the set of search peers. This optimization is controllable by means of
the useSHPBundleReplication attribute in distsearch.conf.
See the other topics in this chapter for more information on search head pooling:
Answers
Have questions? Visit Splunk Answers and see what questions and answers the
Splunk community has about search head pooling.
Create a search head pool
To create a pool of search heads, follow these steps:
So that each search head in a pool can share configurations and artifacts, they
need to access a common set of files via shared storage:
Important: The Splunk user account needs read/write access to the shared
storage location. When installing a search head on Windows, be sure to install it
as a user with read/write access to shared storage. The Local System user does
not have this access. For more information, see "Choose the user Splunk should
run as" in the Installation manual.
a. Set up each search head individually, specifying the search peers in the usual
fashion. See "Add search peers to the search head".
b. Make sure that each search head has a unique serverName attribute,
configured in server.conf. See "Manage distributed server names" for detailed
information on this requirement. If the search head does not have a unique
serverName,
a warning will be generated at start-up. See "Warning about unique
serverName attribute" for details.
Before enabling pooling, you must stop splunkd. Do this for each search head in
the pool.
Use the CLI command splunk pooling enable to enable pooling on a search
head. The command sets certain values in server.conf. It also creates
subdirectories within the shared storage location and validates that Splunk
Enterprise can create and move files within them.
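The command takes this form, where the argument is the path to the shared storage location:
splunk pooling enable <path_to_shared_storage> [--debug]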
Note:
• The --debug parameter causes the command to log additional information
to btool.log.
The command sets values in the [pooling] stanza of the server.conf file in
$SPLUNK_HOME/etc/system/local.
You can also directly edit the [pooling] stanza of server.conf. For detailed
information on server.conf, see the server.conf specification file.
Important: The [pooling] stanza must be placed in the server.conf file directly
under $SPLUNK_HOME/etc/system/local/. This means that you cannot deploy the
[pooling] stanza via an app, either on local disk or on shared storage. For
details see the server.conf spec file.
For example, if your NFS mount is at /tmp/nfs, copy the apps subdirectories that
match this pattern:
$SPLUNK_HOME/etc/apps/*
into
/tmp/nfs/etc/apps
/tmp/nfs/etc/apps/search
/tmp/nfs/etc/apps/launcher
/tmp/nfs/etc/apps/unix
[...]
Similarly, copy the user subdirectories that match this pattern:
$SPLUNK_HOME/etc/users/*
into
/tmp/nfs/etc/users
Important: You can choose to copy over just a subset of apps and user
subdirectories; however, be sure to move them to the precise locations described
above.
After running the splunk pooling enable command, restart splunkd. Do this for
each search head in the pool.
Another reason for using a load balancer is to ensure access to search artifacts
and results if one of the search heads goes down. Ordinarily, RSS and email
alerts provide links to the search head where the search originated. If that search
head goes down (and there's no load balancer), the artifacts and results become
inaccessible. However, if you've got a load balancer in front, you can set the
alerts so that they reference the load balancer instead of a particular search
head.
There are a couple issues to note when selecting and configuring the load
balancer:
Generate alert links to the load balancer
To generate alert links to the load balancer, you must edit alert_actions.conf:
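A minimal sketch, assuming the alert_actions.conf hostname setting and an illustrative load balancer address; set this on each search head in the pool:
hostname = http://loadbalancer.example.com:8000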
The alert links should now point to the load balancer, not the individual search
heads.
You must stop splunkd before running splunk pooling enable or splunk
pooling disable. However, you can run splunk pooling validate and splunk
pooling display while splunkd is either stopped or running.
The splunk pooling enable command validates search head access when you
initially set up search head pooling. If you ever need to revalidate the search
head's access to shared resources (for example, if you change the NFS
configuration), you can run the splunk pooling validate CLI command:
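For example, run it on the search head without additional arguments:
splunk pooling validate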
Disable search head pooling
You can disable search head pooling with this CLI command:
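That command is:
splunk pooling disable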
Run this command for each search head that you need to disable.
Important: Before running the splunk pooling disable command, you must
stop splunkd. After running the command, you should restart splunkd.
You can use the splunk pooling display CLI command to determine whether
pooling is enabled on a search head:
This example shows how the system response varies depending on whether
pooling is enabled:
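The exact wording of the output varies by version; it looks roughly like this, with an illustrative path:
splunk pooling display
Search head pooling is enabled with shared storage at: /mnt/search-head-pooling
splunk pooling display
Search head pooling is disabled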
Specifically, if you add a stanza to any configuration file in a local directory, you
must run the following command:
Note: This is not necessary if you make changes by means of Splunk Web or the
CLI.
Deployment server and search head pooling
With search head pooling, all search heads access a single set of configurations,
so you don't need to use a deployment server or a third party deployment
management tool like Puppet to push updates to multiple search heads.
However, you might still want to use a deployment tool with search head pooling,
in order to consolidate configuration operations across all Splunk Enterprise
instances.
If you want to use the deployment server to manage your search head
configuration, note the following:
The default settings have been changed to less frequent intervals starting with
5.0.3. In server.conf, the following settings affect configuration refresh timing:
# 5.0.3 defaults
[pooling]
poll.interval.rebuild = 1m
poll.interval.check = 1m
The previous defaults for these settings were 2s and 5s, respectively.
With the old default values, a change made on one search head would become
available on another search head at most seven seconds later. There is usually
no need for updates to be propagated that quickly. By changing the settings to
values of one minute, the load on the shared storage system is greatly reduced.
Depending on your business needs, you might be able to set these values to
even longer intervals.
For the upgrade procedure, see "Upgrade your distributed Splunk Enterprise
deployment" in the Installation Manual. Read this procedure carefully before
attempting to upgrade your search head pool. You must follow the steps
precisely to ensure that the pool remains fully functional.
Mount the knowledge bundle
The set of data that a search head distributes to its search peers is called the
knowledge bundle. The bundle contents reside in the search head's
$SPLUNK_HOME/etc/{apps,users,system} subdirectories. For information on the
contents and purpose of this bundle, see "What search heads send to search
peers".
By default, the search head replicates and distributes the knowledge bundle to
each search peer. You can instead tell the search peers to mount the knowledge
bundle's directory location, eliminating the need for bundle replication. When you
mount a knowledge bundle on shared storage, it's referred to as a mounted
bundle.
Caution: Most shared storage solutions don't work well across a WAN. Since
mounted bundles require shared storage, you generally should not implement
them across a WAN.
Depending on your search head configuration, there are a number of ways to set
up mounted bundles. These are some of the typical ones:
• For multiple non-clustered search heads. Maintain the knowledge
bundle(s) on each search head's local storage. Each search head maintains its own bundle, which each search peer mounts and accesses individually.
In each case, the search peers need access to each search head's
$SPLUNK_HOME/etc/{apps,users,system} subdirectories.
The search peers use the mounted directories only when fulfilling the search
head's search requests. For indexing and other purposes not directly related to
distributed search, the search peers will use their own, local apps, users, and
system directories, the same as any other indexer.
Important: The search head's Splunk user account needs read/write access to
the shared storage location. The search peers must have only read access to the
bundle subdirectories, to avoid file-lock issues. Search peers do not need to
update any files in the shared storage location.
To configure the search head, set the following attribute in the [distributedSearch] stanza of distsearch.conf:
shareBundles = false
This stops the search head from replicating bundles to the search peers.
Configure the search peers
For each search peer, follow these steps to access the mounted bundle:
[searchhead:<searchhead-splunk-server-name>]
mounted_bundles=true
bundles_location=<path_to_bundles>
• Important: If the search peer is running against a search head cluster, the
[searchhead:] stanza on the peer must specify the cluster's GUID, not the
server name of any cluster members. For example:
[searchhead:C7729EE6-D260-4268-A699-C1F95AAD07D5]
The cluster GUID is the value of the id field, located in the captain section
of the results.
For example, say $SPLUNK_HOME on the search head is /opt/splunk, and you export /opt/splunk/etc via NFS.
Then, on the search peer, you mount that NFS share at
/mnt/splunk-head. The value of <path_to_bundles> should be
/mnt/splunk-head, not /opt/splunk.
Note: You can optionally set up symbolic links to the bundle subdirectories
(apps,users,system) to ensure that the search peer has access only to the
necessary subdirectories in the search head's /etc directory. See the following
example for details on how to do this.
Example configuration
Search head
[distributedSearch]
...
shareBundles = false
Search peers
1. Mount the search head's $SPLUNK_HOME/etc directory on the search peer to:
/mnt/searcher01
2. (Optional.) Create a directory that consists of symbolic links to the bundle
subdirectories:
/opt/shared_bundles/searcher01
/opt/shared_bundles/searcher01/system -> /mnt/searcher01/system
/opt/shared_bundles/searcher01/users -> /mnt/searcher01/users
/opt/shared_bundles/searcher01/apps -> /mnt/searcher01/apps
Note: This optional step is useful for ensuring that the peer has access only to
the necessary subdirectories.
3. Edit distsearch.conf on the search peer to point to the mounted bundle:
[searchhead:searcher01]
mounted_bundles = true
bundles_location = /opt/shared_bundles/searcher01
As an alternative, you can deploy search head clustering. See "About search
head clustering". For information on mounted bundles and search head
clustering, see "Search head clustering and mounted bundles".
For a list of all deprecated features, see the topic "Deprecated features" in the
Release Notes.
• Use the same shared storage location for both the search head pool and
the mounted bundles. Search head pooling uses a subset of the
directories required for mounted bundles.
• Search head pooling itself only requires that you mount the
$SPLUNK_HOME/etc/{apps,users} directories. However, when using
mounted bundles, you must also provide a mounted
$SPLUNK_HOME/etc/system directory. This doesn't create any conflict
among the search heads, as they will always use their own versions of the
system directory and ignore the mounted version.
• The search peers must create separate stanzas in distsearch.conf for
each search head in the pool. The bundles_location in each of those
stanzas must be identical.
See "Configure search head pooling" for information on setting up a search head
pool.
This example shows how to combine search head pooling and mounted bundles
in one system. There are two main sections to the example:
1. Set up a search head pool consisting of two search heads. In this part, you
also mount the bundles.
2. Set up the search peers so that they can access bundles from the search head
pool.
The example assumes you're using an NFS mount for the shared storage
location.
For detailed information on these steps, see "Create a pool of search heads".
Now, configure the search head pool:
1. On each search head, mount the shared storage location at /mnt/search-head-pooling.
2. On each search head, enable search head pooling. In this example, you're
using an NFS mount of /mnt/search-head-pooling as your shared storage
location:
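Using the example mount point, the command on each search head would be:
splunk pooling enable /mnt/search-head-pooling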
Among other things, this step creates empty /etc/apps and /etc/users
directories under /mnt/search-head-pooling. Step 3 uses those directories.
3. Copy the apps, users, and system directories from one of the search heads to the shared storage location:
cp -r $SPLUNK_HOME/etc/apps/* /mnt/search-head-pooling/etc/apps
cp -r $SPLUNK_HOME/etc/users/* /mnt/search-head-pooling/etc/users
cp -r $SPLUNK_HOME/etc/system /mnt/search-head-pooling/etc/
4. On each search head, edit distsearch.conf to stop bundle replication to the search peers:
[distributedSearch]
...
shareBundles = false
5. On each search head, restart splunkd:
splunk start splunkd
1. Mount the shared storage location (the same location that was earlier set to
/mnt/search-head-pooling on the search heads) so that it appears as
/mnt/bundles on the peer.
2. On the search peer, edit distsearch.conf to create one stanza for each search head in the pool:
[searchhead:searcher01]
mounted_bundles = true
bundles_location = /opt/shared_bundles/bundles
[searchhead:searcher02]
mounted_bundles = true
bundles_location = /opt/shared_bundles/bundles
Distributed search in action
• When processing a distributed search, the search peer uses the settings
contained in the knowledge bundle that the search head distributes to all
the search peers when it sends them a search request. These settings are
created and managed on the search head.
• When performing local activities, the search peer uses the authorization
settings created and stored locally on the search peer itself.
All authorization settings are stored in one or more authorize.conf files. This
includes settings configured through Splunk Web or the CLI. It is these
authorize.conf files that get distributed from the search head to the search
peers. On the knowledge bundle, the files are usually located in either
/etc/system/{local,default} and/or /etc/apps/<app-name>/{local,default}.
Since search peers automatically use the settings in the knowledge bundle,
things normally work fine. You configure roles for your users on the search head,
and the search head automatically distributes those configurations to the search
peers when it distributes the search itself.
With search head pooling, however, you must take care to ensure that the search
heads and the search peers all use the same set of authorize.conf file(s). For
this to happen, you must make sure:
• All search heads in the pool use the same set of authorize.conf files
• The set of authorize.conf files that the search heads use goes into the
knowledge bundle so that they get distributed to the search peers.
This topic describes the four main scenarios, based on whether or not you're
using search head pooling or mounted bundles. It describes the scenarios in
order from simple to complex.
Four scenarios
What you need to do with the distributed search authorize.conf files depends on
whether your deployment implements search head pooling or mounted bundles.
The four scenarios are:
• No search head pooling, no mounted bundles
• No search head pooling, mounted bundles
• Search head pooling, no mounted bundles
• Search head pooling, mounted bundles
The first two scenarios "just work" but the last two scenarios require careful
planning. For the sake of completeness, this section describes all four scenarios.
Note: These scenarios address authorization settings for distributed search only.
Local authorization settings function the same independent of your distributed
search deployment.
No search head pooling, no mounted bundles
Whatever authorization settings you have on the search head get automatically
distributed to its search peers as part of the replicated knowledge bundle that
they receive with distributed search requests.
No search head pooling, mounted bundles
Whatever authorization settings you have on the search head get automatically
placed in the mounted bundle and used by the search peers during distributed
search processing.
Search head pooling, no mounted bundles
The search heads in the pool share their /apps and /users directories but not
their /etc/system/local directories. Any authorize.conf file in an /apps
subdirectory will be automatically shared by all search heads and included in the
knowledge bundle when any of the search heads distributes a search request to
the search peers.
The problem arises because authorization changes can also get saved to an
authorize.conf file in a search head's /etc/system/local directory (for example,
if you update the search head's authorization settings via Splunk Web). This
directory does not get shared among the search heads in the pool, but it still gets
distributed to the search peers as part of the knowledge bundle. Because of how
the configuration system works, any copy of authorize.conf file in
/etc/system/local will have precedence over a copy in an /apps subdirectory.
(See "Configuration file precedence" in the Admin manual for details.)
To avoid this problem, you need to make sure that any changes made to a
search head's /etc/system/local/authorize.conf file get propagated to all
search heads in the pool. One way to handle this is to move any changed
/etc/system/local/authorize.conf file into an app subdirectory, since all search
heads in the pool share the /apps directory.
Search head pooling, mounted bundles
This is similar to the previous scenario. The search heads in the pool share their
/apps and /users directories but not their /etc/system/local directories. Any
authorize.conf file in an /apps subdirectory will be automatically shared by all
search heads. It will also be included in the mounted bundle that the search
peers use when processing a search request from any of the search heads.
However, an authorize.conf file in a search head's /etc/system/local directory is not shared among the search heads, nor is it automatically distributed to the mounted bundle that the search peers use.
Therefore, you must provide some mechanism that ensures that all the search
heads and all the search peers have access to that version of authorize.conf.
Users can limit the search peers that participate in a search. They also need to
be aware of the distributed search configuration to troubleshoot.
In general, you specify a distributed search through the same set of commands
as for a local search. However, several additional commands and options are
available specifically to assist with controlling and limiting a distributed search.
A search head by default runs its searches across its full set of search peers.
You can limit a search to one or more search peers by specifying the
splunk_server field in your query. See Retrieve events from indexes in the
Search Manual.
In addition, the lookup command provides a local argument for use with
distributed searches. If set to true, the lookup occurs only on the search head; if
false, the lookup occurs on the search peers as well. This is particularly useful
for scripted lookups, which replicate lookup tables. See the description of lookup
in the Search Reference for details and an example.
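For example, a search restricted to two hypothetical peers, with a lookup forced to run only on the search head (the index, peer names, and lookup name are illustrative):
index=_internal (splunk_server=peer01 OR splunk_server=peer02) log_level=ERROR | lookup local=true usertogroup user OUTPUT group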
Troubleshoot distributed search
The monitoring console provides other dashboards that show search activity for
single instances.
View the dashboards themselves for more information. In addition, see Search
activity: Deployment in Monitoring Splunk Enterprise.
You must keep the clocks on your search heads and search peers in sync, via
NTP (network time protocol) or some similar means. The nodes require close
clock alignment, so that time comparisons are valid across systems. If the clocks
are out-of-sync by more than a few seconds, distributed search cannot work
correctly, resulting in search failures or premature expiration of search artifacts.
When you add a search peer to a search head, the search head checks that the
clocks are in sync. This check ensures that the system time, independent of the
timezone, agrees across the nodes of a distributed search environment. If the
nodes are out of sync, the search head rejects the search peer and displays a
banner message like this:
The time difference between this system and the intended peer at
uri=https://servername:8089/ was too big. Please bring the system
clocks into agreement.
Note: The search head does not run this check if you add the search peer by
direct edit of distsearch.conf.
Configuration changes can take a short time to propagate from search heads to
search peers. As a result, during the time between when configuration changes
are made on the search head and when they're replicated to the search peers
(typically, not more than a few minutes), distributed searches can either fail or
provide results based on the previous configuration.
Types of configuration changes that can cause search failures are those that
involve new apps or changes to authentication.conf or authorize.conf.
Examples include:
• changing the allowed indexes for a role and then running a search as a
user within that role
• creating a new app and then running a search from within that app
Types of changes that can provide results based on the previous configuration
include changing a field extraction or a lookup table file.
A 6.x search head by default asks its search peers to generate a remote timeline.
This can result in slow searches if the connection between the search head and
the search peers is unstable.
To turn off remote timeline generation, set the following attribute in limits.conf on the search head:
[search]
remote_timeline_fetchall = false
After making this change, you must restart the search head.
If such a situation arises and you want to trade data fidelity for search
performance, you can direct the search head to end long-running searches
without waiting for a slow peer to finish sending all its data. To do this, you
enable the search head's [slow_peer_disconnect] stanza in limits.conf. By
default, this capability is disabled. You can toggle the capability without restarting
the search head.
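A minimal sketch of the limits.conf change, assuming the stanza is enabled through its disabled attribute:
[slow_peer_disconnect]
disabled = false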
The heuristics that determine when to disconnect a search from a slow peer are
complex and tunable by means of several parameters in the
[slow_peer_disconnect] stanza. If you feel the need to use this capability,
contact Splunk Professional Services for guidance in adjusting the heuristics for
your specific deployment needs.
By quarantining, instead of stopping, a bad search peer, you can perform live
troubleshooting on the peer.
You can override a quarantine for a specific search, if necessary. See How to
override a quarantine.
What happens when you quarantine a search peer
When you quarantine a search peer, you prevent it from taking part in new
searches. It continues to attempt to service any currently running searches.
The quarantine operation affects only the relationship between the search peer
and its search head. The search peer continues to receive and index incoming
data in its role as an indexer. If the peer is a member of an indexer cluster, it also
continues to replicate data from other peer nodes.
If you need to fully halt the activities of the indexer, you must bring it down.
To quarantine a search peer, run this CLI command from the search head:
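The command most likely takes this form:
splunk edit search-server <host>:<management_port> -action quarantine -auth <username>:<password>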
• Use the -auth flag to provide credentials for the search head only.
• <host> is the host name or IP address of the search peer's host machine.
• <port> is the management port of the search peer.
For example:
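A sketch with placeholder host, port, and credentials:
splunk edit search-server 10.10.10.10:8089 -action quarantine -auth admin:changeme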
You can also quarantine a search peer through the Search peers page on the
search head's Splunk Web. See View search peer status in Settings.
To remove a search peer from quarantine, run this command from the search
head:
splunk edit search-server <host>:<management_port> -action unquarantine -auth <username>:<password>
Note the following:
• Use the -auth flag to provide credentials for the search head only.
• <host> is the host name or IP address of the search peer's host machine.
• <port> is the management port of the search peer.
For example:
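Again with placeholder values:
splunk edit search-server 10.10.10.10:8089 -action unquarantine -auth admin:changeme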
Clock skew between search heads and shared storage can
affect search behavior
It's important to keep the clocks on your search heads and shared storage server
in sync, via NTP (network time protocol) or some similar means. If the clocks are
out-of-sync by more than a few seconds, you can end up with search failures or
premature expiration of search artifacts.
On each search head, the user account Splunk runs as must have read/write
permissions to the files on the shared storage server.
Performance analysis
• Storage: The storage backing the pool must be able to handle a very high
number of IOPS. IOPS under 1000 will probably never work well.
• Network: The communication path between the backing store and the
search heads must be high bandwidth and extremely low latency. This
probably means your storage system should be on the same switch as
your search heads. WAN links are not going to work.
• Server Parallelism: Because searching results in a large number of
processes requesting a large number of files, the parallelism in the system
must be high. This can require tuning the NFS server to handle a larger
number of requests in parallel.
• Client Parallelism: The client operating system must be able to handle a
significant number of requests at the same time.
• Use a storage benchmarking tool, such as Bonnie++, while the file store is
not in use to validate that the IOPS provided are robust.
• Use network testing methods to determine that the roundtrip time between
search heads and the storage system is on the order of 10ms.
• Perform known simple tasks such as creating a million files and then
deleting them.
• Assuming the above tests have not shown any weaknesses, perform
some IO load generation or run the actual Splunk Enterprise load while
gathering NFS stat data to see what's happening with the NFS requests.
If searches are timing out or running slowly, you might be exhausting the
maximum number of concurrent requests supported by the NFS client. To solve
this problem, increase your client concurrency limit. For example, on a Linux NFS
client, adjust the tcp_slot_table_entries setting.
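For example, on many Linux distributions the limit is exposed as the sunrpc.tcp_slot_table_entries kernel parameter; the value 128 is illustrative, and the change must be in place before the NFS share is mounted:
sysctl -w sunrpc.tcp_slot_table_entries=128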
Splunk Enterprise synchronizes the search head pool storage configuration state
with the in-memory state when it detects changes. Essentially, it reads the
configuration into memory when it detects updates. When dealing either with
overloaded search pool storage or with large numbers of users, apps, and
configuration files, this synchronization process can reduce performance. To
mitigate this, the minimum frequency of reading can be increased, as discussed
in "Select timing for configuration refresh".
Each search head in the pool must have a unique serverName attribute. Splunk
Enterprise validates this condition when each search head starts. If it finds a
problem, it generates this error message:
There was an error validating your search head pooling configuration.
For more
information, run 'splunk pooling validate'
The most common cause of this error is that another search head in the pool is
already using the current search head's serverName. To fix the problem, change
the current search head's serverName attribute in $SPLUNK_HOME/etc/system/local/server.conf.
There are a few other conditions that also can generate this error:
This updates the pooling.ini file with the current search head's
serverName->GUID mapping, overwriting any previous mapping.
When upgrading pooled search heads, you must copy all updated apps - even
those that ship with Splunk Enterprise (such as the Search app) - to the search
head pool's shared storage after the upgrade is complete. If you do not, you
might see artifacts or other incorrectly-displayed items in Splunk Web.
To fix the problem, copy all updated apps from an upgraded search head to the
shared storage for the search head pool, taking care to exclude the local
sub-directory of each app.
Important: Excluding the local sub-directory of each app from the copy process
prevents the overwriting of configuration files on the shared storage with local
copies of configuration files.
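One way to do this is with rsync; the shared storage path is illustrative:
rsync -a --exclude='local' $SPLUNK_HOME/etc/apps/ /mnt/search-head-pool/etc/apps/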
Once the apps have been copied, restart Splunk Enterprise on all search heads
in the pool.
Distributed search error messages
This table lists some of the more common search-time error messages
associated with distributed search: