All About Splunk
MERROUNI HAMID
Contents
I. Lab Preparation and How Splunk Works
   Install a Windows universal forwarder
   Install a *nix universal forwarder
   How Data Travels Through Splunk
   What's an index
   Splunk config files
   Enable a Receiver for Splunk Enterprise
   Shipping Windows Event Logs to Splunk
   Creating a Test Index
II. Data Onboarding
III. Basic SPL (Splunk Processing Language)
IV. File Lookups
V. Visualizations and Dashboards
   1. Visualizations
   2. Classic Dashboard
   3. Building a dynamic Dashboard with tokens
VI. Data models
VII. Using Search to Find Suspicious Events
   1. HTTP Connections to a Server by IP (Conditional Eval)
   2. Searching 2 Data Sources for a Common Indicator (Using OR)
   3. Finding Traces of Look-alike Domains (Using fuzzylookup)
   4. Using Geolocation to Find Anomalous Connections
   5. First-Time Login of a User on a Machine
   6. Identifying Password Guessing Attempts with Failed and Successful Logins
   7. Identifying High Network Bandwidth Consumption from a Baseline
I. Lab Preparation and How Splunk Works:
In this lab you will learn how to install and configure Splunk on Windows and Linux.
Download the Splunk package for the system you will use. (I use my Windows host so that Splunk runs better, but you can use any system, Windows, Linux, or macOS, in a VM or on the host; there is no difference.)
Install a Windows universal forwarder
https://docs.splunk.com/Documentation/Forwarder/9.2.0/Forwarder/InstallaWindowsuniversalforwarderfromaninstaller
Install a *nix universal forwarder
https://docs.splunk.com/Documentation/Forwarder/9.2.0/Forwarder/Installanixuniversalforwarder
What's an index:
An index is a repository or storage location where Splunk stores and manages data. When data is ingested into Splunk, it is indexed and stored in these index locations. Indexes serve as the foundation for searching, analyzing, and visualizing data within Splunk.
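As a sketch of how an index is defined on disk, a custom index can be declared in indexes.conf or created from the CLI. The index name "test" and the paths below are illustrative, not taken from the lab:

```
# indexes.conf (illustrative sketch)
[test]
homePath   = $SPLUNK_DB/test/db
coldPath   = $SPLUNK_DB/test/colddb
thawedPath = $SPLUNK_DB/test/thaweddb
```

The equivalent CLI command is `splunk add index test`; indexes can also be created in Splunk Web under Settings > Indexes.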
Splunk config files
The config files work the same way on Linux and Windows.
Enable a Receiver for Splunk Enterprise:
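No configuration is shown for this section, so here is a hedged sketch of the usual setup, assuming the conventional receiving port 9997 (adjust host names and ports for your own lab):

```
# On the receiving indexer: inputs.conf
[splunktcp://9997]
disabled = 0

# On the universal forwarder: outputs.conf
[tcpout:my_indexers]
server = indexer.example.local:9997
```

The receiver can also be enabled from the CLI with `splunk enable listen 9997`, or in Splunk Web under Settings > Forwarding and receiving.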
Shipping Windows Event Logs to Splunk
https://docs.splunk.com/Documentation/Splunk/9.2.1/Data/MonitorWindowseventlogdata
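A minimal inputs.conf sketch for collecting event logs on the Windows universal forwarder (the Security channel and the target index are examples, not prescribed by the lab):

```
# inputs.conf on the Windows universal forwarder
[WinEventLog://Security]
disabled = 0
index = main
```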
Creating a Test Index
https://docs.splunk.com/Documentation/Splunk/9.2.0/Data/Useatestindex

Further reading:
https://docs.splunk.com/Documentation/Splunk/latest/DistSearch/Whatisdistributedsearch
https://docs.splunk.com/Documentation/Splunk/latest/Deploy/Datapipeline
https://docs.splunk.com/Documentation/Splunk/latest/Indexer/Aboutindexesandindexers
https://docs.splunk.com/Documentation/Splunk/latest/Indexer/Aboutmanagingindexes
https://docs.splunk.com/Documentation/Splunk/latest/Forwarding/Enableareceiver
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Aboutconfigurationfiles
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Wheretofindtheconfigurationfiles
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Listofconfigurationfiles
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Whatsanapp
https://docs.splunk.com/Documentation/Splunk/latest/Data/MonitorWindowseventlogdata
II. Data Onboarding:
III. Basic SPL (Splunk Processing Language):
BOTSv2: a sample security dataset and CTF platform for information security professionals, researchers, students, and enthusiasts. This page hosts information regarding the version 2 dataset.
To download: https://github.com/splunk/botsv2
index=botsv2 earliest=0 sourcetype="stream:http" |stats count by host
This query counts events by host from the botsv2 index with sourcetype stream:http.
index=botsv2 earliest=0 sourcetype="stream:http" |table src_ip, dest_ip, dest_port
The table command creates a table of the fields we choose.
The dc function calculates the number of distinct source ports (src_port) for each combination of src_ip, dest_ip, and dest_port.
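The dc query that the explanation above refers to is not shown; a sketch consistent with that description would be:

```
index=botsv2 earliest=0 sourcetype="stream:http"
| stats dc(src_port) AS distinct_src_ports BY src_ip, dest_ip, dest_port
```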
index=botsv2 earliest=0 sourcetype="stream:http"
|eval mb =bytes/(1024*1024)
|timechart span=1d sum(mb) as total_mb
eval mb = bytes/(1024*1024): Calculates the total megabytes transferred by dividing the bytes field
by 1024*1024.
timechart span=1d sum(mb) as total_mb: Aggregates the data over each day (span=1d) and sums up
the total megabytes transferred (sum(mb)), creating a timechart.
With span we can use the units s, m, h, d, w, mon, and y to specify the bucket duration.
where bytes_out > bytes_in: Filters events where the number of bytes sent out (bytes_out) is greater than the number of bytes received (bytes_in).
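The where clause described above belongs to a variant of the previous search; a reconstruction (not the lab's exact query) might be:

```
index=botsv2 earliest=0 sourcetype="stream:http"
| where bytes_out > bytes_in
| eval mb = bytes/(1024*1024)
| timechart span=1d sum(mb) AS total_mb
```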
index=botsv2 earliest=0 sourcetype="stream:http"
| rex field=http_user_agent "Windows\s+NT\s+(?<win_nt_version>[\d\.]+)"
This query should help you extract and identify the version of Windows NT from the user-agent
strings in your HTTP stream events.
index=botsv2 earliest=0 sourcetype="stream:http"
| stats sum(bytes_in) AS bytes_in, sum(bytes_out) AS bytes_out BY src_ip, dest_ip
| eval mbytes_in = round(bytes_in/pow(1024,2),2), mbytes_out = round(bytes_out/pow(1024,2),2)
| fields - bytes_in, bytes_out
| stats avg(mbytes_in) AS avg_mbytes_in, avg(mbytes_out) AS avg_mbytes_out
| fields - bytes_in, bytes_out: Removes the original bytes_in and bytes_out fields, leaving only the
mbytes_in and mbytes_out fields.
| sort - mbytes_in: Sorts the results based on the average megabytes transferred in descending
order.
This query provides insights into the average megabytes transferred between source and destination
IP pairs, sorted by the average megabytes in descending order.
index=botsv2 earliest=0 sourcetype="stream:http" | stats list(http_method) AS http_methods
stats list(http_method) AS http_methods: Collects the values of the http_method field across all events into a multivalue field named http_methods. (Note that list() keeps duplicates; use values() if you want only the unique values.)
This query will give you the HTTP methods present in the dataset.
index=botsv2 earliest=0 sourcetype="stream:http"
| eval ua_length = len(http_user_agent)
| stats min(ua_length) AS min, max(ua_length) AS max, avg(ua_length) AS avg
eval ua_length = len(http_user_agent): Calculates the length of the http_user_agent field for
each event and assigns it to a new field named ua_length.
This query will provide you with statistics regarding the length of the http_user_agent field, including
the minimum, maximum, and average lengths.
index=botsv2 earliest=0 sourcetype="stream:http" | streamstats count | table count, _raw
| streamstats count: Uses streamstats to calculate the count of events encountered so far.
This query will give you a table showing the count of events encountered so far along with the
corresponding raw event data.
| chart count BY http_method host: This part of the query utilizes the chart command to create a
chart where the x-axis represents the unique values of the http_method field, the y-axis represents
the count of events for each method, and each series in the chart represents a different value of the
host field.
This chart provides a visual representation of the distribution of HTTP methods across different hosts
in your dataset. It allows you to quickly analyze which HTTP methods are more commonly used on
which hosts.
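The chart query being explained is not reproduced above; based on the description it would look like:

```
index=botsv2 earliest=0 sourcetype="stream:http"
| chart count BY http_method host
```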
index=botsv2 earliest=0 sourcetype="stream:http"
| lookup apache_httpstatus_lookup status AS http_status

| inputlookup apache_httpstatus_lookup
| inputlookup apache_httpstatus_lookup: This command reads the contents of the lookup table
named apache_httpstatus_lookup and presents the data as search results.
This command is useful for viewing the contents of a lookup table directly within your search
interface, which can be helpful for debugging or inspecting the data in the lookup table.
| inputlookup apache_httpstatus_lookup
| append
    [ makeresults
    | eval status=418, status_description="I'm a teapot", status_type="Client Error"
    | fields - _time ]
| outputlookup apache_httpstatus_lookup
In summary, this query adds a new entry to the apache_httpstatus_lookup lookup table with the
status code 418, a custom status description, and a status type. It then updates the lookup table with
this new entry.
IV. File Lookups:
1. CSV file Lookups
The lookup file is located on the search head.
It lives either in the app's lookups folder or in /etc/system/lookups.
In cmd, under ~/splunk/etc/apps:
mkdir well_known_ports
cd well_known_ports
mkdir metadata
notepad metadata\default.meta
mkdir lookups
cd lookups
notepad well_known_ports.csv
cd ..
mkdir default
cd default
notepad transforms.conf
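As a sketch, the two files might contain something like the following (the port rows are illustrative; the transforms stanza uses the standard filename attribute):

```
# lookups/well_known_ports.csv
port,transport,description
80,tcp,http
443,tcp,https
53,udp,dns

# default/transforms.conf
[well_known_ports]
filename = well_known_ports.csv
```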
http://localhost:8000/en-US/debug/refresh
| lookup well_known_ports port AS dest_port, transport: Performs a lookup operation using the
well_known_ports lookup table. It matches the dest_port field from the events with the port field in
the well_known_ports table and retrieves the corresponding transport field.
This query helps to enrich the traffic events with additional information about the destination ports,
including the transport protocol and a description of the service associated with each port.
2. KV Store Lookups:
Splunk comes with an instance of MongoDB.
Great for large lookups or when data is updated often.
In cmd : ~/splunk/etc/apps/well_known_ports/default:
notepad collections.conf
notepad transforms.conf
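A hedged sketch of the two files (the collection and field names are taken from the query that follows; everything else is standard KV store boilerplate):

```
# collections.conf
[well_known_ports_kv]

# transforms.conf
[well_known_ports_kv]
external_type = kvstore
collection = well_known_ports_kv
fields_list = _key, port, transport, description
```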
http://localhost:8000/en-US/debug/refresh
| makeresults
| eval port="443", transport="tcp", description="https"
| append
[| makeresults
| eval port="123", transport="udp", description="ntp"]
| append
[| makeresults
| eval port="53", transport="udp", description="dns"]
| fields - _time
| outputlookup well_known_ports_kv
3. External Lookups:
Uses a Python script or binary to fetch data from an external source.
Example: the built-in dnslookup.
In cmd: ~\etc\system\default
notepad transforms.conf
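Inside that file, the stock stanza for the built-in dnslookup looks roughly like this (paraphrased from a default Splunk install; verify against your own transforms.conf):

```
[dnslookup]
external_cmd = external_lookup.py clienthost clientip
fields_list = clienthost, clientip
```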
index=botsv2 sourcetype="pan:traffic" earliest=0 site="bea4.cnn.com"
| lookup dnslookup clienthost AS site
4. Automatic Lookups:
Go to ~\etc\apps\well_known_ports\default
notepad props.conf
Refresh via: http://localhost:8000/en-US/debug/refresh
You will now find the description field without using the lookup command.
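The props.conf wiring might look like the following sketch (the sourcetype and field names are assumed from the earlier sections):

```
# props.conf
[pan:traffic]
LOOKUP-well_known_ports = well_known_ports port AS dest_port OUTPUT transport, description
```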
V. Visualizations and Dashboards:
1. Visualizations:
By using a single value visualization with the stats command
index=botsv2 sourcetype="pan:traffic" earliest=08/30/2017:00:00:00
latest=09/01/2017:00:00:00
| iplocation dest_ip
| stats count by Country
| geom geo_countries allFeatures=True featureIdField=Country
| iplocation dest_ip: This command performs IP geolocation on the destination IP addresses found in
the events.
| stats count by Country: This command counts the occurrences of events for each country.
2. Classic Dashboard
index=botsv2 sourcetype="pan:traffic" earliest=08/30/2017:00:00:00
latest=09/01/2017:00:00:00
| stats count BY src_ip
| sort -count
We give it a new name, and we can change its permissions if we want; then we can save it.
We will get the dashboard above with a table of src_ip.
We can also add more content from other searches into this dashboard,
such as statistics of destination IPs by count, but before that we will use Save As > Existing Dashboard.
We will find the second table below the first one, but we will make some edits to give the dashboard a better shape. We will also add more content: first, a timechart by destination port, and then a table of statistics by source and destination interface.
3. Building a dynamic Dashboard with tokens
Using the dashboard we created before, we click Edit and add a "Text" input.
After that, we click the pencil icon and enter the information as shown in the image below.
We click the search icon on each of the four panels, add $src_ip_tok$ and $dest_ip_tok$, and click Apply; then we save all edits.
Finally, we can type whichever IP we want to see statistics for in the dashboard (we can use the destination IP, the source IP, or both at the same time). In the example below we show all statistics of connections from all IPs to 8.8.8.8.
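In the dashboard's Simple XML source, the text input and the token usage inside a panel search look roughly like this sketch (the token names come from the text above; the label, default value, and query body are assumptions):

```
<fieldset>
  <input type="text" token="src_ip_tok">
    <label>Source IP</label>
    <default>*</default>
  </input>
</fieldset>
<panel>
  <table>
    <search>
      <query>index=botsv2 sourcetype="pan:traffic" src_ip=$src_ip_tok$
| stats count BY src_ip</query>
    </search>
  </table>
</panel>
```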
VI. Data models:
We will find many data models to choose from; for example, Network Traffic on the second page.
We can now choose fields, for example the destination IP:
We can also use data models in the search bar:
| datamodel Network_Traffic search
| stats count BY All_Traffic.dest_ip
VII. Using Search to Find Suspicious Events:
2. Searching 2 Data Sources for a Common Indicator (Using OR)
eval site = replace(site,"(.*):\d+","\1"): This uses the eval command to create a new field "site" by
removing the port number from the "site" field.
replace(site,"(.*):\d+","\1"): This is the function used to modify the "site" field. It's a regular
expression-based replacement function.
.*: This matches any character (except for line terminators) zero or more times.
Essentially, it matches any sequence of characters.
\1: This is a backreference in the replacement string. It refers to the first captured group (.*), which is everything before the colon. So essentially, it replaces the entire match with just the part before the port.
Overall, this query is designed to search for events related to the "uranus.frothly.local" host within
the specified time range and analyze the counts and types of those events based on the source IP
address.
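The OR search being explained is not reproduced above; a reconstruction based on the description (the two sourcetypes are assumptions) would be:

```
index=botsv2 earliest=0 (sourcetype="stream:http" OR sourcetype="pan:traffic") "uranus.frothly.local"
| eval site = replace(site,"(.*):\d+","\1")
| stats count BY sourcetype, src_ip
```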
3. Finding Traces of Look-alike Domains (Using fuzzylookup)
First, we need to install the fuzzylookup app in Splunk Enterprise. Then we create a lookup file named our_domains (following the same steps as for the CSV file lookup).
fuzzylookup addmetrics=true our_domains domains AS maybe_evil_domain: This command performs a fuzzy lookup against a list of known "safe" domains. Here's what each part does:
search fuzzy_score < 3: This line filters the results to only include matches where the fuzzy score (an edit-distance measure of how far apart two strings are) is less than 3, meaning the observed domains are very close to, though not necessarily identical to, the known-good domains.
This query is designed to identify potentially malicious look-alike domains by comparing them against a list of known-safe domains. It calculates fuzzy-match scores between observed domains and the known-safe domains and keeps only near matches, which is exactly the pattern a look-alike domain produces. The results are then presented in a table format for further analysis.
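Putting the pieces together, the full search might look like this sketch (the sourcetype and any field names other than those quoted above are assumptions):

```
index=botsv2 earliest=0 sourcetype="stream:dns"
| fuzzylookup addmetrics=true our_domains domains AS maybe_evil_domain
| search fuzzy_score < 3
| table maybe_evil_domain, domains, fuzzy_score
```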
4. Using Geolocation to Find Anomalous Connections
geostats globallimit=0 count BY City: This command calculates geographical statistics based
on the location information extracted in the previous step. Specifically, it counts the
occurrences of events grouped by the "City" field. The globallimit=0 parameter ensures that
all results are returned, regardless of any default limits set in the Splunk configuration
1. rare Country: This command identifies rare values in the "Country" field. It counts the
occurrences of each unique country and calculates the percentage of occurrences relative to
the total. Countries with a percentage of occurrences less than 1% are considered rare.
2. search percent < 1: This filters the results to only include countries with occurrence
percentages less than 1%. These are the rare countries identified in the access logs within the
specified time range.
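The two searches being described might look like the following sketches (the sourcetype and the IP field are assumptions; the commands and parameters come from the text):

```
index=botsv2 earliest=0 sourcetype="stream:http"
| iplocation src_ip
| geostats globallimit=0 count BY City

index=botsv2 earliest=0 sourcetype="stream:http"
| iplocation src_ip
| rare Country
| search percent < 1
```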
5. First-Time Login of a User on a Machine
stats earliest(_time) AS earliest latest(_time) AS latest BY user, dest: Groups the events by the
"user" and "dest" fields and calculates the earliest and latest timestamps for each group
where earliest >= relative_time(deadline,"-1d@d"): Filters the results to only include events where
the earliest timestamp is greater than or equal to one day before the deadline specified (i.e., August
24, 2017).
convert ctime(earliest) AS earliest: Converts the earliest timestamp back to human-readable format
using the convert and ctime functions.
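A sketch of the full search (the sourcetype, the event code, and the deadline value are assumptions inferred from the text):

```
index=botsv2 sourcetype=wineventlog signature_id=4624 earliest=0
| stats earliest(_time) AS earliest latest(_time) AS latest BY user, dest
| eval deadline = strptime("08/25/2017", "%m/%d/%Y")
| where earliest >= relative_time(deadline,"-1d@d")
| convert ctime(earliest) AS earliest ctime(latest) AS latest
```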
Using this query, we can see in the image above two users who logged in to new destinations, which looks like a suspicious event.
6. Identifying Password Guessing Attempts with Failed and Successful Logins
signature_id IN (4624, 4625): Filters events to include those with either signature ID 4624 or 4625.
Signature ID 4624 typically corresponds to successful logon events, while 4625 corresponds to failed
logon events.
NOT user="*$": Excludes logon events where the user ends with a "$". These events typically
represent computer account logons, which might not be relevant for user-based analysis.
streamstats time_window=1h count(eval(signature_id=4625)) AS failed_count
count(eval(signature_id=4624)) AS success_count BY user, dest: This command calculates statistics
over a streaming window of 1 hour. It counts the occurrences of signature IDs 4625 (failed logon
attempts) and 4624 (successful logon attempts) for each user and destination combination.
This query aims to provide insights into user logon activities by counting the number of successful and failed logon attempts over time, grouped by user and destination. Note that earliest=1 means one second after the Unix epoch (January 1, 1970), so the search effectively covers all time.
index=botsv2 sourcetype=wineventlog signature_id IN (4624, 4625) NOT user="*$" earliest=1
| streamstats time_window=1h count(eval(signature_id=4625)) AS failed_count
count(eval(signature_id=4624)) AS success_count BY user, dest
| where failed_count > 2 AND success_count > 0
| stats max(failed_count) AS max_failed_count BY user, dest
where failed_count > 2 AND success_count > 0: Filters the results to only include combinations
where there are more than 2 failed logon attempts (failed_count > 2) and at least 1 successful logon
attempt (success_count > 0).
stats max(failed_count) AS max_failed_count BY user, dest: Aggregates the results to show the maximum number of failed logon attempts (max_failed_count) for each user and destination combination.
This query aims to identify instances where there have been multiple failed logon attempts followed
by at least one successful logon attempt for each user and destination combination. This information
can be valuable for detecting potential security breaches or suspicious activities related to user logon
attempts.
7. Identifying High Network Bandwidth Consumption from a Baseline
| bin span=1d _time: Bins the timestamps into daily intervals using a span of 1 day. This allows for
aggregation of data on a daily basis
| eventstats avg(bytes_out) AS avg, stdev(bytes_out) AS stdev: Computes the average and standard
deviation of outbound bytes across all events.
| eval upper_bound = avg + 2 * stdev: Calculates an upper bound for outbound bytes based on the
average plus two times the standard deviation. This is a common statistical approach to identify
outliers.
| where bytes_out > upper_bound: Filters the data to include only events where the outbound bytes
exceed the calculated upper bound. This helps to identify unusually large traffic volumes.
This query aims to identify unusually high outbound traffic volumes from different source IP addresses to various destination IP addresses, potentially indicating anomalous or suspicious network activity.
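The search being described is not shown; reconstructed from the step-by-step explanation (the sourcetype and grouping fields are assumptions):

```
index=botsv2 earliest=0 sourcetype="pan:traffic"
| bin span=1d _time
| stats sum(bytes_out) AS bytes_out BY _time, src_ip, dest_ip
| eventstats avg(bytes_out) AS avg, stdev(bytes_out) AS stdev
| eval upper_bound = avg + 2 * stdev
| where bytes_out > upper_bound
```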