12.2.1.5 Lab - Convert Data Into A Universal Format
Objectives
Part 1: Normalize Timestamps in a Log File
Part 2: Normalize Timestamps in an Apache Log File
Part 3: Log File Preparation in Security Onion
Background / Scenario
In this lab, you will learn where log files are located and how to view and manipulate them.
Log entries are generated by network devices, operating systems, applications, and various types of
programmable devices. A file containing a time-sequenced stream of log entries is called a log file.
By nature, log files record events that are relevant to the source. The syntax and format of data within log
messages are often defined by the application developer.
Therefore, the terminology used in log entries often varies from source to source. For example, depending
on the source, the terms login, logon, authentication event, and user connection may all appear in log entries
to describe a successful user authentication to a server.
It is often desirable to have consistent and uniform terminology in logs generated by different sources. This
is especially true when all log files are being collected at a centralized point.
The term normalization refers to the process of converting parts of a message, in this case a log entry, to a
common format.
In this lab, you will use command line tools to manually normalize log entries. In Parts 1 and 2, the
timestamp field will be normalized. In Part 3, you will prepare log files in Security Onion.
Note: While numerous plugins exist to perform log normalization, it is important to understand the basics
behind the normalization process.
Required Resources
CyberOps Workstation VM
Security Onion VM
Cisco and/or its affiliates. All rights reserved. Cisco Confidential Page 1 of 11 www.netacad.com
From a programmability standpoint, it is much easier to work with Epoch because it allows for easy addition
and subtraction operations. From an analysis perspective, however, Human Readable timestamps are much
easier to interpret.
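This trade-off can be seen in a few lines of Python (used here only as an illustration; the lab itself uses AWK). UTC is used so the output does not depend on the machine's time zone, which is why the hour differs from the EDT examples later in the lab.

```python
from datetime import datetime, timezone

# Two events recorded as Unix Epoch timestamps (seconds since 1970-01-01 UTC).
t1 = 1219071600
t2 = 1219158000

# In Epoch form, time arithmetic is plain integer math:
elapsed = t2 - t1
print(elapsed)  # 86400 seconds, i.e. exactly one day

# The Human Readable form is easier to interpret at a glance:
print(datetime.fromtimestamp(t1, tz=timezone.utc).strftime("%a %d %b %Y %H:%M:%S UTC"))
# Mon 18 Aug 2008 15:00:00 UTC
```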
Converting Epoch to Human Readable Timestamps with AWK
AWK is a programming language designed to manipulate text files. It is very powerful and especially useful
for handling text files in which each line contains multiple fields, separated by a delimiter character. Log files
contain one entry per line and are formatted as delimiter-separated fields, making AWK a great tool for
normalization.
Consider the applicationX_in_epoch.log file below. The source of the log file is not relevant.
2|Z|1219071600|AF|0
3|N|1219158000|AF|89
4|N|1220799600|AS|12
1|Z|1220886000|AS|67
5|N|1220972400|EU|23
6|R|1221058800|OC|89
The log file above was generated by application X. The relevant aspects of the file are:
o The columns are separated, or delimited, by the | character. Therefore, the file has five columns.
o The third column contains timestamps in Unix Epoch.
o The file has an extra line at the end. This will be important later in the lab.
Assume that a log analyst needed to convert the timestamps to the Human Readable format. Follow the steps
below to use AWK to easily perform the manual conversion:
a. Launch the CyberOps Workstation VM and then launch a terminal window.
b. Use the cd command to change to the /home/analyst/lab.support.files/ directory. A copy of the file
shown above is stored there.
[analyst@secOps ~]$ cd ./lab.support.files/
[analyst@secOps lab.support.files]$ ls -l
total 580
-rw-r--r-- 1 analyst analyst 649 Jun 28 18:34 apache_in_epoch.log
-rw-r--r-- 1 analyst analyst 126 Jun 28 11:13 applicationX_in_epoch.log
drwxr-xr-x 4 analyst analyst 4096 Aug 7 15:29 attack_scripts
-rw-r--r-- 1 analyst analyst 102 Jul 20 09:37 confidential.txt
<output omitted>
[analyst@secOps lab.support.files]$
c. Issue the following AWK command to convert and print the result on the terminal:
Note: It is easy to make a typing error in the following script. Consider copying the script out to a text
editor to remove the extra line breaks. Then copy the script from the text editor into the CyberOps
Workstation VM terminal window. However, be sure to study the script explanation below to learn how
this script modifies the timestamp field.
[analyst@secOps lab.support.files]$ awk 'BEGIN {FS=OFS="|"} {$3=strftime("%c",$3)} {print}' applicationX_in_epoch.log
2|Z|Mon 18 Aug 2008 11:00:00 AM EDT|AF|0
3|N|Tue 19 Aug 2008 11:00:00 AM EDT|AF|89
4|N|Sun 07 Sep 2008 11:00:00 AM EDT|AS|12
1|Z|Mon 08 Sep 2008 11:00:00 AM EDT|AS|67
e. While printing the result on the screen is useful for troubleshooting the script, analysts will likely need to
save the output to a text file. Redirect the output of the script above to a file named
applicationX_in_human.log:
[analyst@secOps lab.support.files]$ awk 'BEGIN {FS=OFS="|"} {$3=strftime("%c",$3)} {print}' applicationX_in_epoch.log > applicationX_in_human.log
[analyst@secOps lab.support.files]$
What was printed by the command above? Is this expected?
____________________________________________________________________________________
____________________________________________________________________________________
____________________________________________________________________________________
f. Use cat to view applicationX_in_human.log. Notice that the extra line has been removed and the
timestamps for the log entries have been converted to a human readable format.
[analyst@secOps lab.support.files]$ cat applicationX_in_human.log
2|Z|Mon 18 Aug 2008 11:00:00 AM EDT|AF|0
3|N|Tue 19 Aug 2008 11:00:00 AM EDT|AF|89
4|N|Sun 07 Sep 2008 11:00:00 AM EDT|AS|12
1|Z|Mon 08 Sep 2008 11:00:00 AM EDT|AS|67
5|N|Tue 09 Sep 2008 11:00:00 AM EDT|EU|23
6|R|Wed 10 Sep 2008 11:00:00 AM EDT|OC|89
[analyst@secOps lab.support.files]$
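For comparison, the same field-wise conversion that AWK performs can be sketched in Python. The function below is an illustration, not part of the lab; it uses UTC so the result does not depend on the local time zone (the lab output above shows EDT).

```python
from datetime import datetime, timezone

def normalize_line(line, field=2, fs="|"):
    """Split on the delimiter, convert the Epoch field, and rejoin.

    Mirrors awk 'BEGIN {FS=OFS="|"} {$3=strftime("%c",$3)} {print}'.
    awk columns are 1-based, so awk's $3 is Python index 2.
    """
    cols = line.rstrip("\n").split(fs)
    cols[field] = datetime.fromtimestamp(
        int(cols[field]), tz=timezone.utc
    ).strftime("%a %d %b %Y %H:%M:%S UTC")
    return fs.join(cols)

print(normalize_line("2|Z|1219071600|AF|0"))
# 2|Z|Mon 18 Aug 2008 15:00:00 UTC|AF|0
```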
o The fifth column contains text with details about the event, including URLs and web request
parameters. All six entries are HTTP GET messages. Because these messages include spaces, the
entire field is enclosed with quotes.
o The sixth column contains the HTTP status code, for example 401.
o The seventh column contains the size of the response to the client (in bytes), for example 12846.
As in Part 1, a script will be created to convert the timestamp from Epoch to Human Readable format.
a. First, answer the questions below. They are crucial for the construction of the script.
In the context of timestamp conversion, what character would work as a good delimiter character for the
Apache log file above?
____________________________________________________________________________________
How many columns does the Apache log file above contain?
____________________________________________________________________________________
In the Apache log file above, what column contains the Unix Epoch Timestamp?
____________________________________________________________________________________
b. In the CyberOps Workstation VM terminal, a copy of the Apache log file, apache_in_epoch.log, is stored
in the /home/analyst/lab.support.files/ directory.
c. Use an awk script to convert the timestamp field to a human readable format. Notice that the command
contains the same script used previously, but with a few adjustments for the timestamp field and file
name.
[analyst@secOps lab.support.files]$ awk 'BEGIN {FS=OFS=" "} {$4=strftime("%c",$4)} {print}' /home/analyst/lab.support.files/apache_in_epoch.log
Was the script able to properly convert the timestamps? Describe the output.
____________________________________________________________________________________
d. Before moving forward, think about the output of the script. Can you guess what caused the incorrect
output? Is the script incorrect? What are the relevant differences between the
applicationX_in_epoch.log and apache_in_epoch.log?
____________________________________________________________________________________
____________________________________________________________________________________
____________________________________________________________________________________
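The cause can be confirmed with a quick check in Python (shown only as an illustration; the line below is hypothetical and in the same style as the Apache log, since the real lines live in apache_in_epoch.log).

```python
# Hypothetical Apache-style entry; the address, URL, and values are made up.
line = '198.51.100.10 - - [1219071600] "GET /index.html HTTP/1.1" 401 12846'

# Splitting on spaces, as awk does with FS=" ", makes the timestamp awk's $4
# (Python index 3) -- but it is still wrapped in square brackets, so it is
# not a valid number for the Epoch conversion:
fields = line.split(" ")
print(fields[3])  # [1219071600]
```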
e. To fix the problem, the square brackets must be removed from the timestamp field before the conversion
takes place. Adjust the script by adding two actions before the conversion, as shown below:
[analyst@secOps lab.support.files]$ awk 'BEGIN {FS=OFS=" "} {gsub(/\[|\]/,"",$4)} {print} {$4=strftime("%c",$4)} {print}' apache_in_epoch.log
Notice that after specifying space as the delimiter with {FS=OFS=" "}, there is a regular expression action to
match and replace the square brackets with an empty string, effectively removing the square brackets
that appear in the timestamp field. The first {print} action displays the line after bracket removal, and the
final {print} displays the line again after the conversion action has been performed.
gsub() – This is an internal AWK function used to locate and substitute strings. In the script
above, gsub() received three comma-separated parameters, described below.
/\[|\]/ – This is a regular expression passed to gsub() as the first parameter. The regular
expression should be read as ‘find “[“ OR “]”’. Below is the breakdown of the expression:
o The first and last “/” characters mark the beginning and end of the search block. Anything
between the first “/” and the last “/” is related to the search. The “\” character is used
to escape the following “[“. Escaping is necessary because “[“ can also be used as an
operator in regular expressions. By escaping the “[“ with a leading “\”, we tell the
interpreter that the “[“ is part of the content and not an operator. The “|” character is the
OR operator. Notice that the “|” is not escaped and will therefore be seen as an operator.
Lastly, the regular expression escapes the closing square bracket with “\]”, as done
before.
"" – This represents no characters, or an empty string. This parameter tells gsub() what to
replace the “[“ and “]” with, when found. By replacing the “[“ and “]” with “”, gsub() effectively
removes the “[“ and “]” characters.
$4 – This tells gsub() to work only on the fourth column of the current line, the timestamp column.
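The same substitution can be tried outside of AWK; for example, Python's re.sub() accepts an identical pattern. The snippet below is only a cross-check of the regular expression, not part of the lab.

```python
import re

# Replace every "[" or "]" with the empty string, exactly like
# gsub(/\[|\]/,"",$4) in the awk script above.
field = "[1219071600]"
print(re.sub(r"\[|\]", "", field))  # 1219071600
```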
Note: Regular expression interpretation is a SECOPS exam topic. Regular expressions are covered in
more detail in another lab in this chapter. However, you may wish to search the Internet for tutorials.
f. In a CyberOps Workstation VM terminal, execute the adjusted script, as follows:
[analyst@secOps lab.support.files]$ awk 'BEGIN {FS=OFS=" "} {gsub(/\[|\]/,"",$4)} {print} {$4=strftime("%c",$4)} {print}' apache_in_epoch.log
Was the script able to properly convert the timestamps this time? Describe the output.
____________________________________________________________________________________
____________________________________________________________________________________
____________________________________________________________________________________
SSL/TLS handshakes
Snort and SGUIL
Snort is an IDS that relies on predefined rules to flag potentially harmful traffic. Snort inspects all portions of
network packets (headers and payload), searching for patterns defined in its rules. When a match is found,
Snort takes the action defined in that rule.
SGUIL provides a graphical interface for Snort logs and alerts, allowing a security analyst to pivot from SGUIL
into other tools for more information. For example, if a potentially malicious packet is sent to the organization’s
web server and Snort raises an alert about it, SGUIL will list that alert. The analyst can then right-click the
alert to search the ELSA or Bro databases for a better understanding of the event.
Note: The directory listing may be different than the sample output shown below.
b. You can also right-click the Desktop and select Open Terminal Here, as shown in the following screenshot:
c. ELSA logs can be found under the /nsm/elsa/data/elsa/log/ directory. Change the directory using the
following command:
analyst@SecOnion:~/Desktop$ cd /nsm/elsa/data/elsa/log
analyst@SecOnion:/nsm/elsa/data/elsa/log$
d. Use the ls -l command to list the files:
analyst@SecOnion:/nsm/elsa/data/elsa/log$ ls -l
total 169528
-rw-rw---- 1 www-data sphinxsearch 56629174 Aug 18 14:15 node.log
-rw-rw---- 1 www-data sphinxsearch 6547557 Aug 3 07:34 node.log.1.gz
-rw-rw---- 1 www-data sphinxsearch 7014600 Jul 17 07:34 node.log.2.gz
-rw-rw---- 1 www-data sphinxsearch 6102122 Jul 13 07:34 node.log.3.gz
-rw-rw---- 1 www-data sphinxsearch 4655874 Jul 8 07:35 node.log.4.gz
-rw-rw---- 1 www-data sphinxsearch 6523029 Aug 18 14:15 query.log
-rw-rw---- 1 www-data sphinxsearch 53479942 Aug 18 14:15 searchd.log
For each one of the tools listed above, describe the function, importance, and placement in the security
analyst workflow.
____________________________________________________________________________________
____________________________________________________________________________________
____________________________________________________________________________________
____________________________________________________________________________________
____________________________________________________________________________________
____________________________________________________________________________________
____________________________________________________________________________________
____________________________________________________________________________________
Part 4: Reflection
Log normalization is important, and how it is performed depends on the deployed environment.
Popular tools include their own normalization features, but log normalization can also be done manually.
When manually normalizing and preparing log files, double-check scripts to ensure the desired result is
achieved. A poorly written normalization script may modify the data, directly impacting the analyst’s work.
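One cheap safeguard is to verify that normalization changed only the intended field and preserved the entry count. The sketch below illustrates this in Python; the normalized strings are hard-coded (in UTC) to keep the example self-contained.

```python
raw = [
    "2|Z|1219071600|AF|0",
    "3|N|1219158000|AF|89",
]
normalized = [
    "2|Z|Mon 18 Aug 2008 15:00:00 UTC|AF|0",
    "3|N|Tue 19 Aug 2008 15:00:00 UTC|AF|89",
]

# The entry count must not change during normalization...
assert len(normalized) == len(raw)

# ...and every field except the timestamp (index 2) must survive untouched.
for before, after in zip(raw, normalized):
    b, a = before.split("|"), after.split("|")
    assert [x for i, x in enumerate(b) if i != 2] == [x for i, x in enumerate(a) if i != 2]

print("sanity checks passed")
```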