The Essential Guide to Data

An update to The Essential Guide to Machine Data, exploring how to bring data to every question, decision and action

Time-Series Data. Streaming Data. Dark Data.
It’s no secret that data remains underused and undervalued in most organizations all over the world. Despite the constant talk of data-driven decisions, organizations of all sizes are still missing the mark on how to effectively capture and use the troves of data being generated every day, whether it comes from users, outside industry resources, or their own networked devices. In fact, most business and IT decision makers estimate that 55% of their data is dark data, information you don’t know you have, or can’t fully tap.

This is a big missed opportunity. Important insights across IT, security and your organization lie hidden in this data. Data holds the definitive record of all activity and behavior of your customers and users, transactions, applications, servers, networks, mobile devices and more. Critical information on everything from configurations, APIs, message queues, diagnostic outputs, sensor data of industrial systems and more is all there; you just have to tap into it the right way.

With the right approach, data makes it simple to:

• Make better informed decisions about every part of your business.
• Run your operations more efficiently.
• Optimize user and customer experiences.
• Detect the fingerprints of fraud, or prevent it altogether.
• Uncover potential disasters before they happen.
• Find hidden trends that help your company leapfrog the competition.
• Make everyone who uses it look like a hero.
• … and so much more.

The challenge with leveraging the vast quantity of data that most companies collect is that it comes in a dizzying range of formats that traditional data monitoring and analysis tools aren’t designed to handle. Many tools can’t keep up with the varying data structures, sources or time scales. And it goes well beyond just machine data as well. But the upside to tapping into your data is tremendous, and this is where Splunk comes in.

With Splunk, you can bring data to every question, decision and
action in your organization to create meaningful outcomes. Unlike any
other platform, Splunk is truly able to take any data from any source
and drive real action to benefit the business — from IT infrastructure
and security monitoring to DevOps and application performance
monitoring and management.
Turn Data Into Doing in Practice

Use data to: Investigate, Monitor, Analyze, Act

The organizations that get the most value out of their data are those able to take disparate data types, enrich them and extract answers.

But not knowing what data to ingest can stop businesses before they start.

Familiarizing yourself with general use cases in security, IT operations, business analytics, DevOps, the Internet of Things (IoT) and more, including the data types and sources involved, can get you on track right away.

Here’s an example:

1. A customer’s order didn’t go through.
2. The customer called support to resolve the issue.
3. After too much time on hold, the customer gave up and tweeted a complaint about the company.

What Does Machine Data Look Like?

Source: Order Processing
ORDER, 05-21T14:04:12.484,10098213, 569281734,67.17.10.12,43CD1A7B8322,SA-2100

Source: Middleware Error
MAY 21 14:04:12.996 wl-01.acme.com Order 569281734 failed for customer 10098213. Exception follows: weblogic.jdbc.extensions.ConnectionDeadSQLException: weblogic.common.resourcepool.ResourceDeadException: Could not create pool connection. The DBMS driver exception was: [BEA][Oracle JDBC Driver] Error establishing socket to host and port: ACMEDB-01:1521. Reason: Connection refused

Source: Care IVR
05/21 16:33:11.238 [CONNEVENT] Ext 1207130 (0192033): Event 20111, CTI Num:ServID:Type 0:19:9, App 0, ANI T7998#1, DNIS 5555685981, SerID 40489a07-7f6e-4251-801a-13ae51a6d092, Trunk T451.16
05/21 16:33:11:242 [SCREENPOPEVENT] SerID 40489a07-7f6e-4251-801a-13ae51a6d092 CUSTID 10098213
05/21 16:37:49.732 [DISCEVENT] SerID 40489a07-7f6e-4251-801a-13ae51a6d092

Source: Twitter
{actor:{displayName: “Go team!!”,followersCount:1366,friendsCount:789,link: http://dallascowboys.com/,location:{displayName:“Dallas, TX”,objectType:“place”}, objectType:“person”,preferredUsername:“B0ysF@n80”,statusesCount:6072},body: “Can’t buy this device from @ACME. Site doesn’t work! Called, gave up on waiting for them to answer! RT if you hate @ACME!!”,objectType:“activity”,postedTime:“05-21T16:39:40.647-0600”}

Figure 1: Data can come from any number of sources, and at first glance, can look like random text.

Machine Data Contains Critical Insights

[Figure 2 repeats the sample above with these fields highlighted]
• Order Processing: Customer ID, Order ID, Product ID
• Middleware Error: Order ID, Customer ID
• Care IVR: Time waiting on hold, Customer ID
• Twitter: Customer’s Twitter ID, Customer’s Tweet, Company’s Twitter ID

Figure 2: The value of data is hidden in this seemingly random text.

[Figure 3 repeats the highlighted sample, with the matching fields linked across sources]

Figure 3: By correlating different types of data together, you can start to gain real insight into what’s going on in your infrastructure, see security threats or even use the insights to drive better business decisions.

By taking all the data involved in the process (i.e., pulling information from order processing, middleware, interactive voice response systems and Twitter), an organization can get a full view of the customer experience problem.

The Essential Guide to Data

In this guide, we provide a high-level overview of the most common types of data in organizations of all sizes. While needs may differ from business to business based on vendors, products and infrastructure or mission, we’ve outlined the details that are key as you look for value in your data.

Most of the data sources and types in this guide can support multiple use cases (a major driver of value for data), so we’ve also coded the sections with icons and colors for easy reading. Enjoy reaping the benefits of Splunk whatever your need.

Use case categories: Security and Compliance; IT Ops, App Delivery and DevOps; Internet of Things; Business Analytics
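
The cross-source correlation in the failed-order example earlier in this section (Figures 1 to 3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Splunk does it: the IVR and tweet records are shortened copies of the figure data, the regular expressions are guesses at the record layouts, and `HANDLE_TO_CUSTOMER` is a hypothetical lookup, since the guide does not say how a Twitter handle is tied to a customer ID.

```python
import re

# Sample events taken from Figure 1 (IVR and tweet shortened to the fields used here).
ORDER = "ORDER, 05-21T14:04:12.484,10098213, 569281734,67.17.10.12,43CD1A7B8322,SA-2100"
MIDDLEWARE = "MAY 21 14:04:12.996 wl-01.acme.com Order 569281734 failed for customer 10098213."
IVR = [
    "05/21 16:33:11.238 [CONNEVENT] Ext 1207130 (0192033): SerID 40489a07-7f6e-4251-801a-13ae51a6d092, Trunk T451.16",
    "05/21 16:33:11:242 [SCREENPOPEVENT] SerID 40489a07-7f6e-4251-801a-13ae51a6d092 CUSTID 10098213",
    "05/21 16:37:49.732 [DISCEVENT] SerID 40489a07-7f6e-4251-801a-13ae51a6d092",
]
TWEET = 'preferredUsername:"B0ysF@n80",body:"Called, gave up on waiting for them to answer! RT if you hate @ACME!!"'

# Hypothetical mapping from Twitter handle to customer ID (an assumption).
HANDLE_TO_CUSTOMER = {"B0ysF@n80": "10098213"}

# Time of day, event name and session ID from one IVR line.
IVR_RE = re.compile(r"\d\d/\d\d (\d\d):(\d\d):(\d\d)\S* \[(\w+)\].*?SerID ([0-9a-f-]+)")


def order_ids(line):
    """Customer and order IDs from the comma-separated order record."""
    fields = [f.strip() for f in line.split(",")]
    return {"customer_id": fields[2], "order_id": fields[3]}


def failed_order(line):
    """Order and customer IDs from a middleware failure message, or None."""
    m = re.search(r"Order (\d+) failed for customer (\d+)", line)
    return {"order_id": m.group(1), "customer_id": m.group(2)} if m else None


def ivr_calls(lines):
    """Group IVR events by session ID and compute time on the line in seconds."""
    calls = {}
    for line in lines:
        m = IVR_RE.match(line)
        if not m:
            continue
        hour, minute, second, event, session = m.groups()
        seconds = int(hour) * 3600 + int(minute) * 60 + int(second)
        call = calls.setdefault(session, {})
        if event == "CONNEVENT":
            call["start"] = seconds
        elif event == "DISCEVENT":
            call["end"] = seconds
        cust = re.search(r"CUSTID (\d+)", line)
        if cust:
            call["customer_id"] = cust.group(1)
    return {
        sid: {"customer_id": c.get("customer_id"), "hold_seconds": c["end"] - c["start"]}
        for sid, c in calls.items()
        if "start" in c and "end" in c
    }


def customer_story(customer_id):
    """Collect what each of the four sources says about one customer."""
    story = {"customer_id": customer_id}
    order = order_ids(ORDER)
    failure = failed_order(MIDDLEWARE)
    if order["customer_id"] == customer_id:
        story["order_id"] = order["order_id"]
        story["order_failed"] = bool(failure and failure["order_id"] == order["order_id"])
    for call in ivr_calls(IVR).values():
        if call["customer_id"] == customer_id:
            story["hold_seconds"] = call["hold_seconds"]
    handle = re.search(r'preferredUsername:"([^"]+)"', TWEET)
    if handle and HANDLE_TO_CUSTOMER.get(handle.group(1)) == customer_id:
        story["complained_on_twitter"] = True
    return story


print(customer_story("10098213"))
```

The join keys are the ones highlighted in Figure 2: the customer ID links the order, the middleware error and the IVR call, and the IVR session ID (SerID) links the connect and disconnect events that bound the call.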


Table of Contents

User Data
  Virtual Private Networks (VPN)
  Authentication Data

Application Data
  Antivirus
  Application Performance Management (APM) Tool Data
  Automation, Configuration, Deployment Tools (Platforms)
  Binary Repositories
  Build Systems (Platforms)
  Code Management
  Container Logs and Metrics
  Container Orchestration Metrics
  CRM, ERP and Other Business Applications
  Custom Application and Debug Logs
  Distributed Tracing Tools
  Mail Server
  Test Coverage Tools
  Serverless Monitoring
  Vulnerability Scanning

Middleware Data
  Application Server
  Middleware
  Mobile Device Data
  Web Server

Network Data
  Deep Packet Inspection Data
  DHCP
  DNS
  Endpoint
  Firewall
  FTP
  Intrusion Detection/Prevention
  Load Balancer
  Network Access Control (NAC)
  Network Protocols
  Network Routers
  Network Switches
  Proxies
  VoIP
  SNMP

Operating System Data
  System Logs
  System Performance

Virtual Infrastructure Data
  AWS Services
  Google Cloud Platform (GCP)
  Microsoft Azure
  Pivotal Cloud Foundry (PCF)
  VMware Server Logs, Configuration Data and Performance Metrics

Physical Infrastructure Data
  Backup
  Environmental Sensors
  Industrial Control Systems (ICS)
  Mainframe
  Medical Devices
  Metric Line Protocols
  Patch Logs
  Physical Card Readers
  Point-of-Sale Systems (POS)
  RFID/NFC/BLE
  Sensor Data
  Server Logs
  Smart Meters
  Storage
  Telephony
  Transportation
  Wearables

Additional Data Sources
  Database
  Business Service Transaction and Business Service Performance Data
  Human Resources
  Social Media Feeds
  Third-Party Lists

User Data

Virtual Private Networks (VPN)
Use Cases: Security and Compliance
Examples: Citrix NetScaler Nitro, Citrix NetScaler IPFIX, Cisco

Virtual private networks (VPNs) are a way of building a secure
extension of a private network over an insecure, public one. VPNs
can be established either between networks, routing all traffic
between two sites, or between a client device and a network.
Network-to-network VPNs typically are created using strong
credentials such as certificates on each end of the connection.
Client-to-network VPNs rely on user authentication, which can
be as simple as a username and password. VPNs use network
tunneling protocols such as IPSec, OpenVPN plus SSL or L2TP
with cryptographically strong algorithms to scramble information
in transit and ensure end-to-end data integrity.

Use Cases
Security and Compliance: VPN logs help analyze users
coming onto the network. This information can be used in a
number of ways, including situational awareness, monitoring
foreign IP subnets, and compliance monitoring of browsers and
applications of connected hosts. VPN data can also help identify:

• Activities from different locations, such as changes in location
  within a given amount of time.

• Access from risky countries or locations.

• User sessions at odd times, such as late evenings or weekends.

• User land speed violations, where consecutive logins come from
  locations too far apart to travel between in the elapsed time.

• Abnormal frequency of sessions based on each user profile.
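
As a rough illustration of the location-based checks in the list above, the sketch below flags a user whose consecutive VPN logins imply an implausible travel speed. It assumes each login has already been geolocated; the event fields (`user`, `time`, `lat`, `lon`), the sample events and the 900 km/h threshold are hypothetical choices, not anything defined by a VPN product.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def land_speed_violations(logins, max_kmh=900.0):
    """Return (user, previous, current, speed_kmh) for implausibly fast moves.

    `logins` is an iterable of dicts with hypothetical keys:
    user, time (datetime), lat, lon.
    """
    last_seen = {}
    violations = []
    for event in sorted(logins, key=lambda e: e["time"]):
        prev = last_seen.get(event["user"])
        if prev is not None:
            hours = (event["time"] - prev["time"]).total_seconds() / 3600
            km = haversine_km(prev["lat"], prev["lon"], event["lat"], event["lon"])
            # Same timestamp from two different places counts as infinitely fast.
            speed = float("inf") if hours == 0 else km / hours
            if km > 0 and speed > max_kmh:
                violations.append((event["user"], prev, event, speed))
        last_seen[event["user"]] = event
    return violations


# Hypothetical, already-geolocated VPN logins.
logins = [
    {"user": "alice", "time": datetime(2024, 5, 21, 9, 0), "lat": 32.78, "lon": -96.80},
    {"user": "alice", "time": datetime(2024, 5, 21, 10, 0), "lat": 51.51, "lon": -0.13},
    {"user": "bob", "time": datetime(2024, 5, 21, 9, 0), "lat": 40.71, "lon": -74.01},
    {"user": "bob", "time": datetime(2024, 5, 21, 18, 0), "lat": 40.73, "lon": -73.99},
]

for user, prev, cur, speed in land_speed_violations(logins):
    print(f"{user}: {speed:.0f} km/h between consecutive logins")
```

Here only alice is flagged: her two logins are one hour apart but thousands of kilometers away from each other, while bob's two logins are a few kilometers apart.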

The Essential Guide to Data | Splunk 6


Authentication Data
Use Cases: Security and Compliance, IT Operations, Application Delivery
Examples and Data Sources: Active Directory, LDAP, Identity Management, Single Sign-On

Authentication data provides insight into users and identity activity. Common authentication data sources include:

• Active Directory: A distributed directory in which organizations define user and group identities, security policies and content controls.

• LDAP: An open standard defined by the Internet Engineering Task Force (IETF) that is typically used to provide user authentication (name and password). It has a flexible directory structure that can be used for a variety of information such as full name, phone numbers, email and physical addresses, organizational units, workgroup and manager.

• Identity Management: The method of linking the users of digital resources (whether people, IoT devices, systems or applications) to a verifiable online ID.

• Single Sign-On (SSO): A process of using federated identity management to provide verifiable, attestable identities from a single source to multiple systems. SSO significantly increases security by tying user credentials to a single source, allowing changes to user rights and account status to be made once and reflected in every application or service to which the user has access. SSO is particularly important for users with elevated security rights, such as system or network administrators who have access to a large number of systems.

Use Cases
Security and Compliance: For security, authentication data provides a wealth of information about user activity, such as multiple login failures or successes to multiple hosts in a given time window, activities from different locations within a given amount of time, and brute force activities. Specifically:

• Active Directory domain controller logs contain information regarding user accounts, such as privileged account activity, as well as the details on remote access, new account creation and expired account activity.

• LDAP logs include a record of who logs in to a system, when and from where, and how information is accessed.

• Identity Management data shows access rights by user, group and job title (e.g., CEO, supervisor or regular user). This data can be used to identify access anomalies that could be potential threats: for example, the CEO accessing a low-level networking device or a network admin accessing the CEO’s account.

IT Ops and Application Delivery: Authentication data supports IT operations teams as they troubleshoot issues related to authentication. For example, application support can be tied to logins, enabling IT operations to see whether users are struggling to log in to applications. For IT operations teams that support Active Directory, logs can be used to troubleshoot and understand the health of Active Directory.
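
The "multiple login failures in a given time window" check mentioned in this section can be sketched with a sliding window per user. This is a minimal example under assumptions: the `(timestamp, user, host, success)` tuple layout, the threshold of five failures and the ten-minute window are all hypothetical and would be tuned to the actual authentication source.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def find_bruteforce(events, threshold=5, window=timedelta(minutes=10)):
    """Return {user: time at which failures within `window` first reached `threshold`}.

    `events` is an iterable of (timestamp, user, host, success) tuples;
    this layout is hypothetical, not a real log format.
    """
    recent = defaultdict(deque)  # user -> timestamps of recent failed logins
    alerts = {}
    for ts, user, host, success in sorted(events, key=lambda e: e[0]):
        if success:
            continue
        failures = recent[user]
        failures.append(ts)
        # Drop failures that have aged out of the window.
        while ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold and user not in alerts:
            alerts[user] = ts
    return alerts


base = datetime(2024, 5, 21, 9, 0)
# Six failures one minute apart, each against a different host.
events = [(base + timedelta(minutes=i), "mallory", f"host{i}", False) for i in range(6)]
# Two isolated failures an hour apart, then a success.
events += [
    (base, "alice", "host1", False),
    (base + timedelta(hours=1), "alice", "host1", False),
    (base + timedelta(hours=1, minutes=1), "alice", "host1", True),
]

for user, when in find_bruteforce(events).items():
    print(f"possible brute force against {user} at {when:%H:%M}")
```

Counting distinct hosts per user in the same window would extend the sketch to the "multiple hosts" variant of the check.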



Application Data

Antivirus
Use Cases: Security and Compliance
Examples: Kaspersky, McAfee, Norton Security, F-Secure, Avira, Panda, Trend Micro

Individuals are the weakest link in corporate security, and antivirus is one way to protect them from performing inadvertently harmful actions. Whether it’s clicking on an untrustworthy web link, downloading malicious software or opening a booby-trapped document (often one sent to them by an unsuspecting colleague), antivirus can often prevent, mitigate or reverse the damage.

So-called advanced persistent threats (APTs) often enter through a single compromised machine attached to a trusted network. While not perfect, antivirus software can recognize and thwart common attack methods before they can spread.

Use Cases
Security and Compliance: Antivirus logs support the analysis
of malware and vulnerabilities on hosts, laptops and servers,
and can be used to monitor for suspicious file paths. They can
help identify:

• Newly detected binaries, file hashes, files in the filesystem
  and registry entries.

• When binaries, hashes or registry entries match threat intelligence.

• Unpatched operating systems.

• Known malware signatures.
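
Matching file hashes against threat intelligence, as in the list above, reduces to a set lookup once hashes are normalized. The sketch below assumes detections have already been extracted from antivirus logs into dicts; the field names (`host`, `path`, `sha256`) and the sample data are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def match_threat_intel(detections, known_bad_hashes):
    """Return the detections whose file hash appears in the threat-intel set."""
    bad = {h.lower() for h in known_bad_hashes}  # normalize case once
    return [d for d in detections if d["sha256"].lower() in bad]


# Hypothetical detections, as they might be extracted from antivirus logs.
payload = b"pretend this is a malicious binary"
detections = [
    {"host": "laptop-17", "path": "C:\\Users\\jo\\Downloads\\invoice.exe", "sha256": sha256_hex(payload)},
    {"host": "srv-02", "path": "/usr/bin/ls", "sha256": sha256_hex(b"benign file contents")},
]
# Threat-intel feeds often publish hashes in upper case.
known_bad_hashes = {sha256_hex(payload).upper()}

hits = match_threat_intel(detections, known_bad_hashes)
for hit in hits:
    print(f"{hit['host']}: {hit['path']} matches threat intelligence")
```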



Application Performance Management (APM) Tool Data
Use Cases: Security and Compliance, IT Operations, Application Delivery
Examples: Dynatrace, New Relic, AppDynamics, MMSoft Pulseway, LogicMonitor, Stackify, Idera, Ipswitch

Application performance management software provides end-to-end measurement of complex, multitier applications to provide performance metrics from an end user’s perspective. APM logs also provide event traces and diagnostic data that can assist developers in identifying performance bottlenecks or error conditions. The data from APM software provides both a baseline of typical application performance and a record of anomalous behavior or performance degradation. Carefully monitoring APM logs can provide an early warning of application problems and allow IT and developers to remediate issues before users experience significant degradation or disruption. APM logs also are required to perform post-hoc forensic analysis of complex application problems that may involve subtle interactions between multiple machines, network devices or both.

Use Cases
Security and Compliance: Security teams can use APM logs to perform post-hoc forensic analysis of incidents that span multiple systems and exploit vulnerabilities. The data can be used to correlate security indications between the system and application activities. It also helps to identify SQL statements, API calls or commands (CMD) made in relation to suspicious activity, or abnormal numbers of sessions or CPU load in relation to security activity.

IT Ops and Application Delivery: By providing end-to-end measurement of complex, multitier applications, APM logs can show infrastructure problems and bottlenecks that aren’t visible when looking at each system individually, such as slow DNS resolution causing a complex web app to bog down as it tries to access content and modules on many different systems.

Automation, Configuration, Deployment Tools (Platforms)
Use Case: Application Delivery and DevOps
Examples: Puppet Enterprise, Ansible Tower, Chef, SaltStack, Rundeck machine data ingested through APIs, webhooks or run logs

Automated configuration and deployment tools, also known as infrastructure as code, allow IT and DevOps practitioners to practice continuous application delivery in the cloud or on premises. When infrastructure is treated as code, it’s easy to share, collaborate, manage version control, perform peer unit testing, automate deployments, check the status of deployment and more.

Tools like Rundeck are platforms that take automation frameworks like SaltStack and enable teams to automate states or playbooks to make sure the code is released and reported back to a central reporting tool.

Use Cases
Application Delivery and DevOps: Automation and configuration machine data monitoring helps application delivery teams deliver applications faster without sacrificing stability or security.
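
For the APM data described on this page, the idea of a baseline of typical performance plus a record of anomalous behavior can be sketched as a simple threshold on response times. This is one possible approach, not the method of any APM product: the mean-plus-three-standard-deviations rule and the millisecond samples are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(baseline_ms, recent_ms, sigmas=3.0):
    """Return (index, value) pairs in `recent_ms` above mean + sigmas * stdev of the baseline."""
    limit = mean(baseline_ms) + sigmas * stdev(baseline_ms)
    return [(i, v) for i, v in enumerate(recent_ms) if v > limit]


# Hypothetical end-to-end response times in milliseconds.
baseline_ms = [120, 115, 130, 125, 118, 122, 127, 119]
recent_ms = [121, 126, 480, 124, 950]

for index, value in flag_anomalies(baseline_ms, recent_ms):
    print(f"sample {index}: {value} ms is well above the baseline")
```

A fixed statistical threshold like this is easy to reason about but ignores daily or weekly patterns, so a real deployment would likely compare against a baseline for the same time of day.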



Binary Repositories
Use Case: Application Delivery and DevOps
Examples: Data from Nexus, Artifactory, delivered through APIs, webhooks; Yum, Pacman and Aptly data delivered through logs

A binary repository is a tool for downloading and storing binary files used and created in software development. It’s used to store software binary packages, artifacts and their corresponding metadata. Binary repositories differ from source code repositories in that they do not store source files. Searching through these repositories is possible by analyzing associated metadata.

Use Cases
Application Delivery and DevOps: Analyzing binary repository data helps application delivery teams and release managers to ensure that the final deployment of code to production is successful.

Build Systems (Platforms)
Use Case: Application Delivery and DevOps
Examples: Jenkins, Bamboo, Travis CI, TeamCity machine data ingested through APIs, logs, webhooks

Build platforms, like Jenkins and Bamboo, enable a continuous integration practice that allows application delivery teams (including developers, DevOps practitioners, QA and release engineering) to build artifacts, trigger new builds and environments, automate tests and more.

Use Cases
Application Delivery and DevOps: Build systems monitoring helps release managers and test and QA teams understand the health of their build environment and the status of tests, and get insights into stack traces and build queues. This visibility helps remediate build or test bottlenecks and increase application delivery velocity and quality.
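
As a small illustration of what "health of the build environment" might mean in numbers, the sketch below computes a failure rate per job and the mean time builds spend queued. The record fields (`job`, `result`, `queue_seconds`) and the sample values are hypothetical; real build platforms expose similar data through their own APIs and logs, with their own field names.

```python
from collections import Counter


def summarize_builds(builds):
    """Return (failure rate per job, mean queue time in seconds).

    Each record is a dict with hypothetical keys: job, result, queue_seconds.
    """
    total, failed = Counter(), Counter()
    queue_total = 0.0
    for build in builds:
        total[build["job"]] += 1
        if build["result"] != "SUCCESS":
            failed[build["job"]] += 1
        queue_total += build["queue_seconds"]
    failure_rate = {job: failed[job] / count for job, count in total.items()}
    mean_queue = queue_total / len(builds) if builds else 0.0
    return failure_rate, mean_queue


# Hypothetical build records.
builds = [
    {"job": "web-app", "result": "SUCCESS", "queue_seconds": 12},
    {"job": "web-app", "result": "FAILURE", "queue_seconds": 95},
    {"job": "web-app", "result": "FAILURE", "queue_seconds": 140},
    {"job": "api", "result": "SUCCESS", "queue_seconds": 9},
]

failure_rate, mean_queue = summarize_builds(builds)
print(failure_rate, mean_queue)
```

A rising failure rate for one job points at a test or code problem, while a rising mean queue time points at a shortage of build capacity.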



Code Management
Use Case: Application Delivery
Examples: GitHub, GitLab

For all but the most trivial implementations, application source code consists of dozens if not hundreds of interrelated files. The complexity and volatility of code (particularly when using agile development methodologies, where changes are made daily) makes keeping track of it virtually impossible without a structured, automated source code management and revision control system.

Originally built as client-server applications where developers checked in code to a central repository, today’s systems (such as Git) are often distributed, with each developer working from a local copy of the full repository and changes synchronized across all subscribers to a particular project. Code management systems provide revision control (the ability to back out changes to an earlier version), software build automation, configuration status records and reporting, and the ability to branch or fork all or part of a source-code tree into a separate subproject with its own versioning.

Use Cases
Application Delivery: The version records of code management can help IT operations teams identify application changes that are causing system problems, such as excessive resource consumption or interference with other applications.

Container Logs and Metrics
Use Case: Application Delivery and DevOps
Examples: Docker

Container logs are an efficient way to acquire logs generated by applications running inside a container. By utilizing logging drivers, output that is usually logged is redirected to another target. Since logging drivers start and stop when containers start and stop, this is the most effective way of capturing machine data, given the often limited lifespan of a container.

Container metrics contain details related to CPU, memory, I/O and network metrics generated by a container. By capturing this data, you have the opportunity to spot specific containers that appear to consume more resources than others, enabling faster, more precise troubleshooting.

Use Cases
Application Delivery and DevOps: Acquiring container log files gives developers and operations teams insight on errors, issues and availability of applications running inside containers. Logs and metrics at the container level also call attention to containers whose performance is outside of expected parameters. As a result, admins can “kill” or “stop” a container instance, and “run” a new container in its place.
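
Spotting containers that consume more resources than others, as described for container metrics on this page, can be sketched by comparing each container's average CPU with the median across all containers. The sample readings, the container names and the factor of 2 are hypothetical; the same comparison would apply to memory, I/O or network metrics.

```python
from statistics import median


def heavy_containers(samples, factor=2.0):
    """Return names of containers whose average CPU exceeds `factor` times the median average.

    `samples` maps a container name to a list of CPU percentage readings.
    """
    averages = {name: sum(values) / len(values) for name, values in samples.items() if values}
    if not averages:
        return []
    typical = median(averages.values())
    return sorted(name for name, avg in averages.items() if avg > factor * typical)


# Hypothetical CPU readings (percent) collected per container.
samples = {
    "web-1": [12.0, 15.0, 11.0],
    "web-2": [14.0, 13.0, 16.0],
    "worker-1": [85.0, 92.0, 88.0],
    "cache-1": [9.0, 8.0, 10.0],
}

print(heavy_containers(samples))
```

Using the median rather than the mean keeps one runaway container from raising the bar it is measured against.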



Container Orchestration Metrics
Use Case: Application Delivery and DevOps

Examples: Kubernetes, Amazon ECS, Azure Container Services,
Docker Swarm, Google Container Engine

Container orchestration tools provide an enterprise-level
framework for automating container deployments and
integrating and managing containerized applications at scale.
Container orchestration tools like Kubernetes are important
for ensuring the speed, availability, scaling and networking of
containerized environments. As with container metrics, it's
important to collect container orchestration metrics at high
resolution, because these environments are self-healing and
ephemeral.

The most popular container orchestration platform is Kubernetes.
Kubernetes metrics contain details related to the inventory,
health and performance of container resources (cluster maps,
node state, pod status, container status, namespace status,
workload deployment details, etc.) along with aggregated system
metrics (CPU, disk, memory, network) across nodes. By visualizing
and correlating this data, you can keep track of infrastructure
inventory, capacity and cost, and investigate the underlying
issues that lead to failures across your Kubernetes environment,
which expedites troubleshooting.

Use Cases
Application Delivery and DevOps: Acquiring Kubernetes
metrics gives developers and operations teams insights across
all layers of their Kubernetes environment and the underlying
infrastructure. This broad view helps operators monitor and
manage the health of containerized environments, oversee
services migrating to Kubernetes, and quickly diagnose any
issues with the infrastructure, the orchestration platform itself,
or the container.

For example, operators can look into an under-performing pod,
move to the metrics for the workload running in that pod, and
view its neighbors, which gives more context than container-level
metrics and logs alone. Since particular problems in container
environments can often be hard to find, this context is critical for
teams to correlate patterns, reducing mean time to clue and
expediting root cause analysis. This is particularly helpful during
troubleshooting when DevOps teams need to quickly pinpoint
which service is causing a sudden spike in latency or error rate
and why. This comprehensive view also assists with resource
optimization and capacity planning.



CRM, ERP and Other Business Applications
Use Cases: Security and Compliance, Application Delivery,
Business Analytics

Examples: SAP, SFDC, SugarCRM, Oracle, Microsoft Dynamics

Business Applications can create a wealth of data as part of
normal operations. Two examples are CRM and ERP applications:

Customer relationship management (CRM) systems have
become an essential part of every organization, providing
a central database of all customer contact information,
communications and transaction details. CRM systems have
evolved from simple contact management systems to platforms
for customer support and engagement by providing personalized
sales and support information. The same customer support
data repository can be used to develop customized marketing
messages and sales promotions. CRM systems are also useful for
application support and enhancement by recording details about
customer problems with a particular system or application along
with their eventual solution, details that can inform future
application or service updates.

Enterprise resource planning (ERP) applications are a critical
back-office IT service that provides systematic, automated
collection and analysis of a variety of product, supply chain
and logistics data. ERP is used in product planning, tracking
purchases of components and supplies, inventory management,
monitoring and regulating manufacturing processes, managing
logistics, warehouse inventory and shipping, and to monitor and
measure the effectiveness of sales and marketing campaigns.
ERP software also integrates with CRM, HR, finance/accounting/
payroll and asset management systems, with bidirectional
data flows that provide consistent information across back-end
digital business processes. ERP systems are typically built
on a relational database management system with a variety
of modules and customizations for specific functions such as
supplier relationship management or supply chain management.
Due to their complexity, ERP systems often are installed and
managed by product specialists.

Use Cases
Security and Compliance: CRM records can help security teams
unravel incidents that involve multiple customers and problem
episodes over a long time span. They can also provide evidence
of a breach, should records be modified outside normal business
processes. In addition, the data can be used to audit access
records of customer or internal user information.

Application Delivery: CRM, ERP, and other business applications
are often mission-critical systems that facilitate a variety of
front and back office processes. The performance of these
applications can impact internal operations. Business application
logs can be used to determine the health of those operations.

Business Analytics: CRM, ERP, and other business applications
facilitate a variety of front and back office processes that span
other systems as well. As part of an end-to-end view of those
complex business processes, business application data can help
provide insights into the health of business operations.



Custom Application and Debug Logs
Use Cases: Security and Compliance, IT Operations,
Application Delivery

Examples: Custom applications

Best practices for application developers require the inclusion of
debugging code in applications that can be enabled to provide
minute details of application state, variables and error conditions
or exceptions. Debug output is typically logged for later analysis
that can expose the cause of application crashes, memory leaks,
performance degradation and security holes. Furthermore, since
the events causing a security or performance problem may be
spaced over time, logs, along with the problem software, can
help correlate and trace temporally separated errors to show
how they contribute to a larger problem.

Application debug logs provide a record of program behavior
that is necessary to identify and fix software defects, security
vulnerabilities or performance bottlenecks. While test logs
record the output results of application usage, debug logs
provide information about an application's internal state,
including the contents of variables, memory buffers and
registers; a detailed record of API calls; and even a step-by-step
trace through a particular module or subroutine. Due to
the performance overhead and amount of data produced,
debug logs typically are enabled only when a problem can't be
identified via test or event logs.

Use Cases
Security and Compliance: Security breaches are often the
result of improper handling of unexpected inputs, such as buffer
overflow exploits or data injection used in cross-site scripting
attacks. This type of low-level vulnerability is almost impossible
to detect without logging the internal state of various application
variables and buffers.

Similar to APM logs, custom application and debug logs can be
used to correlate security indications between system and
application activities. They also help to identify SQL, API and
command (CMD) calls made in relation to suspicious activity, or
abnormal numbers of sessions or levels of CPU load in relation to
security activity.

IT Ops and Application Delivery: Debug output can expose
application behavior that causes inefficient use of system
resources or application failures that can be addressed by
developers and operations teams. Debug output is useful for
unraveling the internal state of an application that exhibits
performance problems or has been shown to have security
vulnerabilities, and the data can be helpful in identifying
root cause.



Distributed Tracing Tools
Use Case: IT Operations, Application Delivery and DevOps

Examples: SignalFx, OpenTelemetry, Zipkin, Jaeger, fluentd

Distributed tracing is a method used to monitor how requests
flow through your microservices applications by mapping
transaction paths and duration as they propagate across
services through trace and span data.

Popular open source distributed tracing instrumentation tools
like OpenTelemetry record and publish operation data useful
for finding sources of latency and errors within a distributed
system, illuminating the relationship between user-visible
behavior and the complex mechanics of the microservices
underneath. APM software tools turn the information collected
through these instrumentation tools into metrics to provide
actionable insights on performance problems, drilling down into
specific service-level details.

Traces contain a lot of information about the method, operation,
or block of code that they capture, such as the operation name,
the start time of the operation, how long the operation took to
execute, the logical name of the service on which the operation
took place, the IP address of the service instance on which the
operation took place, and trace context propagation. These are
often represented as RED (Rate, Errors, Duration) metrics for
monitoring purposes.

Distributed tracing along with APM tools provides a context-rich,
complete view of service transactions that exist in complex
distributed systems so IT and developers can understand
user-visible latency and SLAs, and perform root-cause analysis
with preserved traces that serve as anomaly benchmarks.

Use Cases
IT Operations, Application Delivery and DevOps: By providing
end-to-end measurement of complex, multi-tier applications,
tracing data can show microservices problems and bottlenecks
that aren't visible when looking at each application individually,
especially through service mapping, such as slow DNS resolution
causing a complex web app to bog down as it tries to access
content and modules on many different systems.

Distributed tracing allows DevOps teams to see all traces
and spans for an API call and fix underperforming APIs. This
helps teams improve system performance in real-time, before
downstream effects impact customers. APM tools can expose
which transaction spans deviated from the norm while showing
correlation to code and infrastructure for deeper root cause
analysis and troubleshooting. Since teams can visualize tracing
data in real-time, this information improves time to market by
making it easy to immediately see how updates and rollouts to
services impact applications.
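
As a rough illustration of how the span fields described in this section can be rolled up into RED metrics, here is a minimal Python sketch. The span records and their field names (service, operation, duration_ms, error) are hypothetical and are not taken from any particular tracing tool.

```python
from collections import defaultdict

# Hypothetical span records; the field names are assumptions for illustration.
spans = [
    {"service": "checkout", "operation": "POST /pay", "duration_ms": 120, "error": False},
    {"service": "checkout", "operation": "POST /pay", "duration_ms": 480, "error": True},
    {"service": "catalog", "operation": "GET /items", "duration_ms": 35, "error": False},
    {"service": "checkout", "operation": "POST /pay", "duration_ms": 150, "error": False},
]


def red_metrics(spans, window_seconds):
    """Aggregate spans into Rate, Errors and Duration figures per service."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span["service"]].append(span)

    result = {}
    for service, items in grouped.items():
        durations = [s["duration_ms"] for s in items]
        result[service] = {
            # Rate: spans observed per second over the window.
            "rate_per_s": len(items) / window_seconds,
            # Errors: share of spans flagged as failed.
            "error_ratio": sum(1 for s in items if s["error"]) / len(items),
            # Duration: simple average and maximum latency.
            "avg_duration_ms": sum(durations) / len(durations),
            "max_duration_ms": max(durations),
        }
    return result


metrics = red_metrics(spans, window_seconds=60)
```

A production pipeline would more likely use latency percentiles than an average, but the grouping step is the same.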



Mail Server
Use Cases: Security and Compliance, IT Operations

Examples: Exchange, Office 365

Email remains the primary form of formal communication in
most organizations. As such, mail server databases and logs are
some of the most important business records. Due to their size
and tendency to grow without bounds, email data management
typically requires both data retention and archival policies so that
only important records are held and inactive data is moved to
low-cost storage.

Use Cases
Security and Compliance: Mail server data can help identify
malicious attachments, malicious domain links and redirects,
emails from known malicious domains, and emails from unknown
domains. It can also be used to identify emails with abnormal or
excessive message sizes, and abnormal email activity times.

IT Ops: Email messages and activity logs can be required
to maintain compliance with an organization's information
security, retention and regulatory compliance processes. Mail
server transaction and error logs also are essential debugging
tools for IT problem resolution and also may be used for
usage-based billing.

Test Coverage Tools
Use Case: Application Delivery and DevOps

Examples: Static Analysis and Unit Testing logs (SonarQube, Tox,
PyTest, RubyGem MiniTest, Bacon, Go Testing), build server logs
and performance metrics

Typical test coverage includes functional, statement, branch
and conditional coverage. The idea is to measure what percentage
of the code is exercised by a test suite against one or more
coverage criteria. Coverage tests are usually defined by rule or
requirements. In addition to coverage testing, software delivery
teams can utilize machine data to understand the line count,
code density and technical debt.

Use Cases
Application Delivery and DevOps: Test coverage data
monitoring helps release managers, application owners and
others understand:

• How much technical debt and how many issues are they resolving?
• How ready is their next release?
• From unit testing: how many tests were performed per hour
and what tests are being run?

If test coverage data is combined with build data, release
managers can start monitoring build and release performance
and start understanding the release quality. They can understand
the trends in error percentage and decide whether the build
is ready for production. Understanding code quality can also help
support teams get prepared for any additional volume of calls or
any particular issues that may arise.
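
To make the idea of a coverage percentage concrete, the following Python sketch computes statement and branch coverage from raw counts and checks them against a release gate. The counts, criterion names and thresholds are hypothetical. Real coverage tools report these figures in their own formats.

```python
def coverage_percent(covered, total):
    """Return the share of items exercised by the test suite, as a percentage."""
    if total == 0:
        return 100.0  # assumption: nothing to cover counts as fully covered
    return 100.0 * covered / total


# Hypothetical counts, in the shape a coverage tool might report them.
report = {
    "statement": {"covered": 450, "total": 500},
    "branch": {"covered": 120, "total": 200},
}

summary = {
    name: coverage_percent(counts["covered"], counts["total"])
    for name, counts in report.items()
}

# Hypothetical release gate: every criterion must meet its minimum.
thresholds = {"statement": 80.0, "branch": 70.0}
failing = sorted(name for name, pct in summary.items() if pct < thresholds[name])
```

Tracking `summary` per build over time gives the kind of trend a release manager could use when judging readiness.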



Serverless Monitoring
Use Case: Application Delivery and DevOps

Examples: AWS Lambda, Google Cloud Functions (GCF), Azure
Functions, OpenShift Serverless

Event-driven, serverless computing platforms, also known as
functions-as-a-service (FaaS), allow IT and DevOps practitioners
to practice continuous application delivery without the need
to perform administrative tasks required to provision and
manage infrastructure. With FaaS, developers write single-purpose
functions that are triggered and scaled on demand
by events emitted from services so teams can focus on writing
and delivering business critical applications. It makes it easy
to automate processes, control costs, autoscale services
and APIs, and promote collaboration across teams writing
specialized applications in different languages. However, the
"statelessness" and ephemerality of functions make monitoring
their performance almost impossible without real-time,
contextual solutions.

Use Cases
Application Delivery and DevOps: Serverless monitoring helps
DevOps teams, application owners and others understand:

• Availability of applications running on serverless, with
point-in-time information about the current state of functions,
such as average latency and total number of function cold starts.
• Concurrency usage, for availability and cost planning. Teams
can increase the amount of concurrency during times of high
demand and lower it, or completely turn it off, when demand
decreases in real-time.
• Errors, with visibility and insights into failed invocations so
developers can remediate issues before users are impacted.
• Compute duration: the time from when your function code
starts executing as the result of an invocation to when it stops
executing, for a deeper understanding of costs.
• How functions are supporting business and customer
experience including user requests, checkout abandonment,
revenue per location, etc.
• Trends and breakdowns of functions by account, region, etc.
for deeper root cause analysis.

Data from functions can also be monitored via distributed
tracing for granular visibility into the performance of serverless
applications along with end-to-end transaction views into
invocations of multiple functions and all services.



Vulnerability Scanning
Use Case: Security and Compliance

Examples: nCircle IP360, Nessus

An effective way to find security holes is to examine
infrastructure from the attacker's point of view. Vulnerability
scans probe an organization's network for known software
defects that provide entry points for external agents. These
scans yield data about open ports and IP addresses that can be
used by malicious agents to gain entry to a particular system or
entire network.

Systems often keep network services running by default, even
when they aren't required for a particular server. These running,
unmonitored services are a common means of external attack,
as they may not be patched with the latest OS security updates.
Broadscale vulnerability scans can reveal security holes that
could be leveraged to access an entire enterprise network.

Use Cases
Security and Compliance: Vulnerability scans yield data about
open ports and IP addresses that can be used by malicious
agents to gain entry to a particular system or entire network.
The data can be used to identify:

• System misconfiguration causing security vulnerability.
• Outdated patches.
• Unnecessary network service ports.
• Misconfigured filesystems, users or applications.
• Changes in system configuration.
• Changes in various user, app or filesystem permissions.



Middleware Data

Application Server
Use Cases: Security and Compliance, IT Operations,
Application Delivery

Examples: log4j, log4php

Whether building a multi-tier web application or using a traditional
client-server design, application servers run the backend
software that handles user requests. Today, these are typically
deployed as virtual machines on a multi-tenant hypervisor.

Use Cases
Security and Compliance: Security breaches are often the
result of improper handling of unexpected inputs, such as buffer
overflow exploits or data injection used in cross-site scripting
attacks. This type of low-level vulnerability is almost impossible
to detect without logging the internal state of various application
variables and buffers. Since the events causing a security or
performance problem may be spaced over time, logs, along with
the problem software, can help correlate and trace temporally
separated errors to show how they contribute to a larger
problem. Anomalies in the logs can indicate potential failures or
attempted compromises. The data can also help:

• Monitor user or customer transactions.
• Identify abnormal volumes, amounts or sessions of transactions.
• Identify unknown user interaction with third-party accounts,
users or both.
• Sequence the exact transaction patterns matching
fraudulent profiles.

IT Ops and Application Delivery: The value of application server
logs depends on what they collect; however, these may include
customer information useful in troubleshooting, or application
state transitions that are similar to, but less verbose than, debug
output and that can provide clues to application crashes, memory
leaks and performance problems.



Middleware
Use Cases: Security and Compliance, IT Operations,
Application Delivery

Examples: Tibco, Software AG, Apache Active MQ, Kafka,
AMQP, MQTT

Middleware describes a software layer of the prototypical
three-tier enterprise application that typically implements
data transformations, analysis and business logic. Middleware
accesses databases for persistent storage and relies on web
apps for the user interface. Middleware is often developed on
the J2EE platform.

Use Cases
Security and Compliance: Since middleware generally accesses
network services and sensitive databases, security teams can
use log data to vet application integrity and identify suspicious
behavior and specific vulnerabilities. It can also be used for user
and customer transaction monitoring and to identify abnormal
transactions, unknown user interaction with third-party accounts,
and the sequence of exact transaction patterns that match
known fraudulent profiles.

IT Ops and Application Delivery: Middleware data can help
operations teams diagnose problems with three-tier applications
that involve the interaction between web, middleware and
database servers.

Mobile Device Data
Use Cases: Security and Compliance, IT Operations,
Application Delivery

Given the array of always-active sensors on mobile devices, they
are veritable gushers of data that can include:

• Physical parameters such as location, network MAC ID, device
GUID, device type and OS version.
• Network settings such as address, AP or cell-base station
location, link performance.
• Application-specific telemetry such as time in app, features
used and internal state and debug parameters similar to those
provided by conventional application servers.

Use Cases
Security and Compliance: Security teams can extend their view
of the threat landscape by monitoring mobile device data for
abnormal activity with regard to authentication, location and
application usage.

IT Ops and Application Delivery: Since mobile apps invariably
connect to one or more backend services, data from the client's
point of view can provide insight into the app's condition and
state when investigating issues such as crashes, performance
degradation or security leaks. Mobile data shows the sequence
of events and the application conditions leading up to and during
a problem. If the source of the problem is the mobile application
itself, getting insight on mobile application data can help
developers deliver a better performing mobile app.



Web Server
Use Cases: Security and Compliance, IT Operations,
Application Delivery

Examples: Java J2EE, Apache, Application Usage Logs, IIS logs, nginx

Web servers are the backend application behind every website
that delivers all content seen by browser clients. Web servers
access static HTML pages and run application scripts in a variety of
languages that generate dynamic content and call other applications
such as middleware.

Web servers can vary widely, and can include:

• Java – J2EE: Java is one of the most popular programming languages
due to its versatility, relative ease of use and rich ecosystem of
developer tools. Via the J2EE platform, which includes APIs, protocols,
SDKs and object modules, Java is widely used for enterprise apps
including web applets, middle-tier business logic and graphic front
ends. Java is also used for native Android mobile apps.
• Apache: Apache is one of the oldest and most-used web servers
on the internet, powering millions of enterprise, government and
public sites. Apache keeps detailed records of every transaction:
every time a browser requests a web page, Apache log details
include items such as the time, remote IP address, browser type
and page requested. Apache also logs various error conditions such
as a request for a missing file, attempts to access a file without
appropriate permissions or problems with an Apache plug-in
module. Apache logs are critical in debugging both web application
and server problems, but are also used to generate traffic statistics,
track user behavior and flag security attacks such as attempted
unauthorized entry or DDoS.
• Application Usage Logs: Like Apache web logs, application usage
logs can provide valuable information to multiple
stakeholders including developers, IT, sales and marketing.
Depending on how granular the measurement is, usage tracking can
assist developers in identifying the application features that are
most used and seldom used, those that users have trouble with and
areas for future enhancement. For customer-facing applications,
usage logs provide sales and marketing teams insight into the
effectiveness of online and app-based sales channels and promotions,
data about sell-through and transaction abandonment, and
information for potential cross-sales promotions.

Use Cases
Security and Compliance: Web logs record error conditions such as
a request to access a file without appropriate permissions and also
track user activity that can flag security attacks such as attempted
unauthorized entry or DDoS. They can also help to identify SQL
injections and support correlating fraudulent transactions.

• Since Java apps frequently access network services and sensitive
databases, security teams can use log data to vet the integrity
of J2EE apps and identify suspicious application behavior and
application vulnerabilities.
• Apache web logs can alert to security attacks such as attempted
unauthorized entry, XSS, buffer overflows or DDoS.
• Like web logs, generic application usage logs can alert security
teams to unauthorized access such as someone consuming more
resources than normal, or using applications at odd hours.

IT Ops and Application Delivery: Web logs are critical in debugging
both web application and server problems, but also are used to
generate traffic statistics that are useful in capacity planning. Web
server data can provide varying information for IT operations teams:

• J2EE data can help operations teams diagnose problems with
three-tier applications that involve the interaction between web,
middleware and database servers.
• In aggregate, Apache web logs can show activity of a web service.
Drilling into details can reveal infrastructure bottlenecks and
indicate downstream issues.
• Application usage logs can help IT operations teams with
infrastructure capacity planning, optimization, load balancing
and usage-based billing by providing detailed records of
resource consumption.
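
The Apache log details mentioned in this section (time, remote IP address, browser type and page requested) can be extracted with a short parser. The Python sketch below assumes the widely used "combined" access log format. The sample line is invented, and a real deployment may use a custom LogFormat that needs a different pattern.

```python
import re
from datetime import datetime

# Pattern for the Apache "combined" access log format (an assumption about
# the server's LogFormat; adjust it if the server is configured differently).
COMBINED = re.compile(
    r'(?P<remote_ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


# Invented sample line using a documentation-range IP address.
sample = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 404 209 "-" "Mozilla/5.0"'
)
record = parse_line(sample)
```

Counting records by `status` or `remote_ip` is then enough to surface missing-file errors or an unusually busy client.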



Network Data

Deep Packet Inspection Data
Use Cases: Security and Compliance, IT Operations

Examples: Stream, PCAP, bro

Deep packet inspection (DPI) is a fundamental technique
used by firewalls to inspect the headers and the payload of network
packets before passing them down the network subject to
security rules. DPI data provides information about the source and
destination of the packet, the protocol, other IP and TCP/UDP
header information and the actual data.
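
To show what the header information described above looks like in practice, here is a minimal Python sketch that decodes the fixed 20-byte portion of an IPv4 header. The sample packet is built by hand for illustration. The sketch does not validate checksums or handle IP options, fragmentation or IPv6.

```python
import ipaddress
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header and return key fields."""
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    return {
        "version": version_ihl >> 4,
        "header_len": header_len,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": {1: "ICMP", 6: "TCP", 17: "UDP"}.get(protocol, str(protocol)),
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
        "payload": packet[header_len:],
    }


# Hand-built sample: a 20-byte header (checksum left at 0) plus a 5-byte payload.
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 25, 0, 0, 64, 6, 0,
    ipaddress.IPv4Address("192.0.2.10").packed,
    ipaddress.IPv4Address("198.51.100.5").packed,
) + b"hello"
info = parse_ipv4_header(sample)
```

Real DPI tools go much further, decoding the TCP or UDP header and the application payload that follow the IP header.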

Use Cases
Security and Compliance: Packet capture (PCAP) logs record
everything traversing a network and help identify
security attacks and incidents such as advanced persistent
threats, data exfiltration, DDoS and malware. DPI also can be
used to filter content subject to an organization's terms of
service. PCAP data can also be used to provide and identify:

• DNS session analysis for malicious domain communications
from each endpoint.
• Abnormal amounts of traffic or sessions.
• Abnormal amounts of domain and host communications.
• Known malicious traffic from a host.
• Expired SSL certificate analysis.
• Abnormal host communications (internal and external).

IT Ops: Data on the network wire is authoritative and difficult
to spoof (although encryption, steganography and advanced
deception techniques can evade DPI). For example, DPI provides
raw information about everything transmitted over a network,
including things that aren't necessarily part of a log, or are
difficult to extract from one, such as database query results.



DHCP
Use Cases: Security and Compliance, IT Operations

Examples: DHCP Insight, Linux DHCP

DHCP is the network protocol most client devices use to
associate themselves with an IP network. Implemented via a
DHCP server, which could be standalone or embedded in a router
or other network appliance, DHCP provides network clients with
critical network parameters including IP address, subnet mask,
network gateway, DNS servers, WINS or other name servers, time
servers (NTP), a host and domain name and the address of other
optional network services.

Use Cases
Security and Compliance: DHCP logs show exactly which
systems are connecting to a network, their IP and MAC
addresses, when they connect and for how long. This information
is useful in establishing the state of a network when a security
incident occurs and tracing an attacker's address back to a time
of access and type of device by looking at the MAC ID and vendor
identification string. The data can also be used to support user
network access verification.

IT Ops: DHCP logs can be used when troubleshooting a client
device that is having network problems, since they provide a
definitive record of the device's primary IP parameters. The data
may show that the DHCP server itself is at fault; for example, by
not properly vending addresses, failing to renew IP leases or
giving the same address to two separate devices.

DNS
Use Cases: Security and Compliance, IT Operations

Examples: BIND, PowerDNS, Unbound, Dnsmasq, Erl-DNS

The domain name system (DNS) is the internet's phone book,
providing a mapping between system or network resource
names and IP addresses. DNS has a hierarchical name space
that typically includes three levels: a top-level domain (TLD) such
as .com, .edu or .gov; a second-level domain such as "google"
or "whitehouse;" and a system level such as "www" or "mail."
DNS nameservers operate in this hierarchy either by acting as
authoritative sources for particular domains, such as a company
or government agency, or by acting as caching servers that store
DNS query results for subsequent lookup by users in a specific
location or organization; for example, a broadband provider
caching addresses for its customers.

Use Cases
Security and Compliance: Security teams can use DNS logs to
investigate client address requests, such as by correlating
lookups with other activity, checking whether requests are made
for inappropriate or otherwise suspicious sites, and gauging the
relative popularity of individual sites or domains. Since DNS
servers are a frequent target of
DDoS attacks, logs can reveal an unusually high number of
requests from external sources. Likewise, since compromised
DNS servers themselves are often used to initiate DDoS
attacks against other sites, DNS logs can reveal whether an
organization's servers have been compromised. DNS data can
also provide detection of unknown domains, malicious domains
and temporary domains.

IT Ops: DNS server logs provide operations teams with a record
of traffic, the type of queries, how many are locally resolved
either from an authoritative server or out of cache, and a picture
of overall system health.
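
The three-level name hierarchy described in this section can be illustrated with a small Python sketch that splits a hostname into its top-level domain, second-level domain and system (host) part. This is a deliberately naive split on dots. It does not handle multi-label public suffixes such as ".co.uk", and the function name is hypothetical.

```python
def split_hostname(name):
    """Split a fully qualified name into TLD, second-level domain and host labels."""
    labels = name.rstrip(".").lower().split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a second-level and a top-level domain")
    return {
        "tld": labels[-1],
        "second_level": labels[-2],
        # Everything left of the second-level domain; empty if there is none.
        "host": ".".join(labels[:-2]),
    }


parts = split_hostname("www.google.com")
```

Grouping DNS log entries by `second_level` and `tld` is one simple way to rank domains by popularity or to spot lookups for unfamiliar domains.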



Endpoint Firewall
Use Case: Security and Compliance

Examples: McAfee ePO, Symantec SEP

Endpoint security is used to protect corporate networks from inadvertent attacks by compromised devices using untrusted remote networks such as hotspots. By installing clients on laptops or other wireless and mobile devices, endpoint security software can monitor activity and provide security teams with warnings of devices attempting to spread malware or pose other threats.

In this context, endpoint refers to the security client software or agent installed on a client device that logs security-related activity from the client OS, login, logout, shutdown events and various applications such as the browser (Internet Explorer, Edge), mail client (Outlook) and Office applications. Endpoints also log their configuration and various security parameters (certificates, local anti-malware signatures, etc.), all of which is useful in post-hoc forensic security incident analysis.

Use Cases
Security and Compliance: Endpoint data can be used for a variety of security uses, including identifying newly detected binaries, file hashes, files in the filesystem, and registries. It can also help with identifying binary and hash registries that match threat intelligence, as well as unpatched operating systems and binaries, and to detect known malware.

Use Cases: Security and Compliance, IT Operations

Examples: Palo Alto, Cisco, Check Point

Firewalls demarcate zones of different security policies. By controlling the flow of network traffic, firewalls act as gatekeepers collecting valuable data that might not be captured in other locations due to the firewall’s unique position as the gatekeeper to network traffic. Firewalls also execute security policy and thus may break applications using unusual or unauthorized network protocols.

Use Cases
Security and Compliance: Firewall logs provide a detailed record of traffic between network segments, including source and destination IP addresses, ports and protocols, all of which are critical when investigating security incidents. The data may also reveal gaps in security policy that can be closed with tighter construction of firewall rules. Firewall data can help identify and detect:

• Lateral movement
• Command and Control traffic
• DDoS traffic
• Malicious domain traffic
• Unknown domain traffic
• Unknown locations traffic

IT Ops: When network applications are having communication problems, network security policies may be the culprit. Firewall data can provide visibility into which traffic is blocked and which traffic has passed through, helping identify if you have an app or network issue.
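
As a rough illustration of the firewall IT Ops use case, the sketch below counts allowed and blocked connections per destination port and lists ports where everything was denied. The record layout (src, dst, dport, action) and the sample values are hypothetical; real firewall log formats differ by vendor.

```python
from collections import Counter

# Hypothetical, simplified firewall records; real log formats vary by vendor.
SAMPLE_LOGS = [
    {"src": "10.0.1.15", "dst": "10.0.2.20", "dport": 443, "action": "allow"},
    {"src": "10.0.1.15", "dst": "10.0.2.20", "dport": 1521, "action": "deny"},
    {"src": "10.0.1.22", "dst": "10.0.2.20", "dport": 1521, "action": "deny"},
    {"src": "10.0.1.30", "dst": "10.0.2.21", "dport": 443, "action": "allow"},
]


def summarize_by_port(records):
    """Return {dport: Counter of actions seen for that port}."""
    summary = {}
    for rec in records:
        summary.setdefault(rec["dport"], Counter())[rec["action"]] += 1
    return summary


def fully_blocked_ports(summary):
    """Ports where every observed connection was denied."""
    return sorted(
        port
        for port, actions in summary.items()
        if actions["deny"] > 0 and actions["allow"] == 0
    )


summary = summarize_by_port(SAMPLE_LOGS)
blocked = fully_blocked_ports(summary)
print(blocked)
```

A port that appears only with deny actions suggests the firewall policy, rather than the application, is the place to start looking.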



FTP
Use Cases: Security and Compliance, IT Operations

Examples: OSSEC, Getwatchlist, UTBox, Security Onion, iSeries - AS400, Traffic Ray

FTP is one of the oldest and most rudimentary network protocols for copying data from one system to another. Before websites and HTTP, FTP was the best way to move large files across the internet. FTP is still used in organizations that need reliable, deterministic internet file transfer.

Use Cases
Security and Compliance: Analyzing FTP servers can help security teams identify when compromised credentials are used, when abnormal traffic is coming from different locations or at odd times, and when sensitive files and documents are being accessed.

IT Ops: FTP traffic logs record the key elements of a file transmission, including source (client) name and address and remote user name if the destination is password-protected. This and other data are crucial when troubleshooting FTP problems, regardless of the application.

Intrusion Detection/Prevention
Use Case: Security and Compliance

Examples: Tipping Point, Juniper IDP, Netscreen Firewall, Juniper NSM IDP, Juniper NSM, Snort, McAfee IDS

IDS and IPS are complementary, parallel security systems that supplement firewalls: IDS by exposing successful network and server attacks that penetrate a firewall, and IPS by providing more advanced defenses against sophisticated attacks. IDS is typically placed at the network edge, just inside a perimeter firewall, although some organizations also put a system outside the firewall to provide greater intelligence about all attacks. Likewise, IPS is typically placed at the network perimeter, although it also may be used in layers at other points inside the network or on individual servers. IPS usually works by dropping packets, resetting network connections and blacklisting specific IP addresses or ranges.

Use Cases
Security and Compliance: IDS logs provide security teams detailed records of attacks including the type, source, destination and port(s) used that provide an overall attack signature. Special signatures may trigger alarms or other mitigating actions. IPS logs provide the same set of attack signature data, but also may include a threat analysis of bad network packets and detection of lateral movement. This data can also detect command and control traffic, DDoS traffic, and malicious or unknown domain traffic.



Load Balancer
Use Cases: IT Operations

Examples: Local Traffic Manager, Cisco Load Balancer, Citrix, Kemp Technologies, Radware AppDirector OnDemand

Load balancers allocate external network traffic bound for a particular server or application across multiple redundant instances. There are two categories of load balancer: local, in which all resources in a load-balanced pool are on the same subnet; and global or distributed, where the resource pool is spread across multiple sites. Load balancers use several user-selectable algorithms to allocate traffic including:

• Round robin (systems get an equal number of connections allocated sequentially).
• Weighted round robin (where the load is assigned according to the percentage weight assigned each system in a pool).
• Least connections (where new connections go to the system with the fewest number of existing clients).
• Weighted least connections (where the connection handling capacity of each system is taken into account when determining the least busy system for new connections).
• Random (connections are randomly assigned to each member of a pool).

Use Cases
IT Ops: Load balancer logs provide operations teams with a record of overall traffic to systems or particular applications and provide indicators of each system’s traffic-handling capacity and health, along with the status and health of the load balancer itself.

Network Access Control (NAC)
Use Case: Security and Compliance

Examples: Aruba ClearPass, Cisco ACS

Network access or admission control is a form of client/endpoint security that uses a locally installed software agent to pre-authorize connections to a protected network. NAC screens client devices for contamination by known malware and adherence to security policies such as running an approved OS with the most recent patches. Clients failing NAC screens are rerouted to an isolated quarantine network until any detected problems are corrected.

Use Cases
Security and Compliance: NAC software collects data about the connecting clients such as an inventory of installed client software, compliance with security policies, OS and application patch versions, accessibility by remote access clients and user access to protected networks. NAC logs provide security teams with a detailed profile of a client’s state and activity. It can provide details into unauthorized device connections and be used to correlate users/IP to a physical network location.
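
The load balancer selection algorithms listed above can be sketched in a few lines each. This is a minimal illustration rather than how any particular product implements them; the server names, weights and connection counts are made up, and the weighted round robin here simply repeats each server by its integer weight (real implementations usually interleave more smoothly).

```python
import itertools
import random


def round_robin(servers):
    """Yield servers in a fixed, repeating order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in proportion to integer weights, e.g. {"a": 3, "b": 1}."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def weighted_least_connections(active, capacity):
    """Pick the server with the lowest active-to-capacity ratio."""
    return min(active, key=lambda s: active[s] / capacity[s])


def random_choice(servers, rng=random):
    """Pick any pool member at random."""
    return rng.choice(servers)


rr = round_robin(["a", "b", "c"])
first_four = [next(rr) for _ in range(4)]

wrr = weighted_round_robin({"a": 3, "b": 1})
wrr_picks = [next(wrr) for _ in range(8)]

lc_pick = least_connections({"a": 5, "b": 2, "c": 7})
wlc_pick = weighted_least_connections({"a": 5, "b": 2}, {"a": 10, "b": 2})
rand_pick = random_choice(["a", "b", "c"])
```

Note how the two "least connections" variants can disagree: server "b" has fewer connections in absolute terms, but "a" is less busy relative to its capacity.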



Network Protocols
Use Cases: Security and Compliance, IT Operations

Examples: HTTP, Cisco NetFlow, Ntop, Flow-tools, FlowScan, EHNT, BPFT

Network protocols describe the structure of data that flows through networks. In most cases, network ports are assigned to specific protocols for both security and performance reasons. Some protocols operate at a lower level of the computing stack and are used to direct packet routing, such as TCP, UDP or IP. Other protocols, such as HTTP, HTTPS and TNS, describe how packets are structured for applications such as web services, databases and a wide range of client-based applications. By capturing, decrypting and analyzing network protocol data, you can better understand the kinds of applications, their usage, performance and even payload (content of the data) of applications. Since this data can be gathered directly from a network tap, or with specialized software, it provides a perspective on applications and how they interoperate that may not be otherwise available.

Use Cases
Security and Compliance: Network protocols are an important source for identifying advanced persistent threats, analyzing traffic flows for unusual activity and identifying potential data exfiltration. Aggregating and analyzing flow records also can show anomalous traffic patterns and flow destinations that are indicative of a breach, such as an APT phoning home to a command and control server for instructions, additional malware code, or copying large amounts of data to an attacker’s system. The data can also be used to detect traffic related to DDoS, malicious domains, and unknown domains or locations.

IT Ops: Network protocol traffic analysis can help determine the network’s role in overall availability and performance of critical services. Application traffic can be monitored for usage, performance and availability, and can provide visibility into specific user data. For applications that cannot be instrumented on the servers, network traffic may be the only way to acquire performance data.
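
To make the idea of aggregating flow records concrete, here is a minimal sketch that totals bytes per source/destination pair and flags unusually large transfers. The record fields, the addresses (taken from documentation ranges) and the 1 GB threshold are all assumptions for illustration; real flow exports such as NetFlow carry many more fields.

```python
from collections import defaultdict

# Hypothetical flow records; real exports include ports, protocol, timestamps, etc.
FLOWS = [
    {"src": "10.0.0.5", "dst": "203.0.113.9", "bytes": 600_000_000},
    {"src": "10.0.0.5", "dst": "203.0.113.9", "bytes": 700_000_000},
    {"src": "10.0.0.7", "dst": "198.51.100.4", "bytes": 2_000_000},
    {"src": "10.0.0.5", "dst": "198.51.100.4", "bytes": 5_000_000},
]


def bytes_by_pair(flows):
    """Sum transferred bytes for each (source, destination) pair."""
    totals = defaultdict(int)
    for flow in flows:
        totals[(flow["src"], flow["dst"])] += flow["bytes"]
    return dict(totals)


def large_transfers(totals, threshold_bytes):
    """Return pairs whose total volume meets or exceeds the threshold."""
    return sorted(pair for pair, total in totals.items() if total >= threshold_bytes)


totals = bytes_by_pair(FLOWS)
suspects = large_transfers(totals, threshold_bytes=1_000_000_000)
print(suspects)
```

Neither individual flow to 203.0.113.9 crosses the threshold on its own; only the aggregate does, which is why summing per pair matters.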



Network Routers
Use Cases: Security and Compliance, IT Operations

Examples: Routers from Cisco, Juniper, Linksys, Arista, Extreme Networks, Avaya

If switches are network intersections, then routers are the signal lights and traffic cops: the devices responsible for ensuring that traffic goes to the right network segment. Unlike switches that operate at Layer 2, routers work at Layer 3, directing traffic based on TCP/IP address and protocol (port number). Routers are responsible for particular Layer 3 address spaces and manage traffic using information in routing tables and configured policies. Routers exchange information and update their forwarding tables using dynamic routing protocols.

Use Cases
Security and Compliance: Routers collect the same sort of traffic logs and statistics as switches; thus, their data is equally valuable to security teams as a source for flagging advanced persistent threats, analyzing traffic flows for unusual activity and identifying potential data exfiltration. As a wire-level data source, router statistics are almost impossible to spoof and thus a critical source of security data. Router data can also be used to detect configuration changes, and error or failure alerts correlating with security indicators.

IT Ops: Network engineers use router logs and statistics to monitor traffic flow and ensure that traffic is being correctly forwarded between network segments. Data from routing protocol updates can show whether your routers are appropriately exchanging route tables with other locations, that external traffic can reach you, and that internal traffic is correctly forwarded to external routers.

Network Switches
Use Cases: Security and Compliance, IT Operations

Examples: Ethernet Switch, Virtual Switches

Switches are network intersections, places where packets move from one network segment to another. In their purest form, switches work within a particular IP subnet and can’t route Layer 3 packets to another network. Modern data center designs typically use a two-tier switch hierarchy: top-of-rack (ToR) switches connecting servers and storage arrays at the edge and aggregation or spine switches connecting to the network core. Although Ethernet switches are far more widespread, some organizations also use Fibre Channel or InfiniBand for storage area networks or HPC interconnects, each of which has its own type of switch.

Use Cases
Security and Compliance: Switch data, often captured as NetFlow records, is a critical data source for flagging advanced persistent threats, analyzing traffic flows for unusual activity and identifying potential data exfiltration. As a wire-level data source, switch statistics are almost impossible to spoof and thus a crucial source of security data. This data can also be used to correlate users or IP addresses to a physical network location.

IT Ops: Operations teams use switch logs to see the state of traffic flow, such as source and destination, class of service and causes of congestion. Logs also can show traffic statistics in the aggregate, by port and by client, and whether particular ports are congested, failing or down.



Proxies
Use Cases: Security and Compliance, IT Operations

Examples: Blue Coat, Fortinet, Juniper IDP, Netscreen Firewall, Palo Alto Networks, Palo Alto Networks config, Palo Alto Networks system, Palo Alto Networks threat, Palo Alto Networks traffic, nginx

Network proxies are used in several ways in IT infrastructure: as web application accelerators and intelligent traffic direction, application-level firewalls, and content filters. By acting as a transparent ‘bump-in-the-wire’ intermediary, proxies see the entire Layer 7 network protocol stack, which allows them to implement application-specific traffic management and security policies.

Use Cases
Security and Compliance: Security teams are interested in proxies as application-layer firewalls. Here, proxy records can identify details about specific content traversing network control points including file names, types, source and destination, and metadata about the requesting client such as OS signature, application and username/ID (depending on the proxy implementation). The data can also be used to help detect command and control traffic, malicious domain traffic and unknown domain traffic.

Web proxies and some next generation firewalls may act in a transparent or explicit mode communicating with HTTP(S) servers on behalf of a client. Using a number of related technologies, the request and response can be inspected and permitted, or blocked, based on user role, site or resource category or attack indicator. Data logged in the events can potentially be used in detective correlation.

IT Ops: Operations teams often use proxies embedded in an application delivery controller (ADC), a more advanced, Layer 7-aware version of a load balancer. In this context, proxy logs can provide information about incoming requests and traffic distribution among available resources.

VoIP
Use Cases: Security and Compliance, IT Operations

Examples: Asterisk CDR, Asterisk event, Asterisk messages

Voice over IP protocol refers to several methods for transmitting real-time audio and video information over an IP-based data network. Unlike traditional phone systems using dedicated, point-to-point circuits, VoIP applications use packet-based networks to carry real-time audio streams that are interspersed with other Ethernet data traffic. Since packets may be lost or delivered out of order, VoIP includes features to buffer and reassemble a stream. Similarly, VoIP packets are usually tagged with quality of service (QoS) headers to prioritize their delivery through the network.

Use Cases
Security and Compliance: VoIP deployments may expose organizations to potential security threats, and analyzing VoIP logs can help identify and prevent these exploits.

IT Ops: VoIP logs provide troubleshooting and usage data similar to that of other network applications. Details include source, destination, time and duration of calls, call quality metrics (e.g., packet loss, latency, audio fidelity/bit rate) and any error conditions. Integrating VoIP source/destination records with an employee database such as AD or LDAP and a DHCP database allows linking call records to actual people and IP addresses to physical locations, information that can assist in troubleshooting and billing.



SNMP
Use Cases: Security and Compliance, IT Operations

Examples: LogicMonitor, ManageEngine, Spiceworks, Ruckus Idera, Ipswitch

The simple network management protocol (SNMP) is one of the oldest, most flexible and broadly adopted IP protocols used for managing or monitoring networking devices, servers and virtual appliances. This includes network devices such as routers and switches, as well as non-networking equipment such as server hardware or disk arrays.

SNMP supports two different methods of obtaining data.

• SNMP Traps are essentially alerts, set to send an alert on a state change, critical threshold, hardware failure, and more. Traps are initiated by the SNMP device, and the trap is sent to an SNMP collector.
• SNMP Polling is an interactive query/response approach. Unlike traps, polling is initiated by the SNMP collector in the form of a request for certain, or all, SNMP data available on the SNMP device.

Although many devices now provide vendor-specific APIs for remote management and data collection, SNMP is still valuable in troubleshooting due to its ubiquity (nearly every device supports it) and inherently centralized design (a single instance of SNMP management software can collect data from every device on an internal network, even across route domains).

Use Cases
Security and Compliance: SNMP traps and alerts from network devices can help security teams identify abnormal activity over the network. SNMP Polling helps a security analyst to see the data transmission rates for a network-connected device that is suspected of malicious activity.

The data can also help identify abnormal amounts of traffic to a certain site or domain, an abnormal amount of specific SNMP traps from a certain host, and an abnormal number of unique SNMP traps from hosts compared to normal profiles.

IT Ops: SNMP data can provide current information about performance, configuration and current state. This allows the monitoring of the “normal” state of the environment, which is vital when using a service-level approach to monitoring the health of any environment. This could include current speed of all of the ports on a switch, the number of bytes sent (per port or in aggregate) through a router, the CPU temperature of a server, and any other information made available by the vendor per the SNMP MIBs for that device.

Many environments rely on SNMP traps for alerting when a critical state is reached (e.g., CPU temperature is critical) or when a failure occurs (e.g., RAID disk failure). SNMP traps are not only sent by devices to monitoring systems; in some environments SNMP traps are the de facto method for multiple monitoring and alerting systems to aggregate errors to a single console.
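
One way to picture "an abnormal amount of SNMP traps from a certain host compared to normal profiles" is to count already-collected traps per host and compare against a per-host baseline. The sketch below assumes traps have been received and reduced to simple records; the host names, baseline values and the 3x factor are hypothetical, and no SNMP library is involved.

```python
from collections import Counter

# Hypothetical trap records as a collector might store them after receipt.
TRAPS = (
    [{"host": "switch-1", "trap": "linkDown"}] * 2
    + [{"host": "router-1", "trap": "authenticationFailure"}] * 10
)

# Assumed "normal" number of traps per host for the same time window.
BASELINE = {"switch-1": 2, "router-1": 2}


def trap_counts(traps):
    """Count received traps per sending host."""
    return Counter(t["host"] for t in traps)


def abnormal_hosts(counts, baseline, factor=3.0):
    """Hosts whose trap count exceeds factor x their baseline (default baseline 1)."""
    return sorted(
        host for host, n in counts.items() if n > factor * baseline.get(host, 1)
    )


counts = trap_counts(TRAPS)
flagged = abnormal_hosts(counts, BASELINE)
print(flagged)
```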



Operating System Data

System Logs
Use Cases: Security and Compliance, IT Operations, Application Delivery

Examples: Unix, Windows, Mac OS, Linux

Every OS records details of its operating conditions and errors, and these time-stamped logs are the fundamental and authoritative source of system telemetry. Depending on the OS, there may be separate logs for different classes of events, such as routine informational updates, system errors, boot loader records, login attempts and debug output. Error logs often aggregate records from multiple subsystems and OS services or daemons, and, thus, are a definitive source of troubleshooting information.

Use Cases
Security and Compliance: System logs include a variety of security information such as attempted logins, file access and system firewall activity. These entries can alert security teams to network attacks, a security breach or compromised software. They also are an invaluable source of information in forensic analysis of a security incident. For example, the data can be used to identify changes in system configurations and commands executed by users or privileged users.

IT Ops and Application Delivery: System logs often are the first place operations teams turn when troubleshooting system problems, whether with the OS, hardware or various I/O interfaces. Since a particular problem often manifests itself with errors in multiple subsystems, correlating log entries is one of the best ways of identifying the root cause of a subtle system failure.
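
A simple form of the log correlation described above is to group entries from different subsystems that occur close together in time. The following sketch uses made-up entries and a five-second window; the tuple layout (timestamp, subsystem, message) is an assumption, not a real log format.

```python
from datetime import datetime, timedelta

# Hypothetical error entries: (ISO timestamp, subsystem, message).
ENTRIES = [
    ("2024-01-01T10:00:00", "disk", "I/O error on sda"),
    ("2024-01-01T10:00:02", "fs", "journal commit failed"),
    ("2024-01-01T10:00:03", "db", "write failed"),
    ("2024-01-01T11:30:00", "net", "link flap on eth0"),
]


def correlate(entries, window=timedelta(seconds=5)):
    """Group time-sorted entries; start a new group when the gap to the
    previous entry exceeds the window."""
    parsed = sorted((datetime.fromisoformat(ts), sub, msg) for ts, sub, msg in entries)
    groups = []
    for item in parsed:
        if groups and item[0] - groups[-1][-1][0] <= window:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups


groups = correlate(ENTRIES)
# Keep only groups that span more than one subsystem.
multi = [
    [sub for _, sub, _ in group]
    for group in groups
    if len({sub for _, sub, _ in group}) > 1
]
print(multi)
```

In the first group, the earliest entry (the disk error) is the natural starting point for a root cause investigation.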



System Performance
Use Cases: Security and Compliance, IT Operations, Application Delivery

Examples: PERFMON, Windows Event Logs, sar, vmstat, iostat, statsd

Measures of system activity such as CPU load, memory and disk usage, and I/O traffic are the IT equivalent of EKGs to a doctor: the vital signs that show system health. Recording these measures provides a record of system activity over time that shows normal, baseline levels and unusual events. By registering myriad system parameters, performance logs also can highlight mismatches between system capacity and application requirements, such as a database using all available system memory and frequently swapping to disk.

Use Cases
Security and Compliance: While primarily used for keeping infrastructure up and running, monitoring system performance can also be used to uncover potential security incidents by detecting abnormal activity in performance. One example is abnormal system resource usage in correlation with a security indicator.

IT Ops and Application Delivery: Performance logs provide a real-time indication of system health by showing resource usage that, when compared with historical norms, flags performance problems. When measurements deviate from standard or typical parameters, it’s a warning for IT admins to do further investigation.
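
Comparing a current measurement with its historical baseline can be as simple as a z-score check. This is one common approach, shown here as a sketch; the CPU samples and the threshold of three standard deviations are illustrative assumptions.

```python
from statistics import mean, stdev


def deviates(history, current, z_threshold=3.0):
    """Return True when `current` is more than z_threshold sample standard
    deviations away from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > z_threshold


# Hypothetical CPU utilization samples (percent) under normal load.
cpu_history = [40, 42, 38, 41, 39, 40, 43, 37]

spike = deviates(cpu_history, 95)
normal = deviates(cpu_history, 42)
print(spike, normal)
```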



Virtual Infrastructure Data

AWS Services
Use Cases: Security and Compliance, IT Operations

Examples: CloudTrail, CloudWatch, Config, S3

AWS is the largest and most widely used public cloud infrastructure, providing on-demand compute, storage, database, big data and application services with consumption-based pricing. AWS can be used to replace traditional enterprise virtual server infrastructure in which software runs on individual virtual machines (VM) or to host cloud-native applications built from a collection of AWS services. AWS includes a host of service management, automation, security, network and monitoring services used to deploy, scale, decommission, audit and administer one’s AWS environment, subscriptions and hosted applications.

Use Cases
Security and Compliance: Security data from AWS services includes login and logout events and attempts, API calls and logs from network and web application firewalls.

IT Ops: AWS services provide similar types of system and service data as traditional IT infrastructure, much of which is consolidated by the CloudWatch service. These include service monitoring, alarms and dashboards for metrics, logs and events generated by other AWS resources and applications. Typical events and measures include when instances are instantiated and decommissioned, CPU usage, network traffic and storage consumption.



Google Cloud Platform (GCP)
Use Cases: Security and Compliance, IT Operations

Examples: Stackdriver

GCP is a popular and widely used public cloud infrastructure, providing on-demand compute, storage, database, big data and application services with consumption-based pricing. GCP can be used to replace traditional enterprise virtual server infrastructure in which software runs on individual VMs, or to host cloud-native applications built from a collection of GCP services. GCP includes a host of service management, automation, security, network and monitoring services used to deploy, scale, decommission, audit and administer one’s GCP environment, subscriptions and hosted applications.

Use Cases
Security and Compliance: Security data from GCP services includes login and logout events and attempts, API calls and logs from network and web application firewalls.

IT Ops: GCP services provide similar types of system and service data as traditional IT infrastructure, much of which is consolidated by Stackdriver. These include service monitoring, alarms and dashboards for metrics, logs and events generated by other GCP resources and applications. Typical events and measures include when instances are instantiated and decommissioned, CPU usage, network traffic and storage consumption.

Microsoft Azure
Use Cases: Security and Compliance, IT Operations

Examples: WADLogs, WADEventLogs, WADPerformanceCounter, WADDiagnostInfrastructure

Azure is a popular and widely used public cloud infrastructure, providing on-demand compute, storage, database, big data and application services with consumption-based pricing. Azure can be used to replace traditional enterprise virtual server infrastructure in which software runs on individual VMs, or to host cloud-native applications built from a collection of Azure services. Azure includes a host of service management, automation, security, network and monitoring services used to deploy, scale, decommission, audit and administer one’s Azure environment, subscriptions and hosted applications.

Use Cases
Security and Compliance: Security teams can use Azure service logs to audit and attest to compliance with established policies. Log data also is invaluable for incident forensic analysis, such as identifying unauthorized access attempts from access logs, tracking resources and configuration change events and identifying vulnerabilities in hosts or firewalls.

IT Ops: Azure services provide detailed metrics and logs for monitoring one’s infrastructure across the entire technology stack: VMs, containers, storage and application services. The data is useful in maintaining application delivery quality and service levels, measuring user behavior and resource utilization, and for capacity planning and cost management.



Pivotal Cloud Foundry (PCF)
Use Cases: IT Operations and DevOps

Examples: Loggregator, PCF Healthwatch

Pivotal Cloud Foundry is a platform-as-a-service (PaaS) built on top of Cloud Foundry, an open source cloud computing platform that allows developers to easily deploy, operate and scale cloud-native applications. Enterprises can manage the entire application lifecycle, from packaging to deployment to execution, as Cloud Foundry supports many cloud frameworks and application languages. With PCF, the installation and administration of cloud-native applications is simplified with capabilities around infrastructure management and provisioning, OS patching, container orchestration, security and more.

Use Cases
IT Ops and DevOps: Operations teams can use PCF metrics, many of which are consolidated via the Loggregator Firehose, to gain insights into deployment health, capacity needs and application health before end users are impacted by degraded performance. Since PCF allows DevOps to run their applications on any cloud rapidly and to scale on demand, PCF data is critical for teams to get end-to-end visibility into the entire lifecycle and visibility between each individual component. When it comes to operating PCF deployments at scale, understanding performance relies on understanding the dependencies among the various layers within the app, container and larger architecture.



VMware Server Logs, Configuration Data and Performance Metrics
Use Cases: Security and Compliance, IT Operations

Examples: vCenter, ESXi

VMware vSphere ESXi is the most commonly used enterprise server virtualization platform. The VMware management platform, whether one of the vSphere products or standalone hypervisor, produces a variety of data that falls into four main categories:

• vCenter Logs: vCenter is the “control center” of a vSphere environment. The vCenter logs show information including: who is logging in to make changes, which individuals made changes and authentication failures.
• ESXi Logs: Every vSphere environment includes one or more ESXi hypervisors; these are the systems that host the virtual machines. ESXi logs contain information that is useful when troubleshooting hardware and configuration issues.
• Inventory Information: the vCenter environment tracks configuration about a number of configuration items including: hypervisors, virtual machines, datastores, clusters and more. This includes the configuration of each item, and how a given item relates to any other. This information is not represented in the log files from either the vCenter or ESXi servers. This information can be viewed using the vSphere client or by using vSphere APIs to pull this information. In both cases this information is pulled from the vCenter servers.
• Performance Information: for each configuration item, the vCenter server tracks a number of performance metrics about that item. Datastore latency, virtual or physical CPU utilization, and over 100 other metrics fall into this category. As with the inventory information, this information is not present in the log files and must be viewed through the vSphere client or polled through the vSphere API.

Use Cases
Security and Compliance: The uncoupled nature of virtual resources and underlying physical hardware can cause complex challenges during incident investigations, capacity analyses, change tracking and security reporting. One common security use case for VMware data comes from the vCenter logs, which audit the activity of individuals using the vSphere interface to re-assign user permissions within the VMware environment.

IT Ops: Operations teams can use VMware data to measure the health of the overall hypervisor environment and underlying guest operating systems. Admins can use this data for capacity planning, and for troubleshooting of ongoing performance issues, such as datastore latency issues.

This data also records hardware resource usage that can be used to optimize VM deployments across a server pool to maximize resource utilization without having workloads overwhelm any given server.



Physical Infrastructure Data

Backup
Use Case: IT Operations

Despite the use of data replication to mirror systems, databases and file stores, data backup remains an essential IT function by providing for long-term, archival storage of valuable information, much of which has legal and regulatory requirements regarding its preservation. Backups also can be used to store multiple versions of system images and data, allowing organizations to reverse changes, accidental deletions or corrupted data quickly, restoring the system or database to a known good state. Backup software can use different types of storage media depending on the likelihood of needing the data: external disks or virtual tape libraries for active data and tape, optical disks or a cloud service for long-term storage.

Use Cases
IT Ops: Backup systems routinely log activity and system conditions, recording information such as job history, error conditions, backup target and a detailed manifest of copied files or volumes. This data allows operations teams to monitor the health of backup systems, software and jobs; triggers alerts in the case of errors; and assists in debugging backup failures. It also allows teams to locate where specific data may be stored, when a recovery is required.



Environmental Sensors
Use Cases: Internet of Things, Business Analytics

Examples: Bosch Sensortec, Mouser Electronics, Raritan, Schneider Electric, TSI, Vaisala

Environmental sensors provide data on barometric air pressure, humidity, ambient air temperature and air quality. They are applied in everything from combating pollution and detecting gases to keeping data centers from overheating.

Use Cases
Internet of Things: Environmental sensors are a class of smart meters that have been optimized to monitor the environment. In some instances, such as a data center, the information provided by these sensors is used to automatically alter temperature setting and heat flow.

Business Analytics: Environmental sensor data can be used in retail applications capable of answering predictive questions, such as “what impact might inclement weather have on foot traffic in a mall?”

Industrial Control Systems (ICS)
Use Cases: Security and Compliance, Internet of Things, Business Analytics

Examples: ABB, Emerson Electric, GE, Hitachi, Honeywell, Rockwell Automation, Siemens, Toshiba

Within the context of a manufacturing environment, industrial control systems make use of programmable logic controllers to both acquire data and execute supervisory functions. Much of the process automation employed in a manufacturing facility is enabled by the industrial control systems.

Use Cases
Security and Compliance: Industrial control systems play a critical role in delivering services to industry and municipalities across the world. These systems live on top of traditional IT infrastructure and, while they are typically separate from enterprise IT, digital transformation is driving organizations to provide connectivity to these systems, increasing exposure to attacks. These systems tend to be unmanned from a security perspective. Regardless of how ICS might get attacked or infected, data from ICS devices can provide visibility and can be used to analyze and identify malicious activity and potential threats. This visibility enables companies to measure impact and risk, and associate them with business processes.

Internet of Things: Machine data from ICS can be used to gain real-time visibility into the uptime and availability of critical assets. This enables companies to detect an issue, perform root cause analysis and take action to prevent certain events from happening in the future. Companies are also leveraging machine data from ICS to secure these mission-critical assets.

Business Analytics: Organizations can apply machine learning algorithms against the machine data created by industrial control systems to increase productivity, uptime and availability. ICS data can also drive visibility into complex manufacturing processes, helping identify bottlenecks and remove inefficiencies.
Mainframe
Use Cases: IT Operations

Mainframes are the original business computer: large, centralized systems housing multiple processors, system memory (RAM) and I/O controllers. Despite their 60-year legacy, mainframes are still widely used for mission-critical applications, particularly transaction processing. Although they usually run a proprietary OS, mainframes also can be virtualized to run Unix and Linux or, with add-on processor cards, Windows Server. Mainframes are valued for their bulletproof reliability and security, using highly redundant hardware and resilient, stringently tested software. As such, they appeal to organizations that want to consolidate workloads onto a small number of systems and that need the added reliability and versatility.

Use Cases
IT Ops: Like other servers, mainframes measure and log numerous system parameters that show their current status, configuration and overall health. Since most mainframe subsystems are redundant, system logs also show non-disruptive hardware failures or anomalous behavior that is predictive of an impending failure. Due to their use for critical applications, mainframes often record application performance data such as memory usage, I/O and transaction throughput, processor utilization and network activity.

Medical Devices
Use Cases: Internet of Things, Business Analytics
Examples: Abbott Laboratories, Apple, Baxter, Boston Scientific, GE, Siemens, St. Jude Medical

Everything from intensive care units to wearable devices generates multiple types of machine data. In fact, just about every aspect of patient care inside and outside of a hospital setting can be instrumented. While the primary goal is to save lives, a crucial secondary goal is to reduce healthcare costs by reducing both the number of potential visits to a hospital and the length of stay.

Use Cases
Internet of Things: Most devices inside a hospital are connected to local monitoring applications. But it’s possible to monitor patient care remotely using sensors that communicate with either a wearable device or some other system for monitoring patients in their homes.

Business Analytics: Machine data also makes it simpler for medical professionals to analyze both patient and anonymous data across a broader range of geographically distributed regions, for example, to see how certain diseases are affecting one group of people more than another.



Metric Line Protocols
Use Cases: IT Operations, Application Delivery, Internet of Things
Examples: collectd, statsd

Metrics are measurements generated by a process running on a system that provide a regular data point around a given metric, such as CPU utilization. Metrics data sources generate measurements at regular intervals and generally consist of:

• Timestamp
• Metric Name
• Measurement (a data point)
• Dimensions (that often describe the host, kind of instance, or other attributes that you might want to filter or sort metrics on)

Metrics are typically generated by a daemon (or process) that runs on a server (OS), container or application. Each data measurement is delivered by a network protocol, such as UDP or HTTP, to a server that indexes and analyzes that information.

Metrics are particularly useful for monitoring. Like a heart monitor that regularly checks a patient’s pulse, metrics provide insight into trends or problems that affect the performance and availability of infrastructure and applications. However, a heart monitor won’t tell you why a patient has a sudden issue with their heart rate; you need other means to quickly identify the cause and stabilize the patient. It’s the same with machine data. When metrics are combined with other data sources, usually logs, you gain insight into both what’s going on and why it’s happening.

Examples of Metric Line Protocols
collectd: collectd uses an agent running on a server that is configured to measure specific attributes and transmit that information to a defined destination. collectd is an extensible measurement engine, so you can collect a wide range of data. Currently, collectd is most often used for core infrastructure monitoring, such as getting insight into the workload, memory usage, I/O and storage of servers and other infrastructure components. collectd is part of the open source community, and you can learn much more about it by visiting http://collectd.org.

statsd: statsd is a network daemon that runs on Node.js. It has gained popularity with Windows administrators, application performance experts and others. statsd provides some capabilities that allow metrics to be delivered in batches, and while it uses the less dependable UDP network method, many administrators like how easy it is to deploy. Much like collectd, statsd is focused on collecting metrics, mostly involving the usage and performance of applications and application components, and sending them via the network to a tool that can collect and analyze that information.

Use Cases
IT Ops and Application Delivery: Metric line protocols provide usage, performance and availability data across operating systems, storage devices, applications and other components of IT infrastructure. Metrics are particularly useful for the monitoring portion of IT Operations and Application Delivery, where trends can help identify where there is a problem. Once trends and thresholds illustrate performance issues, other data sources are often correlated to determine the root cause of the problem.

Internet of Things: As devices become more intelligent, more metrics-based telemetry will be on board. Metric line protocols represent an efficient way for these devices to report their status and performance.
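
As a rough illustration of the metric structure described above (timestamp, metric name, measurement and dimensions), the following Python sketch models a single data point and renders it in the plain-text `name:value|type` form that statsd accepts over UDP. The `MetricPoint` class, the metric name, and the host and port values are assumptions made for this example. The basic statsd line carries no timestamp or dimensions, so in this sketch those fields stay on the sending side.

```python
import socket
import time
from dataclasses import dataclass, field


@dataclass
class MetricPoint:
    """One measurement with the four parts listed above (names are illustrative)."""
    name: str                                            # metric name, e.g. "cpu.utilization"
    value: float                                         # the measurement (a data point)
    timestamp: float = field(default_factory=time.time)  # seconds since the epoch
    dimensions: dict = field(default_factory=dict)       # e.g. {"host": "web-01"}


def to_statsd_line(point: MetricPoint, metric_type: str = "g") -> str:
    """Render a point as a statsd-style line; "g" marks a gauge."""
    return f"{point.name}:{point.value}|{metric_type}"


def send_udp(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget delivery over UDP; the sender gets no acknowledgment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical data point; call send_udp(line) to transmit it to a listener.
point = MetricPoint("cpu.utilization", 42.5, dimensions={"host": "web-01"})
line = to_statsd_line(point)
```

Because UDP offers no delivery guarantee, a dropped packet simply means a missing data point, which is the trade-off for the ease of deployment noted above.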



Patch Logs
Use Cases: Security and Compliance, IT Operations

Keeping operating systems and applications updated with the latest bug fixes and security patches is an essential task that can prevent unplanned downtime, random application crashes and security breaches. Although commercial apps and operating systems often have embedded patching software, some organizations use independent patch management software to consolidate patch management, ensure the consistent application of patches across their software fleet and build patch jobs for custom, internal applications.

Patch management software keeps a patch inventory using a database of available updates and can match these against an organization’s installed software. Other features include patch scheduling, post-install testing and validation, and documentation of required system configurations and patching procedures.

Use Cases
Security and Compliance: Security teams can use patch logs to monitor system updates and determine which assets could be at risk due to failed or out-of-date patches.

IT Ops: Operations teams use patch logs to verify the timely and correct application of scheduled patches, identify unpatched systems and applications, and alert to errors in the patching process. Correlating errors to patch logs can indicate when an error is due to a patch.

Physical Card Readers
Use Case: Security and Compliance

Most organizations use automated systems to secure physical access to facilities. Historically, these have been simple magnetic strips affixed to employee badges; however, locations with stringent security requirements may use some form of biometric reader or digital key. Regardless of the technology, the systems compare an individual’s identity with a database and activate doors when the user is authorized to enter a particular location. As digital systems, badge readers record information such as user ID, date and time of entry and perhaps a photo for each access attempt.

Use Cases
Security and Compliance: For IT security teams, the data from card readers provides the same sort of access information for physical locations as a network firewall log. The data can be used to detect attempted breaches and be correlated to system and network logs to identify potential insider threats and provide overall situational awareness. It can also be used to detect access at unusual times and locations or for unusual durations.
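
To make the idea of detecting card-reader access at unusual times concrete, here is a minimal Python sketch that flags badge events outside a configured window. The event fields (user ID, door, timestamp), the sample events and the 06:00 to 20:00 window are assumptions for illustration, not a real badge-system schema.

```python
from datetime import datetime, time

# Hypothetical badge-reader events: (user_id, door, ISO-8601 timestamp)
EVENTS = [
    ("u1001", "lobby", "2022-03-14T08:55:00"),
    ("u1002", "server-room", "2022-03-14T02:17:00"),
    ("u1001", "server-room", "2022-03-14T19:40:00"),
]


def outside_hours(events, start=time(6, 0), end=time(20, 0)):
    """Return the events whose time of day falls outside [start, end)."""
    flagged = []
    for user_id, door, stamp in events:
        when = datetime.fromisoformat(stamp)
        if not (start <= when.time() < end):
            flagged.append((user_id, door, stamp))
    return flagged


unusual = outside_hours(EVENTS)
```

In practice the flagged events would be correlated with system and network logs, as described above, rather than treated as alerts on their own.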



Point-of-Sale Systems (POS)
Use Cases: Security and Compliance, Internet of Things, Business Analytics
Examples: IBM, LightSpeed, NCR, Revel Systems, Square, Toshiba, Vend

Point-of-sale systems are most often associated with transactions generated at a retail outlet. However, thanks to the rise of mobile POS solutions, many of these systems are starting to be deployed in temporary locations, such as a community fair or a high school event.

The typical POS system incorporates a cash register based on a PC or embedded system, monitor, receipt printer, display, barcode scanner, and debit/credit card reader. Machine data generated by POS systems provides organizations with real-time insight into everything from what’s sold, to the amount of cash being generated per transaction, to what payment methods are being used.

Use Cases
Security and Compliance: POS systems are typically used for financial transactions and are often targeted since they contain account, payment and financial information. Because POS transaction information is highly sought after for its value to attackers, and the POS can be used as an entry point to the network, it’s critical to protect these systems. Furthermore, POS systems are usually unmanned and run an underlying operating system, and their versioning and monitoring typically fall outside of IT’s purview, which adds complexity to their security. Visibility and analysis of POS systems and data can provide insights that are critical to protecting financial information, detecting fraud and securing vulnerabilities.

Internet of Things: Historically, POS systems were either not connected or managed on a dedicated private network. Thanks to the rise of the IoT, these systems are being connected directly to cloud platforms that make remotely administering these devices from a central location much simpler. There’s no longer a need to dispatch IT personnel to manually update each system. This is critical because a POS failure can result in longer lines that inconvenience customers and potentially lead to lost revenue. A negative customer experience can easily translate to customers opting to shop somewhere else in a retail industry that is intensely competitive.

Business Analytics: POS systems contain information about what’s sold, how it’s paid for, and the pace at which it’s being sold. Organizations can use this data to monitor revenue in real time, which can feed into how to better market one-to-one to customers, track product placement and sales in a store, or detect potentially fraudulent transactions in real time. This type of real-time Big Data analysis can have a profound impact on customer cross-sell and up-sell opportunities. POS data also delivers visibility into customer experience, such as which coupons are most popular or which combinations of products are selling together. When enriched with geolocation data, it can also drive valuable insights into location-based analytics.



RFID/NFC/BLE
Use Cases: Internet of Things, Business Analytics
Examples: Alien Technology, BluVision, CheckPoint Systems, Gimbal, MonsoonRF, Radius Networks, STMicroelectronics, TAGSYS RFID, ThingMagic

The two primary methods organizations use today to keep track of objects and interact with customers in retail stores involve two distinct types of wireless communications technologies. The better known is radio-frequency identification (RFID), which involves the use of tags capable of storing information such as product details or what goods might be loaded in a shipping container.

At the same time, organizations are adopting Bluetooth Low Energy (BLE) wireless connectivity solutions that can broadcast signals to other devices. BLE is used most widely in beacons that are employed, for example, to inform shoppers of new sales in retail stores on their smartphones or update fans on events that might be occurring during a sporting event.

Use Cases
Internet of Things: RFID is arguably one of the first instances of an IoT application. Deployed in place of traditional barcodes, RFID tags are used in everything from shipping to keeping track of farm animals. IoT deployments make it possible to capture RFID data in a way that makes it simpler to track events involving anything that has an attached RFID tag. Data insights from RFID can help improve overall supply chain, order processing and inventory management.

BLE, meanwhile, is used to engage customers more directly as they move about a specific location, which in turn creates data that can be used to optimize the customer experience.

Business Analytics: Whether it’s inventory tracked using RFID tags or customers and employees moving around specific locations, new classes of analytics applications are using the data generated by these devices to serve up actionable business insights in near real time. Retailers can leverage this data for several use cases, such as making sure that inventory is located as close as possible to the locations where customers are most likely to want to purchase it.



Sensor Data
Use Cases: Security and Compliance, IT Operations, Internet of Things, Business Analytics
Examples: Binary and numeric values including switch state, temperature, pressure, frequency, flow, from MQTT, AMQP and CoAP brokers, HTTP event collector

Industrial equipment, sensors and other devices often have embedded processors and networking that allow them to record and transmit a vast array of information about operating conditions. Regardless of device, their data provides unprecedented detail about performance parameters and anomalies that can indicate larger problems, for example, a device ready to fail or issues with another system. Aggregating and correlating data from multiple devices and subsystems provides a complete picture of equipment, system, factory or building performance.

Use Cases
Security and Compliance: Sensor data can help protect mission-critical assets and industrial systems against cybersecurity threats by providing visibility into system performance or set points that could put machines or people at risk. Data can also be used to satisfy compliance reporting requirements.

IT Ops: Some of the most important parameters for operations teams to monitor are environmental conditions such as temperature, humidity, airflow and voltage regulation in a data center. Similar readings are available from individual servers and network equipment that, when correlated, can highlight problems in the facility or equipment ready to fail.

Additional Use Cases
Preventative Maintenance and Asset Lifecycle Management: Sensor data can provide insights into asset deployment, utilization and resource consumption. Operational data can also be used to proactively approach long-term asset management, maintenance and performance.

Monitoring and Diagnostics: Monitoring sensors can help ensure that equipment in the field operates as intended, for example, by monitoring and tracking unplanned device or system downtime. The data can also be used to understand the cause of failure on a device to improve efficiency and availability, and to identify outliers and issues in device production or deployment.



Server Logs
Use Cases: Security and Compliance, IT Operations, Application Delivery

Server operating systems routinely record a variety of operational, security, error and debugging data such as system libraries loaded during boot, application processes open, network connections, file systems mounted and system memory usage. The level of detail is configurable by the system administrator; however, there are sufficient options to provide a complete picture of system activity throughout its lifetime. Depending on the subsystem, server logs are useful to system, network, storage and security teams.

Use Cases
Security and Compliance: Server logs include data from security subsystems such as the local firewall, login attempts and file access errors that security teams can use to identify breach attempts, track successful system penetrations and plug vulnerabilities. Monitoring server logs such as file access, authentication and application usage can help secure infrastructure components.

IT Ops and Application Delivery: Server logs provide a detailed record of overall system health, and forensic information about the exact time of errors and anomalous conditions that is invaluable in finding the root cause of system problems.

Smart Meters
Use Cases: Internet of Things, Business Analytics
Examples: ABB, GE, Google, eMeter, IBM, Itron, Schneider Electric, Siemens

Smart meters record consumption of energy, usage of water, or usage of natural gas so that the information can be continually processed and shared. Typically, smart meters allow for bi-directional communication in real time in a way that allows a gauge of some type to be adjusted.

Use Cases
Internet of Things: Smart meters are deployed across critical systems at large utility companies, for example, power, gas and water utilities. These systems are the lifeblood of infrastructure and failure can lead to catastrophic outcomes. Real-time monitoring of smart meters can help organizations better analyze failures remotely, for example by detecting line-down failures without a site visit. Equally important is securing the devices from tampering that could lead to malicious attacks and breaches.

Energy companies and water utilities make extensive use of smart sensors to track everything from oil reserves to the quality of the water supply.

Business Analytics: A wide variety of industries are applying analytics to the data being collected by smart meters to optimize service. For example, an oil or gas company no longer needs to physically send a worker to a location to read a meter. The provider already knows how much fuel has been consumed and how much remains.

Smart meters in the future will be used in everything from modern traffic control systems to defense systems designed to protect critical infrastructure. Aggregating data from these smart meters can give utilities critical insights into demand. Heavily regulated utilities are required to meet established SLAs during demand response events, and machine data from smart meters can drive visibility into how they are responding.
Storage
Use Case: IT Operations
Examples: EMC, NetApp, IBM, Amazon EBS

Data center storage is provisioned in two general ways: built into servers and shared using various network storage protocols, or via a dedicated storage array that consolidates capacity for use by multiple applications that access it using either a dedicated storage area network (SAN) or an Ethernet LAN file-sharing protocol. The activity of internal, server-based storage is typically recorded in system logs; however, storage arrays have internal controllers/storage processors that run a storage-optimized OS and log a plethora of operating, error and usage data. Since many organizations have several such arrays, the logs often are consolidated by a storage management system that can report on the aggregate activity and capacity.

Use Cases
IT Ops: Shared storage logs record overall system health (both hardware and software), error conditions (such as a failed controller, network interface or disks) and usage (both capacity used per volume and file or volume accesses). Collectively, the information can alert operations teams to problems, the need for more capacity and performance bottlenecks.

Telephony
Use Cases: IT Operations
Examples: Cisco Unified Communications Manager, ShoreTel, Twilio

Real-time business communications are no longer limited to voice calls provided by plain old telephone service (POTS); instead, voice, video, text messaging and web conferences are IP applications delivered over existing enterprise networks. Unlike traditional client-server or web applications, telephony and other communications applications have strict requirements on network quality of service, latency and packet loss, making service quality and reliability much more sensitive to network conditions and server responsiveness. Traditional POTS has conditioned people to expect an immediate dial tone when picking up the phone and to be intolerant of noise, echo or other problems that can plague IP telephony; as such, the systems and supporting infrastructure require careful monitoring and management to assure quality and reliability.

Use Cases
IT Ops: Like VoIP, telephony logs provide an overview of system health along with troubleshooting and usage data similar to that of other network applications. Details include source, destination, time and duration of voice/video calls, web conferences and text messages, call-quality metrics (e.g., packet loss, latency, audio fidelity/bit rate), error conditions and user attendance at web conferences. By integrating telephony records of source/destination address with an employee database such as AD or LDAP and a DHCP database, organizations can link call records to actual user IDs and IP addresses to physical locations, information that can assist in troubleshooting and billing. Logs also can reveal any network segments experiencing congestion or other performance problems that may indicate equipment problems or the need for an upgrade.
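
The telephony correlation described above can be sketched as a simple lookup join. In this hypothetical Python example, a call record's extension is resolved to a user ID through a directory extract, and its source IP is resolved to a location through a table of address prefixes standing in for DHCP data. All field names, sample values and the 1% packet-loss threshold are invented for illustration and do not reflect any vendor's log format.

```python
# Hypothetical extracts; real AD/LDAP and DHCP exports will differ.
DIRECTORY = {"4021": "asmith", "4077": "bjones"}  # extension -> user ID
DHCP_SCOPES = {                                   # IP prefix -> physical location
    "10.1.20.": "HQ, floor 2",
    "10.2.30.": "Branch office",
}

CALLS = [
    {"src_ext": "4021", "src_ip": "10.1.20.15", "duration_s": 312, "packet_loss_pct": 0.2},
    {"src_ext": "4077", "src_ip": "10.2.30.8", "duration_s": 45, "packet_loss_pct": 4.8},
]


def enrich(call):
    """Attach a user ID and a location to one call record."""
    location = next(
        (site for prefix, site in DHCP_SCOPES.items()
         if call["src_ip"].startswith(prefix)),
        "unknown",
    )
    user = DIRECTORY.get(call["src_ext"], "unknown")
    return {**call, "user": user, "location": location}


enriched = [enrich(call) for call in CALLS]

# Locations with lossy calls hint at congested network segments.
poor_quality_sites = sorted(
    {call["location"] for call in enriched if call["packet_loss_pct"] > 1.0}
)
```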



Transportation
Use Cases: Internet of Things, Business Analytics
Examples: Boeing, BMW, Ford, GE, General Motors, Daimler-Benz, John Deere, Volkswagen

Vehicles of all sizes and types generate massive amounts of machine data every day that can be used to gain real-time visibility into the health and performance of an asset, and to drive predictive maintenance applications. Armed with that data, an airplane or automobile manufacturer can follow a maintenance regime that is data driven rather than driven “by the book.”

That information can then be used to improve availability and reliability, and extend the life cycle of a vehicle that has not been extensively used or, conversely, replace components that have seen extensive wear and tear sooner.

Use Cases
Internet of Things: Vehicle manufacturers are attaching sensors to every mechanical and electronic component they use. This allows companies to gain a unified view of assets to quickly identify and diagnose operational issues, and to monitor, track and avoid unplanned asset downtime. This helps to ensure that equipment is operating as intended. They can also detect anomalies and deviations from normal behavior to take corrective action, improving uptime, asset reliability and longevity.

Business Analytics: With access to machine data, vehicle manufacturers are applying analytics in ways that fundamentally change their business models. Instead of selling a vehicle, manufacturers increasingly prefer to lease vehicles based on actual usage. The longer that vehicle can be used between repairs, the more profitable that leasing service becomes. The key to providing this type of service economically is advanced analytics, which are applied to all the aggregate data that’s collected.

Wearables
Use Cases: Internet of Things, Business Analytics
Examples: ARM, Intel, Lenovo, Microsoft, Samsung

From smartwatches that double as fitness aids to medical devices that enable physicians to remotely monitor vital statistics, wearable devices have proven they are here to stay. Wearable devices are one of the most recognizable parts of the Internet of Things.

Use Cases
Internet of Things: Beyond merely syncing with smartphones, the latest generation of smartwatches is taking advantage of geo-positioning systems and application programming interfaces to give device owners an optimal application experience that takes into account their location and, often, the time of day.

Going forward, there soon will be whole new classes of wearable devices taking advantage of everything from virtual reality applications delivered via a headset to sensors embedded in the latest fashion.

Business Analytics: As more people become comfortable with sharing data via wearable devices, many are experiencing the power of analytics firsthand. Developers of applications optimized for wearables are making recommendations concerning everything from how to improve life expectancy to where to find a meal. Analytics from wearables can help improve user experience and drive product innovation. For example, product managers can understand how consumers are interacting with devices to build better features.



Additional Data Sources

Database
Use Cases: Security and Compliance, IT Operations, Application Delivery
Examples: MySQL, Postgres, Other Relational Databases

Databases are fundamental to the collection, storage and analysis of digital information. Databases are categorized as either relational, in which data is organized in spreadsheet-like tables of columns and rows, or NoSQL (non-relational), where information is organized by columns (column store), as key-value pairs, as unstructured documents or as interconnected graphs linking related data elements.

Use Cases
Security and Compliance: Database logs provide security teams information about the accounts or systems accessing tables or other database elements. Correlating database access and transaction logs with identity management system records can flag unauthorized access or access attempts to databases. Database logs can also expose security holes such as open ports or dormant, unused admin accounts, and help identify abnormal queries or users, and abnormal database/table access.

IT Ops and Application Delivery: Database logs can be aggregated and analyzed to show the overall performance of a particular database system, and also provide visibility into database issues. Metrics useful to IT operations teams include queries per second and query response time, both measured against a baseline standard made from historical data.
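
One simple way to measure a database metric against a historical baseline is to compare the current value with the mean of past values plus a few standard deviations. The Python sketch below takes that approach; the sample response times, the function name and the choice of three standard deviations are assumptions for illustration, and production baselines often account for seasonality as well.

```python
from statistics import mean, stdev


def exceeds_baseline(history, current, k=3.0):
    """True if `current` is more than k standard deviations above the historical mean."""
    return current > mean(history) + k * stdev(history)


# Hypothetical query response times in milliseconds from past intervals.
response_ms_history = [12.0, 11.5, 13.2, 12.8, 11.9, 12.4]

slow = exceeds_baseline(response_ms_history, 19.0)    # well above the baseline
normal = exceeds_baseline(response_ms_history, 12.9)  # within the usual range
```

The same function could be applied to queries per second, with the comparison reversed if an unusually low rate is the condition of interest.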



Business Service Transaction and Business Service Performance Data
Use Case: Security and Compliance, IT Operations, Application Delivery
Examples: Payments Status, Batch Upload Status, Customer Order Status, Requests per Customer, SLA Tracking, Business KPI Tracking, SLO and SLI insights

Transaction records provide an auditable trail of activity for every part of every business process. Whether for financial transactions such as payments and orders, or tasks such as customer support and service calls, business process logs are required to verify activity in case of disputes, to certify compliance with regulations and terms of service, and to provide detailed evidence of business transactions. A technique called business process mining uses sophisticated software to analyze logs and identify process, control, data, organizational and social structures. These might include mapping the flow of patients through a hospital or customer problems through a support organization to optimize process flow, measure performance and identify outlier incidents for further investigation. Tracking real-time business performance metrics, like response times or even shoes sold per minute during a new product launch, allows teams to provide a consistent customer experience and a benchmark for future releases.

Use Cases
Security and Compliance: Hackers are good at covering their tracks by altering common log files, but business process logs that track activity across multiple systems used in a particular process can highlight anomalies that may indicate security issues.

IT Ops and Application Delivery: IT can use process logs to identify flaws in their support or admin processes, or problems that have fallen through gaps in existing process flows. IT can use business performance metrics for understanding system baselines and comparing performance to SLIs and SLOs to ensure customer requirements are met. These baselines are great for capacity planning and enabling faster release cycles thanks to increased predictability. Monitoring custom business metrics can also provide real-time insights, especially during high-volume service spikes, so teams can mitigate customer experience issues before the business is affected.



Human Resources
Use Cases: Security and Compliance
Examples: BambooHR, Fairsail HRMS, Namely, Zenefits

Human Resources records include information relating to the entire employee life cycle. HR records provide the definitive source of employee information for identity management systems and enterprise directories, making them an important source for authentication and authorization data. Although HR data traditionally has been textual, it increasingly includes images and biometric information such as an employee’s portrait, fingerprints and iris scans.

Use Cases
Security and Compliance: HR records can show if someone no longer employed still has active accounts, and can also provide evidence of disciplinary action that might be useful in security investigations.

Social Media Feeds
Use Cases: IT Operations

Social networks are some of the most heavily trafficked sites on the internet. By allowing users to communicate and share information among friends and colleagues, social media has become an important outlet for news, entertainment, photo-sharing and real-time reaction to public events. As such, social media feeds are an increasingly effective advertising medium and source of customer contact, feedback and support.

Use Cases
IT Ops: Due to their interactivity, convenience and ubiquity, social media feeds provide organizations with an unfiltered and instantaneous view of customer opinion. By analyzing feeds from the most popular sites, organizations can quickly identify potential problems with a product or service, mishandled customer support incidents or other sources of customer dissatisfaction about an organization’s products or online presence. Proactively addressing these online complaints allows the organization to turn unhappy and potentially lost customers into delighted and loyal ones.



Third-Party Lists
Use Case: Security and Compliance
Examples: Threat Lists, OS Blacklist, IP Blacklists, Vulnerability Lists, Google Analytics

One of the methods that IT security vendors use to detect and flag security problems is one or more databases of known threats and vulnerabilities. These include malware code signatures, OS and application patch versions, the source IP addresses of previous attacks, and spam and reputation databases using real-time aggregation of malware, spam and compromised websites collected from millions of users. Third-party lists provide an early-warning system for new methods or sources of attack.

Use Cases
Security and Compliance: By aggregating data from users around the world, third-party security lists provide security teams with real-time information about nascent threats and vulnerabilities that allows updating security policies, firewall rules and vulnerable software before an attack. Lists also are used to identify known sources of spam, both commercial and malware-infested, to improve the effectiveness of filters on internal email systems.
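
As a small illustration of how an IP-based threat list might be applied, the Python sketch below checks source addresses against a list of networks using the standard `ipaddress` module. The list entries are documentation-reserved address ranges chosen for the example, not real threat data, and the function name is hypothetical.

```python
import ipaddress

# Hypothetical threat list entries (documentation-reserved address ranges).
THREAT_NETWORKS = [
    ipaddress.ip_network(entry)
    for entry in ("203.0.113.0/24", "198.51.100.7/32")
]


def is_listed(ip: str) -> bool:
    """True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in THREAT_NETWORKS)


# Source addresses as they might appear in firewall or mail logs.
sources = ["203.0.113.45", "192.0.2.10", "198.51.100.7"]
flagged = [ip for ip in sources if is_listed(ip)]
```

A linear scan is fine for a short list; a large feed would usually be loaded into a structure built for prefix lookups.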



About Splunk.
Splunk Inc. turns data into doing. Splunk technology is designed
to investigate, monitor, analyze and act on data at any scale.
Join millions of passionate users by trying Splunk for free.


Splunk, Splunk> and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States and
other countries. All other brand names, product names or trademarks belong to their respective owners. © 2022 Splunk Inc.
All rights reserved.

22-13476-Splunk-Essential-Guide-to-Data-EB-111
