Centralized Logging Guide: by David Bitton
SUMMARY
PART 1
PART 2
COLLECTING
PARSING
QUERYING
MONITORING
SETTING UP CLOUDWATCH ALARMS
EXPORTING
PART 3
MANAGING AWS LOGS
PART 4
KEY TAKEAWAYS
The key challenge with modern visibility on clouds like AWS is that data
originates from various sources across every layer of the application
stack, varies in format, frequency, and importance, and all of it needs to be
monitored in real time by the appropriate roles in an organization. An AWS
centralized logging solution therefore becomes essential when scaling a
product and organization.
AWS has by far the most comprehensive suite of cloud services, numbering
175 services as of 2020. Every AWS service churns out its own set of
metrics, events, and logs. Additionally, there are performance metrics
produced by the applications running in AWS. AWS provides CloudWatch
for centralizing this data. Because it is a native AWS service, CloudWatch
requires hardly any setup and automatically records some default
monitoring data from many AWS services as soon as they are activated.
PART 1: AWS LOGGING BASICS
[Figure: AWS CloudWatch. Labels: Resources, Custom Event, CloudWatch Alarm, Available Statistics, SNS Notification, Auto Scaling, AWS Management Console]
[Figure: AWS CloudTrail. Labels: API Calls, Infrastructure (Microservices, Databases, Miscellaneous Resources), Data Accumulation, Analytics/Actions (SNS Notification, Auto Scaling)]
logs
metrics
events
EC2 Logs
AWS EC2 is the most popular and widely used AWS service. It offers
cloud-based compute instances to run applications on. The instances
come in Linux and Windows flavors and in various compute capacities.
To start collecting logs from EC2, you need to configure the appropriate
IAM policies and roles. Then, you’ll need to install the CloudWatch agent
using a single-line command from the AWS CLI. Once you configure
the agent, logs start streaming from the EC2 instances and are sent to
CloudWatch for analysis.
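The IAM policy mentioned above might look like the following minimal sketch. The four actions are real CloudWatch Logs IAM actions; the function name and the wildcard resource ARN are illustrative assumptions, and a production policy would normally be scoped to specific log groups.

```python
import json


def build_cloudwatch_logs_policy(resource_arn="arn:aws:logs:*:*:*"):
    """Return an IAM policy document that lets an instance role write logs.

    The default resource ARN is a deliberately broad placeholder (assumption).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                    "logs:DescribeLogStreams",
                ],
                "Resource": resource_arn,
            }
        ],
    }


# Serialize the policy so it can be attached to the EC2 instance role.
policy_json = json.dumps(build_cloudwatch_logs_policy(), indent=2)
print(policy_json)
```

The resulting JSON can be attached to the role that the EC2 instance profile uses, so that the agent running on the instance is allowed to create log groups and streams and to push events.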
[Figure: EC2 Logs. Labels: EC2 Instance, AWS S3, AWS CloudWatch, SNS Notification, Auto Scaling]
VPC Flow Logs
[Figure: VPC Flow Logs. Labels: Availability Zone 1, Availability Zone N, AWS S3, AWS CloudWatch, SNS Notification, Auto Scaling]
VPC Flow Logs contain information about the traffic passing through
your application at any given time. They list the requests that were allowed
or denied according to your ACL (access control list) rules. They also record
the IP addresses and ports for each request, the number of packets and
bytes sent, and timestamps for each request.
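To make the fields described above concrete, here is a minimal sketch that parses one flow log record, assuming the default (version 2) space-separated format. The function and constant names are illustrative, and the sample record is only an example; custom flow log formats would need a different field list.

```python
# Field order of the default (version 2) VPC Flow Logs record format.
FLOW_LOG_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

# Fields that hold numbers when data is present.
NUMERIC_FIELDS = ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end")


def parse_flow_log_record(line):
    """Split one default-format flow log line into a dict of named fields."""
    parts = line.split()
    if len(parts) != len(FLOW_LOG_FIELDS):
        raise ValueError(f"expected {len(FLOW_LOG_FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FLOW_LOG_FIELDS, parts))
    for name in NUMERIC_FIELDS:
        # Records without traffic data use "-" as a placeholder.
        record[name] = None if record[name] == "-" else int(record[name])
    return record


sample = (
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
parsed = parse_flow_log_record(sample)
print(parsed["action"], parsed["srcaddr"], parsed["dstport"], parsed["bytes"])
```

Once records are in this shape, filtering for rejected traffic or summing bytes per source address becomes a simple loop over dictionaries.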
Lambda Logs
Lambda is the serverless computing solution from AWS that lets you
run applications without having to create or maintain any underlying
instances. It excels at short-lived jobs that require compute capacity in
short bursts.
The CloudWatch dashboard provides you with vital logs and metrics such
as the number of invocations of a function, duration of an invocation,
errors, and throttles. This is great for gaining an overview of your
application’s health.
To drill deeper into the performance of your Lambda functions, you’ll need
to insert logging statements within the code of each function. Remember
to assign the appropriate execution role so that the Lambda function has
permission to publish logs to CloudWatch.
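As an illustration of inserting logging statements into function code, here is a minimal sketch of a Python handler that emits structured JSON log lines. The event shape (an "items" list) and the log field names are hypothetical; in the Lambda Python runtime, output written through the standard logging module ends up in the function's CloudWatch log group.

```python
import json
import logging
import time

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    """Process an event and log the start, the outcome, and the duration."""
    started = time.monotonic()
    # aws_request_id is provided by the Lambda context object at runtime.
    request_id = getattr(context, "aws_request_id", "unknown")
    logger.info(json.dumps({"message": "invocation started", "request_id": request_id}))
    try:
        # Hypothetical business logic: count the items in the event.
        item_count = len(event.get("items", []))
    except Exception:
        # logger.exception records the stack trace alongside the message.
        logger.exception("failed to process event")
        raise
    elapsed_ms = round((time.monotonic() - started) * 1000, 2)
    logger.info(json.dumps({
        "message": "invocation finished",
        "request_id": request_id,
        "item_count": item_count,
        "elapsed_ms": elapsed_ms,
    }))
    return {"statusCode": 200, "item_count": item_count}
```

Logging JSON rather than free text is a design choice: it keeps the request ID and counters as separate fields, which makes the lines easier to filter and aggregate later.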
[Figure: Lambda Logs. Labels: Microservices, AWS Lambda, Lambda Metrics, CloudWatch, Analytics/Actions, SNS Notification, AWS S3, Auto Scaling]
S3 Logs
AWS S3 is one of the first services that AWS launched, and it plays a vital
role in storing data, including logs, from various other AWS services. You need
visibility into S3 performance itself, but arguably the most important type
of S3 logs are the server access logs.
The logs provide visibility into each call made to an S3 bucket from other
AWS services or applications. They include details like the source of the
request, the name of the S3 bucket, the request time, and error and response
codes. Server access logs are disabled by default and need to be enabled.
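Enabling server access logs can be scripted. The sketch below builds the request body for the S3 PutBucketLogging operation and passes it to a client object that is assumed to behave like a boto3 S3 client (its put_bucket_logging method). The function names, bucket names, and prefix are hypothetical, and the target bucket additionally needs to permit S3 log delivery.

```python
def build_bucket_logging_status(target_bucket, target_prefix):
    """Build the BucketLoggingStatus payload that turns access logging on."""
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,  # bucket that receives the log files
            "TargetPrefix": target_prefix,  # key prefix for delivered log objects
        }
    }


def enable_server_access_logs(s3_client, source_bucket, target_bucket, target_prefix):
    """Enable server access logging on source_bucket.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    status = build_bucket_logging_status(target_bucket, target_prefix)
    s3_client.put_bucket_logging(Bucket=source_bucket, BucketLoggingStatus=status)
    return status


# Example payload for a hypothetical pair of buckets.
print(build_bucket_logging_status("example-log-bucket", "access-logs/example-app/"))
```

Keeping the payload builder separate from the API call makes the configuration easy to review and test without touching a real bucket.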
Going deeper, object-level logs that are tracked by CloudTrail monitor API
calls to S3 and the changes they make to the actual objects stored within S3.
As with the server access logs, object-level logs need to be set up manually.
➜ Create access points for each application and/or user that requires
access to objects in your new or existing bucket
➜ Configure permissions per Access Point to limit public access, and
restrict access by object prefixes and object tags
➜ You can create Access Points that limit all S3 storage access to a
Virtual Private Cloud (VPC)
➜ Access Points are easy to scale as you build more applications for
your large shared data sets
ELB Logs
AWS Elastic Load Balancer (ELB), as the name suggests, is a load balancer
service that routes traffic across various AWS services. It is used to handle
spikes in traffic.
[Figure: ELB Logs. Labels: AWS Elastic Load Balancers, AWS CloudWatch, Lambda, Analytics/Actions, SNS Notification, AWS S3, Auto Scaling]
RDS Logs
AWS Relational Database Service (RDS) is a managed database service
that makes it easy to scale and operate relational databases. You can run
a variety of database engines on RDS such as MariaDB, Microsoft SQL
Server, MySQL, Oracle Database, and PostgreSQL.
You can view database logs from within the RDS console or publish logs
to CloudWatch for further analysis. By default, error logs are generated
in RDS, but you’ll need to configure additional logs like slow query, audit,
and general logs for optimization and troubleshooting.
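Publishing the additional logs to CloudWatch can be requested through the RDS ModifyDBInstance operation. The sketch below assumes a MySQL instance, whose exportable log types are audit, error, general, and slowquery, and a client that behaves like a boto3 RDS client. The function names and the instance identifier are hypothetical, and the engine must also be configured to produce these logs in the first place.

```python
# Log types that a MySQL instance on RDS can export to CloudWatch Logs.
MYSQL_LOG_TYPES = {"audit", "error", "general", "slowquery"}


def build_log_export_request(db_instance_id, log_types):
    """Build keyword arguments for modify_db_instance that enable log exports."""
    unknown = set(log_types) - MYSQL_LOG_TYPES
    if unknown:
        raise ValueError(f"unsupported log types: {sorted(unknown)}")
    return {
        "DBInstanceIdentifier": db_instance_id,
        "CloudwatchLogsExportConfiguration": {
            "EnableLogTypes": sorted(set(log_types)),
        },
    }


def enable_log_exports(rds_client, db_instance_id, log_types):
    """rds_client is assumed to be a boto3 RDS client, e.g. boto3.client("rds")."""
    return rds_client.modify_db_instance(
        **build_log_export_request(db_instance_id, log_types)
    )


print(build_log_export_request("example-db", ["slowquery", "audit", "general"]))
```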
[Figure: RDS Logs. Labels: Amazon Database Services (Amazon Relational Database Service (RDS), Amazon Aurora, Amazon ElastiCache, Amazon DocumentDB), AWS CloudWatch, Lambda, Analytics/Actions, SNS Notification, AWS S3, Auto Scaling]
SNS Logs
AWS SNS is a pub/sub messaging service that sends messages to other
AWS services or even to end users via SMS or email. These messages can
be used by services like Lambda as triggers to begin parallel execution of
a job. SNS makes it easier to manage communication internally between
distributed microservice applications.
When monitoring SNS topics, you will want to monitor the volume of
messages, failed notifications, the reasons for their failures, messages
that are filtered out, and the volume of SMS messages sent. All of these
metrics are conveniently available within CloudWatch. Additionally,
CloudTrail stores information about the API calls made to SNS. In
CloudTrail, you’ll need to configure a trail that delivers all SNS events
from all regions to an S3 bucket.
[Figure: Amazon SNS. Labels: Publisher, SNS Topic, Message Filtering & Fanout, Subscribers (AWS Lambda, Amazon SQS, HTTP/S), Dead-letter Queue]
PART 2: AWS LOGGING WORKFLOW
[Figure: AWS logging workflow. Labels: Default Metric Dashboards, Custom Dashboards, AWS Lambda, AWS Elasticsearch, Logstash, Default Analytical Dashboards, Custom Analytical Dashboards, Advanced Search, Elasticsearch Default Search Capabilities, Automatic Log Clustering, Filtering]
Collecting
As with any logging practice, the first step to AWS logging is to collect
logs from various sources. To start collecting logs, AWS offers a unified
logging agent that collects both logs and advanced metrics. There are
many ways to install the agent on your EC2 instances and other AWS
services, depending on where your instances are running. You’ll also need
to configure the awslogs.conf file that specifies the log group, log stream,
time zone, and more. Some AWS services can send logs directly to S3,
but CloudWatch “Deliver Logs” costs would still apply.
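To show what the awslogs.conf settings mentioned above look like, here is a minimal sketch that renders such a file with Python's configparser. The option names (file, log_group_name, log_stream_name, datetime_format, time_zone, state_file) are the ones used by the CloudWatch Logs agent that reads awslogs.conf; the log path, group name, and timestamp format are hypothetical values.

```python
import configparser
import io


def render_awslogs_conf(log_file, log_group, log_stream="{instance_id}", time_zone="UTC"):
    """Render an awslogs.conf with one monitored file."""
    # interpolation=None keeps the % signs in datetime_format literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser["general"] = {"state_file": "/var/lib/awslogs/agent-state"}
    # Each monitored file gets its own section; the section name is arbitrary.
    parser[log_file] = {
        "file": log_file,
        "log_group_name": log_group,
        "log_stream_name": log_stream,
        "datetime_format": "%Y-%m-%d %H:%M:%S",
        "time_zone": time_zone,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


conf_text = render_awslogs_conf("/var/log/example-app/app.log", "example-app")
print(conf_text)
```

The {instance_id} placeholder in the stream name is expanded by the agent, which gives every instance its own log stream inside the shared log group.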
Parsing
Parsing unstructured logs is critical in order to extract the full potential
value of the data and make it ready for analysis. Parsing enables us to
get statistics on log message parameter values, conduct faceted searches
and filter logs by specific fields and values.
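A small sketch of what parsing buys you: the code below turns unstructured lines into named fields with a regular expression and then counts values per field, which is the basis of a faceted search. The log layout (timestamp, level, service in brackets, message) and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: "<timestamp> <LEVEL> [<service>] <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) \[(?P<service>[^\]]+)\] (?P<message>.*)$"
)


def parse_line(line):
    """Return the named fields of a log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def facet_counts(lines, field):
    """Count how often each value of `field` occurs among the parsable lines."""
    parsed = (parse_line(line) for line in lines)
    return Counter(record[field] for record in parsed if record is not None)


SAMPLE_LINES = [
    "2020-06-01T12:00:00Z ERROR [checkout] payment declined",
    "2020-06-01T12:00:01Z INFO [checkout] order created",
    "2020-06-01T12:00:02Z ERROR [search] index timeout",
    "not a structured line",
]

print(facet_counts(SAMPLE_LINES, "level"))
print(facet_counts(SAMPLE_LINES, "service"))
```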
Querying
Querying is likely the most common operational task performed on log
data. The right search capabilities enable you to analyze logs and find
insights more easily.
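As one possible illustration of querying, the sketch below prepares the parameters for a CloudWatch Logs Insights query that returns the latest error lines from a log group. The parameter names follow the StartQuery API (times are epoch seconds); the function name, the log group, and the query text are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# A Logs Insights query: newest 20 messages that contain "ERROR".
ERROR_QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| sort @timestamp desc "
    "| limit 20"
)


def build_start_query_params(log_group, query, minutes=60, now=None):
    """Build StartQuery parameters covering the last `minutes` minutes."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(minutes=minutes)
    return {
        "logGroupName": log_group,
        "startTime": int(start.timestamp()),  # epoch seconds
        "endTime": int(now.timestamp()),      # epoch seconds
        "queryString": query,
    }


params = build_start_query_params("/aws/lambda/example-function", ERROR_QUERY)
print(params)
```

With a boto3 CloudWatch Logs client, these parameters would be passed as `client.start_query(**params)`, and the results fetched later with the returned query ID.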
[Figure labels: CloudWatch, AWS Logs, AWS Elasticsearch, Coralogix]
Monitoring
Dashboards help you track the most important metrics so you’re
always aware of the state of the system. AWS CloudWatch comes with
multiple visualization options. You can create dashboards to monitor
metrics that are derived from your logs: simply create a log query in
CloudWatch and add it to a dashboard. For example, you can calculate
statistics like percentiles and aggregations. You can then visualize the
data in the form of a line chart,
a stacked chart, or a numerical metric. Taking things further, you can add
alarms to widgets for quick and simple monitoring.
Apart from CloudWatch, you can also set up Lambda functions within AWS to
trigger alerts. Using scheduled event triggers in AWS Lambda, you can run a
query and then publish the results to an Amazon Simple Notification Service
(SNS) topic, which can then trigger an email or initiate an automated action.
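The scheduled query-and-notify flow described above could be sketched as follows. The two client arguments are assumed to behave like boto3 CloudWatch Logs and SNS clients (start_query, get_query_results, and publish are real operations); the function name, the polling parameters, and the message subject are hypothetical choices.

```python
import json
import time


def run_query_and_notify(logs_client, sns_client, log_group, query, topic_arn,
                         start_time, end_time, poll_seconds=1.0, max_polls=30):
    """Run a Logs Insights query and publish any matching rows to an SNS topic."""
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )["queryId"]

    # Queries run asynchronously, so poll until the query reaches a final state.
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        if response["status"] == "Complete":
            break
        if response["status"] in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query ended with status {response['status']}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("query did not complete in time")

    # Each result row is a list of {"field": ..., "value": ...} cells.
    rows = [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
    if rows:
        sns_client.publish(
            TopicArn=topic_arn,
            Subject="CloudWatch Logs query matched",
            Message=json.dumps(rows, indent=2),
        )
    return rows
```

Inside a Lambda function triggered on a schedule, the handler would create the two clients and call this function with the time window it wants to inspect; nothing is published when the query returns no rows.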
Users that need more advanced monitoring and alerting capabilities will
need to integrate a tool like Coralogix. Examples include filtering by log
metadata, grouping by particular fields, limiting triggering to specific
times, customizing alert messaging and automating alerting with ML-
assisted anomaly detection. The Coralogix integration with CloudWatch
allows AWS customers to aggregate all of their log data combined with
data from other sources across hybrid and multi-cloud environments.
Exporting
Users may need to export logs from CloudWatch for archiving, sharing, or
to analyze the data further with advanced third-party tools. AWS provides
several different ways of getting your log data to the right destination.
➜ Using Bash scripts with the AWS CLI, you can export up to 10,000
log events per request while specifying log streams and groups
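Because each request returns a limited page of events, an export has to follow the pagination token until the log group is exhausted. The sketch below shows that loop in Python rather than Bash, against a client assumed to behave like a boto3 CloudWatch Logs client (filter_log_events with limit, logStreamNames, and nextToken). The function name is hypothetical.

```python
def export_log_events(logs_client, log_group, log_streams=None, page_size=10000):
    """Yield every event in a log group, fetching up to page_size per request."""
    params = {"logGroupName": log_group, "limit": page_size}
    if log_streams:
        # Optionally restrict the export to specific log streams.
        params["logStreamNames"] = list(log_streams)
    while True:
        page = logs_client.filter_log_events(**params)
        yield from page.get("events", [])
        token = page.get("nextToken")
        if not token:
            break  # no further pages
        params["nextToken"] = token
```

A caller could write the yielded events to a file line by line, for example with `json.dumps(event)` per event, which keeps memory use flat even for large exports.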
PART 3: MANAGING AWS LOGS
[Figure: AWS Security. Labels: Group, User 1, User N, Get Authentication, Get Authorization, Role (Grants Permission), User]
While AWS does its part under the ‘shared responsibility’ model, you’ll
need to do yours in securing log data in AWS. For example, in AWS S3,
you can enable “MFA Delete” (multi-factor authentication delete) to
protect log data from accidental deletion or sabotage. Using CloudWatch
Events, you can automatically detect when an instance is being shut down
and offload log data before the shutdown is complete.
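Detecting an instance shutdown with CloudWatch Events comes down to an event pattern that matches EC2 state-change notifications. The sketch below builds such a pattern and includes a small local matcher for checking sample events. The source and detail-type strings are the ones EC2 emits; the function names and the choice of states are assumptions.

```python
import json

STATE_CHANGE_DETAIL_TYPE = "EC2 Instance State-change Notification"
SHUTDOWN_STATES = ("shutting-down", "stopping")


def build_shutdown_event_pattern(states=SHUTDOWN_STATES):
    """Build an event pattern that matches instances entering a shutdown state."""
    return {
        "source": ["aws.ec2"],
        "detail-type": [STATE_CHANGE_DETAIL_TYPE],
        "detail": {"state": list(states)},
    }


def is_shutdown_event(event, states=SHUTDOWN_STATES):
    """Local check that mirrors the pattern, useful inside a handler or in tests."""
    return (
        event.get("source") == "aws.ec2"
        and event.get("detail-type") == STATE_CHANGE_DETAIL_TYPE
        and event.get("detail", {}).get("state") in states
    )


print(json.dumps(build_shutdown_event_pattern(), indent=2))
```

A rule with this pattern could target a Lambda function or another action that copies the remaining log files off the instance before it goes away.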
Log Retention
Log retention is critical for operational and compliance purposes.
Sometimes a data breach could be discovered years after it actually
occurred. In cases where historical data is necessary, you need to have
your logs retained and AWS CloudWatch lets you retain log data for as
long as you like.
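Retention is set per log group. CloudWatch Logs only accepts specific retention periods, so the sketch below rounds a required number of days up to the nearest accepted value. The list reflects the values accepted at the time of writing and should be checked against the current API reference; the function names are hypothetical, and the client is assumed to behave like a boto3 CloudWatch Logs client.

```python
# Retention periods (in days) accepted by PutRetentionPolicy; verify against
# the current API reference before relying on this list.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def pick_retention_days(required_days):
    """Return the smallest allowed period covering required_days, or None.

    None means no allowed period is long enough, so logs should never expire.
    """
    for days in ALLOWED_RETENTION_DAYS:
        if days >= required_days:
            return days
    return None


def apply_retention(logs_client, log_group, required_days):
    """Set (or clear) the retention policy on a log group."""
    days = pick_retention_days(required_days)
    if days is None:
        # Without a retention policy, events in the group are kept indefinitely.
        logs_client.delete_retention_policy(logGroupName=log_group)
    else:
        logs_client.put_retention_policy(logGroupName=log_group, retentionInDays=days)
    return days


print(pick_retention_days(100), pick_retention_days(2 * 365))
```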
The costs for logging in AWS vary based on the region you choose. You
also need to factor in metrics, logs, events, alerts, dashboards, retention,
archiving, data transfer and more to arrive at your final cost with AWS
CloudWatch. Given the various options to choose from, pricing can be
complex with AWS CloudWatch.
Costs also matter for metrics. You get only 10 detailed monitoring metrics
in the free tier, limiting your ability to do in-depth monitoring. Additionally,
only 3 dashboards and 10 alarms are included in the free tier. Custom
events have additional costs as well.
Logs published by AWS services are priced like custom logs. If you store
a lot of logs in AWS, there is discounted vended log pricing, which is
subject to volume discounts, but currently this only covers VPC and
Route 53 logs.
To get better visibility into your CloudWatch costs, you can tag
Log Groups through the API (it isn’t currently possible via the console).
[Figure: AWS Logging Costs. Pricing: charges for Compute Capacity, Outbound Data Transfer, and Storage; no charges for Inbound Data Transfer]
PART 4: AWS CENTRALIZED LOGGING
Once gathered together in the same region within AWS, the logs can be
pushed to a more powerful logging solution like AWS Elasticsearch or an
AWS partner service such as Coralogix.
FEATURES
SERVICE: Managed | Self-managed | Fully-Managed
ERROR ANOMALIES: No | No | ML, Automated
VOLUME ANOMALIES: No | No | ML, Automated
INSTANT LOG CLUSTERING: No | No | Yes
DASHBOARDS: Basic | Self-managed Kibana | Managed Kibana, Proprietary Custom Dashboards
Parsing unstructured data is only possible while querying | Custom parsing rules
THREAT DETECTION: No | No | IP Enrichment
LIVE TAIL: No | No | Yes
Centralizing AWS Logs with Coralogix
Coralogix is a cloud-based log analytics tool available via the AWS
Marketplace for convenient billing and tighter integration. It improves on
AWS CloudWatch in many ways and is an advanced logging solution for
AWS. Here are the key advantages of Coralogix over CloudWatch:
➜ Unlimited Dashboards
➜ Unlimited queries
Key Takeaways
Finally, here are the key takeaways that you should remember as you
think ahead about your logging strategy:
security
advanced features
machine learning
AWS SERVICE | WHAT TO LOG | WHERE | DELAY | HOW
EC2
HOW:
➜ Install the CloudWatch Agent on EC2
➜ Grant permission to allow EC2 to create and write to CloudWatch
Log Groups and Log Streams
➜ Edit the CloudWatch Log Agent’s configuration file to define the
file path of the logs on the EC2 instance
➜ Edit the CloudWatch Log Agent’s configuration file to choose a
CloudWatch Group and Stream to ship the logs to
ECS
HOW: Alternatively, install the json-file log driver. The log events are
then retrieved via the Docker Remote API using a Logspout service to
centralize and ship all logs to a destination like Logstash.
WHAT TO LOG: API calls
INSPECTOR
WHAT TO LOG: API calls
WHERE: CloudTrail
DELAY: 15 minutes
HOW: Via CloudTrail
LAMBDA
WHAT TO LOG: Events. Logs: Memory Used and Billed Duration at the end of
each invocation (otherwise not tracked by CW metrics)
WHERE: Lambda, CloudWatch
DELAY: 5 seconds by default, can be changed
HOW: Logging statements in code (stdout) are pushed to a CloudWatch Log
Group linked with the Lambda function. PutLogEvents: 5 requests per second
per log stream (adds additional execution time). CloudWatch groups are
generated whenever you create a new Lambda function.
WHAT TO LOG: Metrics, API Calls