Customlog Logformat: Common Log Format
Of course, storing the information in the access log is only the start of log management. The next
step is to analyze this information to produce useful statistics. Log analysis in general is beyond
the scope of this document, and not really part of the job of the web server itself. For more
information about this topic, and for applications which perform log analysis, check the Open
Directory or Yahoo.
Various versions of Apache httpd have used other modules and directives to control access
logging, including mod_log_referer, mod_log_agent, and the TransferLog directive. The
CustomLog directive now subsumes the functionality of all the older directives.
The format of the access log is highly configurable. The format is specified using a format string
that looks much like a C-style printf(1) format string. Some examples are presented in the next
sections. For a complete list of the possible contents of the format string, see the mod_log_config
documentation.
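A configuration of the kind described in the following paragraphs might look like this. It is a minimal sketch: the format string follows the fields discussed below, while the path logs/access_log is an assumed example rather than something fixed by the text.

```apache
# Define the nickname "common" for the Common Log Format string.
LogFormat "%h %l %u %t \"%r\" %>s %b" common
# Write an access log using that nickname. The path is an assumed example
# and is resolved relative to ServerRoot.
CustomLog logs/access_log common
```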
This defines the nickname common and associates it with a particular log format string. The
format string consists of percent directives, each of which tells the server to log a particular piece
of information. Literal characters may also be placed in the format string and will be copied
directly into the log output. The quote character (") must be escaped by placing a backslash
before it to prevent it from being interpreted as the end of the format string. The format string
may also contain the special control characters "\n" for newline and "\t" for tab.
The CustomLog directive sets up a new log file using the defined nickname. The filename for the
access log is relative to the ServerRoot unless it begins with a slash.
The above configuration will write log entries in a format known as the Common Log Format
(CLF). This standard format can be produced by many different web servers and read by many
log analysis programs. The log file entries produced in CLF will look something like this:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Each part of this log entry is described below.
127.0.0.1 (%h)
This is the IP address of the client (remote host) which made the request to the server. If
HostnameLookups is set to On, then the server will try to determine the hostname and log
it in place of the IP address. However, this configuration is not recommended since it can
significantly slow the server. Instead, it is best to use a log post-processor such as
logresolve to determine the hostnames. The IP address reported here is not necessarily the
address of the machine at which the user is sitting. If a proxy server exists between the
user and the server, this address will be the address of the proxy, rather than the
originating machine.
- (%l)
The "hyphen" in the output indicates that the requested piece of information is not
available. In this case, the information that is not available is the RFC 1413 identity of the
client determined by identd on the client's machine. This information is highly unreliable
and should almost never be used except on tightly controlled internal networks. Apache
httpd will not even attempt to determine this information unless IdentityCheck is set to
On.
frank (%u)
This is the userid of the person requesting the document as determined by HTTP
authentication. The same value is typically provided to CGI scripts in the REMOTE_USER
environment variable. If the status code for the request (see below) is 401, then this value
should not be trusted because the user is not yet authenticated. If the document is not
password protected, this entry will be "-" just like the previous one.
[10/Oct/2000:13:55:36 -0700] (%t)
The time that the request was received. The format is:
[day/month/year:hour:minute:second zone]
day = 2*digit
month = 3*letter
year = 4*digit
hour = 2*digit
minute = 2*digit
second = 2*digit
zone = (`+' | `-') 4*digit
It is possible to have the time displayed in another format by specifying %{format}t in
the log format string, where format is as in strftime(3) from the C standard library.
"GET /apache_pb.gif HTTP/1.0" (\"%r\")
The request line from the client is given in double quotes. The request line contains a
great deal of useful information. First, the method used by the client is GET. Second, the
client requested the resource /apache_pb.gif, and third, the client used the protocol
HTTP/1.0. It is also possible to log one or more parts of the request line independently.
For example, the format string "%m %U%q %H" will log the method, path, query-string, and
protocol, resulting in exactly the same output as "%r".
200 (%>s)
This is the status code that the server sends back to the client. This information is very
valuable, because it reveals whether the request resulted in a successful response (codes
beginning in 2), a redirection (codes beginning in 3), an error caused by the client (codes
beginning in 4), or an error in the server (codes beginning in 5). The full list of possible
status codes can be found in the HTTP specification (RFC 2616, section 10).
2326 (%b)
The last entry indicates the size of the object returned to the client, not including the
response headers. If no content was returned to the client, this value will be "-". To log
"0" for no content, use %B instead.
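The variations mentioned in the field descriptions above can be written as format strings along the following lines. This is only a sketch: the nicknames isotime, splitreq, and commonbytes are made up for illustration, and the strftime(3) pattern is just one possible choice.

```apache
# Timestamp rendered with a strftime(3) pattern instead of the default %t layout.
LogFormat "%h %l %u %{%Y-%m-%d %H:%M:%S}t \"%r\" %>s %b" isotime
# Request line logged as separate parts: method, path, query string, protocol.
LogFormat "%h %l %u %t \"%m %U%q %H\" %>s %b" splitreq
# %B logs 0 rather than "-" when no content is returned.
LogFormat "%h %l %u %t \"%r\" %>s %B" commonbytes
```

Any of these nicknames could then be named in a CustomLog directive in place of common.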
Another commonly used format string is called the Combined Log Format. It can be used as
follows.
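A sketch of such a configuration, built from the Common Log Format string plus the two extra fields described below, might be the following. The log file path is an assumption chosen for illustration.

```apache
# Common Log Format fields followed by the Referer and User-agent request headers.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
# Assumed example path, relative to ServerRoot.
CustomLog logs/access_log combined
```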
This format is exactly the same as the Common Log Format, with the addition of two more
fields. Each of the additional fields uses the percent-directive %{header}i, where header can be
any HTTP request header. The access log under this format will look like:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
The additional fields are:
"http://www.example.com/start.html" (\"%{Referer}i\")
The "Referer" (sic) HTTP request header. This gives the site that the client reports having
been referred from. (This should be the page that links to or includes /apache_pb.gif.)
"Mozilla/4.08 [en] (Win98; I ;Nav)" (\"%{User-agent}i\")
The User-Agent HTTP request header. This is the identifying information that the client
browser reports about itself.
Multiple access logs can be created simply by specifying multiple CustomLog directives in the
configuration file. For example, the following directives will create three access logs. The first
contains the basic CLF information, while the second and third contain referer and browser
information. The last two CustomLog lines show how to mimic the effects of the RefererLog and
AgentLog directives.
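A set of directives matching this description could look as follows. This is a sketch in which the three file names are assumed for illustration.

```apache
# First log: basic CLF information via the "common" nickname.
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common
# Second log: referer information, with the format given inline.
CustomLog logs/referer_log "%{Referer}i -> %U"
# Third log: browser information, also with an inline format.
CustomLog logs/agent_log "%{User-agent}i"
```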
This example also shows that it is not necessary to define a nickname with the LogFormat
directive. Instead, the log format can be specified directly in the CustomLog directive.
Conditional Logging
There are times when it is convenient to exclude certain entries from the access logs based on
characteristics of the client request. This is easily accomplished with the help of environment
variables. First, an environment variable must be set to indicate that the request meets certain
conditions. This is usually accomplished with SetEnvIf. Then the env= clause of the CustomLog
directive is used to include or exclude requests where the environment variable is set. Some
examples:
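The pattern can be sketched as below, assuming an environment variable named dontlog (a hypothetical name) and a common nickname already defined with LogFormat.

```apache
# Mark requests from the loopback interface.
SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog
# Mark requests for the robots.txt file.
SetEnvIf Request_URI "^/robots\.txt$" dontlog
# Log only the requests that were not marked.
CustomLog logs/access_log common env=!dontlog
```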
As another example, consider logging requests from English speakers to one log file, and
non-English speakers to a different log file.
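One way to sketch this split, assuming the variable name english, the two file names shown, and a simple match on the Accept-Language request header:

```apache
# Set "english" when the Accept-Language header contains "en".
SetEnvIf Accept-Language "en" english
# Requests with the variable set go to one log, all others to a second log.
CustomLog logs/english_log common env=english
CustomLog logs/non_english_log common env=!english
```

The match here is deliberately crude; a header that merely lists English as a low-priority language would still set the variable.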
Although we have just shown that conditional logging is very powerful and flexible, it is not the
only way to control the contents of the logs. Log files are more useful when they contain a
complete record of server activity. It is often easier to simply post-process the log files to remove
requests that you do not want to consider.