
Computer Systems and Performance Evaluation
Prof. Dr.-Ing. Christoph Lindemann

Evaluating Hardware and Software Web Proxy Caching Solutions


Christoph Lindemann and Oliver P. Waldhorst
University of Dortmund
Department of Computer Science
August-Schmidt-Str. 12
44227 Dortmund, Germany
http://www4.cs.uni-dortmund.de/~Lindemann/

November 2000

Report for Milestone 1 of the Project

Analysis of the Effectiveness of Web Caching in the Gigabit Research Network G-WiN
supported by the DFN-Verein with funds of the BMBF

Abstract
Commercial Web caching solutions include CacheFlow's Server Accelerator, Cisco's Cache Engine, InfoLibria's DynaCache, Network Appliance's NetCache, Inktomi's Traffic Server, and Novell's Internet Caching System. These products differ in cache size, disk storage, and throughput. However, all commercial Web caching products currently on the market rely solely on the replacement scheme Least Recently Used. Only Squid, open-source software freely available to academic institutions, can be configured to employ other cache replacement schemes which have been proposed recently. In this paper, we present a comprehensive performance study of Least Recently Used (LRU) as employed in commercial products as well as of the newly proposed schemes Segmented LRU (SLRU), Least Frequently Used with Dynamic Aging (LFU-DA), Greedy Dual Size (GDS), and Greedy Dual * (GD*) under current and future workload characteristics. The presented performance results are derived using trace-driven simulation. As a novel feature of our study, we break down the request streams seen at a Web proxy cache into HTML, image, and multimedia documents. Based on this workload characterization, we derive workload forecasts for institutional Web proxy caches and for proxies residing in a backbone network. The goal of our study is to understand how these replacement schemes deal with different Web document classes. This understanding is important for the effective design of Web cache replacement schemes under changing workload characteristics.

Key words: Performance-oriented design and evaluation studies of Web servers, Web cache replacement schemes, workload characterization and forecasting, trace-driven simulation.

1 Introduction

The continued growth of the World Wide Web and the emergence of new multimedia applications necessitate the use of proxy caches to reduce end-user latency and network traffic. Commercial Web caching solutions include CacheFlow's Server Accelerator, Cisco's Cache Engine, InfoLibria's DynaCache, Network Appliance's NetCache, Inktomi's Traffic Server, and Novell's Internet Caching System. These products differ in cache size, disk storage, and throughput. However, all commercial products currently on the market rely solely on the replacement scheme Least Recently Used. Only Squid, open-source software freely available to academic institutions, can be configured to employ other cache replacement schemes which have been proposed recently. The optimization of cache replacement schemes is important because the growth rate of Web content (i.e., multimedia documents) is much higher than the anticipated growth of memory sizes for future Web caches [14]. Furthermore, recent studies (see e.g. [3]) have shown that hit rate and byte hit rate grow in a log-like fashion as a function of the size of the Web cache. Cao and Irani introduced the Web cache replacement scheme Greedy Dual Size [5], which takes into account document sizes and a user-defined cost function. They showed that Greedy Dual Size is on-line optimal with respect to this cost function. Jin and Bestavros introduced the Web cache replacement scheme Greedy Dual * as an improvement over Greedy Dual Size [14]. They compared the performance of this newly proposed replacement scheme with the traditional schemes Least Recently Used (LRU) and Least Frequently Used with Dynamic Aging (LFU-DA), and with the size-aware scheme Greedy Dual Size [14]. Eager, Ferris, and Vernon developed analytical models for determining optimal proxy cache content for supporting continuous-media streaming [9]. Arlitt, Friedrich, and Jin provided a comparative performance study of six Web cache replacement schemes, among which are LRU, LFU-DA, and Greedy Dual Size [1].
They also observed an extreme nonuniformity in the popularity of Web requests seen at proxy caches. All these previous performance studies consider a single request stream for analyzing the performance of replacement schemes. This report focuses on the evaluation of the replacement schemes employed in hardware and software Web proxy caching solutions [7]. The companion report, scheduled for February 2001, focuses on the evaluation of cooperative Web caching protocols (i.e., the Internet Cache Protocol, ICP, the Cache Array Routing Protocol, CARP, and Cache Digests). In this report, we present comprehensive performance studies for LRU as a traditional replacement scheme as well as for the newly proposed schemes LFU-DA, Greedy Dual Size, and Greedy Dual * under current and future workload characteristics. The goal of our study is to understand how these replacement schemes deal with different Web document classes. This understanding is important for the effective design of Web cache replacement schemes under changing workload characteristics.
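Greedy Dual Size, referenced above, admits a compact description: each cached document p carries a value H(p) = L + cost(p)/size(p), where L is a global inflation value; the document with the smallest H is evicted and L is raised to that H, so long-resident documents gradually lose priority. GDS(1) denotes the instance with constant cost 1. The following Python sketch is our own illustration of this mechanism; the class and method names are ours, not taken from [5] or from the simulator used in this study:

```python
import heapq

class GreedyDualSizeCache:
    """Illustrative Greedy Dual Size cache.

    Each cached document p has value H(p) = L + cost(p) / size(p); the
    document with the smallest H is evicted, and the inflation value L
    is raised to the evicted H.  cost=1 gives the GDS(1) variant used
    for the constant cost model."""

    def __init__(self, capacity):
        self.capacity = capacity  # cache size in bytes
        self.used = 0             # bytes currently cached
        self.L = 0.0              # global inflation value
        self.H = {}               # doc -> current H value
        self.sizes = {}           # doc -> size in bytes
        self.heap = []            # (H, doc); may contain stale entries

    def access(self, doc, size, cost=1.0):
        """Request `doc` of `size` bytes; returns True on a cache hit."""
        if doc in self.H:
            self.H[doc] = self.L + cost / size  # refresh H with current L
            heapq.heappush(self.heap, (self.H[doc], doc))
            return True
        if size <= self.capacity:  # documents larger than the cache are not admitted
            while self.used + size > self.capacity:
                h, victim = heapq.heappop(self.heap)
                if victim in self.H and self.H[victim] == h:  # skip stale heap entries
                    self.L = h                  # age the remaining documents
                    self.used -= self.sizes[victim]
                    del self.H[victim], self.sizes[victim]
            self.H[doc] = self.L + cost / size
            self.sizes[doc] = size
            self.used += size
            heapq.heappush(self.heap, (self.H[doc], doc))
        return False
```

The heap may hold outdated entries for documents whose H was refreshed on a hit; these are recognized and discarded lazily during eviction, a common implementation technique for priority-based replacement schemes.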

A comprehensive characterization of previous Web workloads was given by Arlitt and Williamson [2]. A recent survey article on performance characteristics of the Web by Crovella [8] explains why many of the characteristics of Web workloads (e.g., document sizes and document popularity) possess high variability. The temporal locality in Web workloads has been the subject of two recent papers. Jin and Bestavros investigated temporal locality in Web cache request streams [13]. Mahanti and Williamson provided a detailed workload characterization for Web proxy caches [16]. They observed that in several workloads measured in 1998, HTML and image documents account for over 95% of all requests. Eager, Mahanti, and Williamson investigated the impact of temporal locality on proxy cache performance [15]. The workload characterization presented in Section 4 indicates that in future workloads the percentage of requests for multimedia documents will be substantially larger than in current Web request streams seen at a proxy cache. Furthermore, we observed that the popularity of some multimedia documents rapidly increases. Moreover, the time between two successive references to the same Web document, denoted as temporal correlation, decreases. These trends are derived from five traces measured in 1996, 1998, and 2000. Based on these trends, we derive workload forecasts using linear regression, both for institutional Web proxy caches and for proxy caches located in backbone networks. We present curves plotting the hit rate and byte hit rate broken down for HTML, image, and multimedia documents. This breakdown of hit rates and byte hit rates per document class shows that the overall hit rate is mainly influenced by the hit rate on images, whereas the overall byte hit rate is mainly influenced by the byte hit rate on multimedia documents.
The presented curves indicate that in an overall evaluation considering both hit rates and byte hit rates, the software Web caching solution Squid with the replacement scheme GD*(1) should be the choice for current workloads, whereas Squid with the replacement scheme GDS(1) should be the choice for future workloads. The performance results are derived by trace-driven simulation. The simulator for the Web cache replacement strategies has been implemented using the simulation library CSIM [21]. This report is organized as follows. Section 2 introduces commercial Web caching products currently on the market. The replacement schemes employed in these Web caching solutions are described in Section 3. Section 4 provides a comprehensive characterization of the workloads derived from the considered traces. Moreover, we present two workload forecasts taking into account the rapidly increasing popularity of multimedia applications. In Section 5, we present performance curves for the considered Web cache replacement schemes derived from trace data and workload forecasts. The conclusion section summarizes the results of the presented performance studies. To make the paper self-contained, two appendices are included.
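Hit rate and byte hit rate, the two metrics reported throughout, are straightforward to compute in a trace replay. The sketch below is illustrative only: `evaluate_trace` and the `cache.access(doc, size)` interface are hypothetical names of our own choosing, not part of the CSIM-based simulator used in this study.

```python
def evaluate_trace(cache, trace):
    """Replay a (document, size) request trace against a cache simulator.

    `cache` is any object exposing access(doc, size) -> bool (True on a
    cache hit); `trace` is a non-empty iterable of (doc, size) pairs.
    Returns (hit_rate, byte_hit_rate)."""
    hits = requests = 0
    byte_hits = bytes_requested = 0
    for doc, size in trace:
        requests += 1
        bytes_requested += size
        if cache.access(doc, size):
            hits += 1          # request served from the cache
            byte_hits += size  # bytes saved on the upstream link
    return hits / requests, byte_hits / bytes_requested
```

The hit rate measures end-user latency reduction (constant cost model), while the byte hit rate measures network traffic reduction (packet cost model); the same replay yields both.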


2 Web Caching Products

2.1 Hardware Solutions


CacheFlow

CacheFlow [4] offers two Web caching product lines called Client Accelerators and Server Accelerators. Server Accelerators are surrogates, also known as reverse proxy caches. They are placed in front of a Web server in order to service requests for documents located on the Web server. Their functionality adds availability to popular Web sites by taking load from the origin server. Server Accelerators are outside the scope of this report. Client Accelerators (CA) are Web caches which can be placed in existing networks. They reduce response times and bandwidth requirements by moving Web and streaming content closer to the user and accelerating client requests. Thus, they achieve scalability of existing networks. All CacheFlow accelerators run the patent-pending CacheOS, an operating system specially designed to offer scalability and reliability for Web caching applications. All CacheFlow accelerators can be upgraded by software add-ons to enable firewall functionality and content filtering. CacheFlow accelerators support the Hypertext Transfer Protocol (HTTP) v1.0 and v1.1, the File Transfer Protocol (FTP), the Network News Transfer Protocol (NNTP), and domain name system (DNS) caching. The CA 600 family additionally supports the Internet Cache Protocol (ICP) as well as the Web Cache Coordination Protocol (WCCP) v1.0 and v2.0 for cooperative Web caching. The WCCP protocol is shipped with Cisco routers and offers transparent Web caching. Network management support is provided through compatibility with the Simple Network Management Protocol (SNMP). The CacheFlow products currently available do not offer protocol support for streaming applications like digital audio and video transmission over the Internet. CacheFlow's product line of accelerators differs in cache size, throughput, and price.
The CA 600 family is designed for small Internet Service Providers (ISPs) and enterprises, while the CA 3000 and 5000 families are designed for large ISPs who aim at saving substantial bandwidth in the wide area network. The CA 600 series of client accelerators is used by enterprises, ISPs, and other organizations worldwide to manage and control Web traffic growth while accelerating the delivery of content to users. The CacheFlow client accelerator is deployed between users and the Internet or at remote sites, and intelligently manages requests for content. The CacheFlow 3000 is a high-performance Internet caching appliance. Supporting incoming traffic loads from 10 to 45 Mbps, the 3000 series is a mid-range Web caching solution for ISPs and enterprises. According to its vendor, CacheFlow 3000 products scale network capacity with minimal infrastructure investments. The CacheFlow 5000 is the high-end product of CacheFlow, supporting incoming traffic loads up to 135 Mbps and containing

126 GB of disk storage. The technical data of CacheFlow client accelerators is summarized in Table 1.

Cisco

Cisco's Web caching solution comprises the Cache Engine 500 series [6]. The Cisco Cache Engine 500 series products accelerate content delivery, optimize network bandwidth utilization, and control access to content. In contrast to the operating-system-based caching solution provided by CacheFlow, Cisco cache engines are integrated into the network infrastructure. According to Cisco, its caching solutions can be cost-effectively deployed on a wide-scale basis, bringing the benefits of caching to the entire network. Traditional proxy-based or standalone caches are not inherently designed to be network-integrated, resulting in relatively higher costs of ownership and making them less desirable for wide-scale deployment. All Cisco products support HTTP v1.0 and v1.1, FTP, NNTP, and DNS caching. Compatibility with existing environments for cooperative Web caching is provided by supporting ICP. In 1997, Cisco pioneered the industry's first content routing technology, the Web Cache Coordination Protocol (WCCP) version 1.0. WCCP is a router-cache protocol that localizes network traffic and provides network-intelligent load distribution across multiple network caches for maximized download performance and content availability. Since spring 2000, the protocol WCCP v2.0 is available and widely employed. According to Cisco, it will continue to lead the innovations and enhancements to this protocol and other content routing technologies. As for CacheFlow products, Cisco provides network management support through compatibility with SNMP, and Cisco products currently available do not offer protocol support for streaming applications. Cisco Cache Engine series products differ in storage capacity and throughput. The Cisco Cache Engine 505 is an entry-level cache engine for small enterprise branch offices with incoming traffic up to 1.5 Mbps.
Cisco's Cache Engine 550 is a midrange Web caching solution for regional offices with uplink network bandwidth up to 11 Mbps. Cisco's Cache Engine 570 is a Web caching solution for small service provider POPs and medium-sized enterprises with incoming traffic up to 22 Mbps. Storage expansion is available via the Cisco Storage Array. The Cache Engine 590 is a high-end caching solution designed for service providers and large enterprises with incoming traffic up to 44.7 Mbps. The technical data of Cisco Cache Engine series products is summarized in Table 1.

InfoLibria

The Web caching solution offered by InfoLibria comprises the DynaCache [12] products. According to InfoLibria, DynaCache offers carrier-grade caching and intelligent content management. By automatically storing commonly requested Web objects at the edge of the local network, DynaCache enables high-speed access to the freshest Internet content while

minimizing bandwidth demand to popular Web sites. The result is increased network reliability, faster performance, and shorter end-user latency. DynaCache products come in configurations to meet diverse networking and business needs. DynaCache technology is applied at ISPs and Application Service Providers (ASPs), wireless ISPs, and Satellite Service Providers. Like the CacheFlow accelerators, InfoLibria's DynaCache products run a special-purpose operating system designed to offer scalability and reliability for Web caching applications. DynaCache contains software for firewall functionality and content filtering. DynaCache supports the protocols HTTP v1.0 and v1.1, FTP, NNTP, and DNS caching. For cooperative Web caching, DynaCache supports ICP and WCCP v2.0. Network management support is provided through compatibility with SNMP. As for CacheFlow and Cisco products, InfoLibria's products currently available do not offer protocol support for streaming applications like digital audio and video transmission over the Internet. InfoLibria's DynaCache series offers products with different storage capacities and throughputs. The entry solution constitutes the DynaCache 10 with a hard disk storage capacity of 36 GB and a maximal throughput of 12 Mbps. The most powerful Web caching device of InfoLibria's product line is the DynaCache 40. Its hard disks offer up to 144 GB of cache storage. Its maximum throughput is 34 Mbps. The technical data of InfoLibria's DynaCache products is summarized in Table 1.

Network Appliance

The hardware Web caching solution offered by Network Appliance [19] comprises the NetCache product lines. NetCache products solve content delivery problems faced by enterprises, content distribution networks, ISPs, and ASPs. These appliances can be used in the entire network, from central headquarters to remote points of presence and local offices.
In contrast to the products introduced above, the NetCache product line does support streaming applications. Thus, NetCache can also deliver digital audio and video, enabling next-generation network applications such as large-scale video-on-demand services. Like the CacheFlow accelerators and InfoLibria's DynaCache, the NetCache products run a special-purpose operating system designed to offer scalability and reliability for Web caching and media streaming applications. NetCache supports the protocols HTTP v1.0 and v1.1, FTP, NNTP, and DNS caching. For cooperative Web caching, NetCache supports ICP and WCCP v2.0. Network management support is provided through compatibility with SNMP. NetCache also includes support for major streaming media technologies through compatibility with the Real Time Streaming Protocol (RTSP) and the Internet Content Adaptation Protocol (iCAP). The NetCache family includes three distinct product lines: NetCache C1100, NetCache C700, and NetCache C6100. The NetCache C1100 series products are the entry-level NetCache products

designed for enterprise remote or branch offices as well as small and medium enterprises. The C1100 series supports multiple connections for HTTP environments with 1.5 Mbps bandwidth and connections with 155 Mbps for streaming applications. The mid-range NetCache C700 series products support a wide range of capacity, performance, and reliability features. Reliability and availability of mission-critical data is ensured with features like RAID, redundant hardware, and hot-swap drives. The expansion choices make the NetCache C700 series an ideal solution for environments experiencing rapid growth. The high-end NetCache C6100 series products deliver the highest performance and reliability for the data center and other high-bandwidth locations. The C6100 solutions support 155 Mbps and more for HTTP environments and 622 Mbps and more for streaming applications. Large content libraries with up to 2 TB of storage can be reliably stored and accessed. Table 1 summarizes the technical data of the hardware solutions for Web caching. Note that the commercial products differ in the size of disk storage and RAM, while all products employ Least Recently Used (LRU) as the replacement scheme for Web documents.

Vendor             Product           Disk (GB)  RAM (MB)  Throughput (Mbps)  Replacement Scheme
CacheFlow          CacheFlow 600     36         768       10                 LRU
CacheFlow          CacheFlow 3000    63         1,024     45                 LRU
CacheFlow          CacheFlow 5000    126        N/A       135                LRU
Cisco Systems      Cache Engine 505  18         128       1.5                LRU
Cisco Systems      Cache Engine 550  18         256       11                 LRU
Cisco Systems      Cache Engine 570  144        384       22                 LRU
Cisco Systems      Cache Engine 590  144        384       44.7               LRU
InfoLibria         DynaCache DC 10   36         512       12                 LRU
InfoLibria         DynaCache DC 20   40         512       21                 LRU
InfoLibria         DynaCache DC 30   72         512       27                 LRU
InfoLibria         DynaCache DC 40   144        1,024     34                 LRU
Network Appliance  NetCache C1100    9          256       1.5                LRU
Network Appliance  NetCache C1105    72         512       1.5                LRU
Network Appliance  NetCache C720     1,024      512       N/A                LRU
Network Appliance  NetCache C6100    2,048      3,096     155                LRU

Table 1. Technical summary of hardware caching solutions

2.2 Software Solutions


Inktomi

Inktomi [11] offers a software Web caching solution called Traffic Server. Inktomi's Traffic Server is a robust network cache platform that improves quality of service, optimizes bandwidth usage, and provides a foundation for the delivery of new services at the edge of the

network. Traffic Server is available in three versions: Traffic Server C-Class for carriers and service providers, Traffic Server E-Class for enterprise networks, and Traffic Server Engine cache appliance solutions. The first two products are true software solutions; the latter constitutes an integrated hardware and software package delivered through Inktomi's partners, called original equipment manufacturers (OEMs). The OEMs associated with Inktomi include Sun, HP, and SGI. As a software solution, Traffic Server easily allows the integration of services like on-demand streaming media, filtering, and transformation at the edge of the network. According to Inktomi, Traffic Server is the only network cache that also functions as a platform for delivering services at the edge of the network. Traffic Server allows the integration of applications directly into the network to perform valuable functions like filtering out inappropriate or offensive Web content or transforming Internet content so that it can be viewed on cell phones or hand-held PCs. Like all hardware caching solutions introduced in the previous section, Inktomi's Traffic Server supports the protocols HTTP v1.0 and v1.1, FTP, NNTP, and DNS caching. For cooperative Web caching, the Traffic Server products support the protocols ICP and WCCP v2.0. ICP is used for cache coordination and provides compatibility with existing network caches. Transparent caching using the WCCP protocol enables interoperability between Inktomi's Traffic Server and Cisco-based routers. Network management support is provided through compatibility with SNMP. Like the NetCache products of Network Appliance, Inktomi's Traffic Server includes support for major streaming media technologies through compatibility with the Real Time Streaming Protocol (RTSP) and the Internet Content Adaptation Protocol (iCAP).
Inktomi's Traffic Server can be run on the operating system platforms Sun Solaris 2.6 and Solaris 7 on a Sun UltraSPARC with at least 256 MB RAM, Tru64 UNIX on a Digital Alpha/OSF server with at least 256 MB RAM, SGI IRIX 6.5 on SGI MIPS systems with at least 256 MB RAM, as well as FreeBSD 3.1 and Windows NT 4.0 on any Pentium-based system or equivalent. On these systems, the Traffic Server software platforms support six to eight disks for Web caching. Employing the largest SCSI disks currently available, the high-end Traffic Server product can manage a cache size of 400 GB. Throughput values achieved by Inktomi's Traffic Server products cannot be specified since they depend on the underlying hardware and operating system.

Novell

The software Web caching solution offered by Novell [20] comprises the Internet Caching System (ICS) product line. ICS enables small and medium enterprises as well as ISPs to increase the efficiency of their network infrastructures while reducing their associated costs by acting as a forward proxy to accelerate organizations' access to the Internet. According to Novell, ICS appliances typically serve 40% to 70% of requests directly from the cache, thus,

reducing request latency and tremendously improving network efficiency. ICS appliances can be configured in clusters for load balancing and fault tolerance. Furthermore, ICS also supports activity logging and transparent proxying. ICS provides high-speed delivery of any static multimedia object through HTTP encapsulation. For December 2000, Novell has announced native support for common media formats, providing control of live and on-demand media streams. Novell's partners include major PC manufacturers such as Compaq, Dell, and IBM. These original equipment manufacturers (OEMs) have licensed ICS and integrate this software caching solution into their own Internet appliances. Like Inktomi's Traffic Server, Novell's ICS supports the protocols HTTP v1.0 and v1.1, FTP, and NNTP. However, in contrast to the Traffic Server product line, ICS does not support DNS caching. Again like Inktomi's Traffic Server, ICS products support the protocols ICP and WCCP v2.0 for cooperative Web caching, and network management support is provided through compatibility with SNMP. Like the hardware solutions of CacheFlow and Cisco, Novell's products currently available do not offer protocol support for streaming applications like digital audio and video transmission over the Internet. Novell's ICS products run on Intel Pentium-based PCs with at least 256 MB RAM. As operating system, the ICS products run the special-purpose operating system Proxy-OS provided by Novell. Depending on the hardware platform, ICS products can manage cache sizes between 9 and 27 GB. As for Inktomi's Traffic Server, throughput values achieved by Novell's ICS products cannot be specified since they depend on the underlying hardware platform.

Squid

In contrast to the commercial hardware and software caching solutions introduced above, Squid [22] is a non-commercial, full-featured software Web proxy cache. Squid is designed to run on Unix systems and is open-source software freely available to academic institutions.
The Squid software was originally developed at the National Laboratory for Applied Network Research (NLANR) in a project funded by the National Science Foundation. The Squid project was led by Duane Wessels. The current version of Squid, Squid v2.3, is the result of efforts by numerous individuals from the Internet community. Due to its open-source philosophy, Squid constitutes an ideal platform for implementing academic prototypes of Web caching schemes and protocols. Squid supports the protocols HTTP v1.0 and v1.1, FTP, NNTP, and DNS caching. For cooperative Web caching, Squid supports the protocols ICP and WCCP v2.0. Moreover, in contrast to all other caching solutions introduced above, Squid also supports cooperative caching using the Cache Array Routing Protocol (CARP) and Cache Digests. Like Novell's software solution, Squid does not offer protocol support for streaming applications like digital audio and video transmission over the Internet.

Squid is a high-performance proxy caching server for Web clients. Unlike traditional caching software, Squid handles all requests in a single, non-blocking, I/O-driven process. Squid keeps meta data and especially hot objects cached in RAM, caches DNS lookups, supports non-blocking DNS lookups, and implements negative caching of failed requests. Squid supports the Secure Sockets Layer (SSL), extensive access controls, and full request logging. By using the lightweight ICP, Squid caches can be arranged in a hierarchy or mesh for additional bandwidth savings. Squid consists of a main server program called squid, a DNS lookup program called dnsserver, some optional programs for rewriting requests and performing authentication, and some management and client tools. When squid starts up, it spawns a configurable number of dnsserver processes, each of which can perform a single blocking DNS lookup. Squid runs on any modern Unix platform. In particular, Squid runs on Linux, FreeBSD, NetBSD, OSF and Digital Unix, IRIX, SunOS/Solaris, AIX, HP-UX, and OS/2. The minimum hardware requirements comprise a single-processor PC or workstation with 128 MB RAM. Depending on the hardware platform, Squid can manage cache sizes up to 512 GB. As in the case of the other software solutions, throughput values achieved by Squid cannot be specified. Table 2 summarizes the product data of the software solutions for Web caching. Note that again all commercial products employ LRU as the replacement scheme for Web documents, whereas Squid can be configured to use not only LRU but also Least Frequently Used with Dynamic Aging (LFU-DA), Segmented LRU, and Greedy Dual Size (GDS).
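As a concrete illustration of this configurability: Squid releases after the v2.3 version discussed here (v2.4 and newer, built with the --enable-removal-policies configure option) select the replacement scheme through directives in squid.conf. The fragment below therefore reflects later Squid versions rather than the exact setup used in this study; note that Squid's heap-based GDSF policy is the GreedyDual-Size with Frequency variant of GDS.

```
# squid.conf fragment (Squid 2.4+, built with --enable-removal-policies="lru,heap")

# Replacement scheme for the on-disk cache:
#   lru        - classic list-based LRU (the default)
#   heap LRU   - heap-based LRU
#   heap LFUDA - Least Frequently Used with Dynamic Aging
#   heap GDSF  - GreedyDual-Size with Frequency (a GDS variant)
cache_replacement_policy heap GDSF

# The in-memory hot-object cache may use a different policy:
memory_replacement_policy heap LFUDA
```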

Vendor    Product                  Replacement Scheme      Original Equipment Manufacturer
Inktomi   Traffic Server E-Class   LRU                     -
Inktomi   Traffic Server C-Class   LRU                     -
Inktomi   Traffic Server Engine    LRU                     Intel NetStructure, 3Com
Novell    Internet Caching System  LRU                     Compaq TaskSmart, Dell PowerAppliance, IBM Netfinity
NLANR     Squid                    LRU, SLRU, LFU-DA, GDS  Unix workstations

Table 2. Technical summary of software caching solutions

Vendor             HTTP 1.0/1.1  FTP  NNTP  DNS  ICP  WCCP  CARP  Cache Digests  RTSP  iCAP
CacheFlow          x             x    x     x    x    x     -     -              -     -
Cisco              x             x    x     x    x    x     -     -              -     -
InfoLibria         x             x    x     x    x    x     -     -              -     -
Network Appliance  x             x    x     x    x    x     -     -              x     x
Inktomi            x             x    x     x    x    x     -     -              x     x
Novell             x             x    x     -    x    x     -     -              -     -
Squid              x             x    x     x    x    x     x     x              -     -

Table 3. Protocol support provided by hardware and software caching solutions

Table 3 provides a summary of the protocols for data transport, cooperative Web caching, streaming, and content adaptation supported by the considered hardware and software solutions for Web caching.

3 Web Cache Replacement Schemes

In traditional memory systems, object sizes (i.e., a cache line or a memory page) and miss penalties (the delay for bringing an object into the cache) are constant. The salient feature of Web caching lies in the high variability of both the cost for bringing in new Web documents and the size of such documents. In this report, we present results for Least Recently Used, Segmented Least Recently Used, the frequency-based algorithm Least Frequently Used with Dynamic Aging, and the two size-aware replacement schemes Greedy Dual Size and Greedy Dual *, which have been recently proposed. In [14], two cost models for Web cache replacement schemes have been introduced. In the constant cost model, the cost of document retrieval is fixed. The packet cost model assumes that the cost of document retrieval is determined by the number of packets transmitted. The constant cost model is the model of choice for institutional proxy caches, which mainly aim at reducing end-user latency by optimizing the hit rate. The packet cost model is appropriate for backbone proxy caches aiming at reducing network traffic by optimizing the byte hit rate.

Least Recently Used (LRU [2]) is a recency-based policy. It is based on the assumption that a recently referenced document will be referenced again in the near future. Therefore, on replacement, LRU removes the document from the cache which has not been referenced for the longest period of time. The functionality of LRU is illustrated in Figure 1. LRU uses an LRU stack. On a cache miss, the requested document is put on the most recently used (MRU) end



of the stack. All other documents are pushed one step towards the least recently used (LRU) end of the stack. On a cache hit, the requested document is located in the stack and moved again to the MRU end. On document eviction, the document at the LRU end of the stack is evicted. LRU can be implemented using a reference stack as illustrated in Figure 1.

Figure 1. Illustration of the Least Recently Used replacement scheme

LRU is the most widely used cache replacement scheme. Because LRU assumes a fixed cost and size for all documents, it aims at optimizing the hit rate. The good performance of LRU is due to its exploitation of the locality of reference in the document request stream. The disadvantage of LRU lies in neglecting the variability in cost and size of Web documents. Furthermore, LRU does not take frequency information in the request stream into account.

Segmented Least Recently Used (SLRU [2]) was originally designed for disk caches with a fixed size of cached objects. It can easily be adapted to environments with variable sizes of cached objects. The problem with LRU is that after its first reference a document is put at the MRU end of the stack. From there it is pushed down step by step towards the LRU end, even if it is never referenced again. LRU uses a large fraction of the cache for this kind of

Figure 2. Illustration of the Segmented Least Recently Used replacement scheme
one-timers. Segmented LRU solves the problem of one-timers by dividing the stack into two segments, a protected segment and an unprotected segment. Both segments are implemented by LRU stacks. After a cache miss, the requested document is fetched and put at the MRU end of the unprotected segment. From there it is pushed down towards the LRU end. If the document is never referenced again, it is evicted from the cache when reaching the LRU end. On a cache hit, the requested document is located in the stacks and placed at the MRU end of the protected segment, from which it is pushed down step by step towards the LRU end of that segment. If a document reaches the LRU end of the protected segment, it is placed at the MRU end of the unprotected segment and treated as after a cache miss. The functionality of SLRU is illustrated in Figure 2. SLRU is a parameterized replacement scheme: it requires a parameter specifying the fraction fp of cache memory used for the protected segment. Previous studies have shown that fp = 0.6 yields the best results [1]. Therefore, in our performance studies we set the size of the protected segment to 60% of the cache size.

Least Frequently Used with Dynamic Aging (LFU-DA [1]) is a frequency-based policy that also takes recency information into account under a fixed cost and fixed size assumption. In LFU, the decision to evict a document from the cache is based on the number of references made to that document. A reference count is kept for all documents in the cache, and the document with the smallest reference count is evicted. LFU-DA extends LFU by a dynamic aging mechanism in order to avoid cache pollution. LFU-DA keeps a cache age, which is set to the reference count of the last evicted document. When putting a new document into the cache or referencing a cached one, the cache age is added to the document's reference count. It has been shown that LFU-DA achieves high byte hit rates. A basic implementation of LFU-DA is shown in Figure 3. A value Ap is associated with each cached document.
The cache age is denoted by

initialize: set L ← 0
for each request to a document p do
    if p resides in the cache then
        set Ap ← L + V(p)
    else
        fetch p
        while there is not enough free space in the cache do
            set L ← min{ Aq | q is a document in the cache }
            evict the document q with the smallest Aq value
        end while
        set Ap ← L + V(p)
    end if
end for

Figure 3. Pseudo code implementation for LFU-DA, GDS and GD*

L, whereas the value V(p) of a document is set to its reference count. On every request, the value Ap of the requested document is updated. On every eviction, L is set to the value Ap of the evicted document.

Greedy Dual Size (GDS [5]), proposed by Cao and Irani, considers the variability in cost and size of Web documents by choosing the victim based on the ratio between the cost and the size of documents. As LFU-DA, GDS associates a value H(p) with each Web document p in the cache. When document p is initially brought into the cache or is referenced while already in the cache, H(p) is set to c(p)/s(p). Here, s(p) is the document size and c(p) is a cost function describing the cost of bringing p into the cache. When a document has to be replaced, the victim p with Hmin := min{H(p)} is chosen among all documents resident in the cache. Subsequently, the H values of the remaining documents are reduced by Hmin [5]. However, as LRU, the disadvantage of GDS lies in not taking frequency information in the request stream into account. An efficient implementation of the GDS functionality is provided in Figure 3. Here, the value V(p) of a document p is set to H(p).

Greedy Dual * (GD* [13], [14]), proposed by Jin and Bestavros, captures both popularity and temporal correlation in a Web document reference stream. The document's reference frequency captures long-term popularity; temporal correlation is taken into account by the rate of aging, controlled by the parameter β. GD* sets the value H for a document p to H(p) = (f(p) · c(p)/s(p))^(1/β), where f(p) is the document's reference count. The parameter β characterizes the temporal correlation between successive references to a certain document observed in the workload, as described in Section 4. The novel feature of GD* is that f(p) and β can be calculated in an on-line fashion, which makes the algorithm adaptive to these workload characteristics.
GD* can be implemented by setting V(p) = H(p) in the pseudo code implementation shown in Figure 3. GDS and GD* describe families of algorithms: the performance measure optimized by a specific implementation (i.e., hit rate or byte hit rate) depends on the definition of the cost function c(p). In this paper, we examine two variants of GDS and GD*. The first applies the constant cost model by setting the cost function to c(p) = 1. We refer to the resulting algorithms as GDS(1) and GD*(1), respectively. The second variant applies the packet cost model by setting the cost function to the number of TCP packets needed to transmit document p, i.e., c(p) = 2 + ⌈s(p)/536⌉. These replacement schemes are denoted GDS(packets) and GD*(packets), respectively.

A discrete-event simulator has been implemented for the replacement schemes LRU, SLRU, LFU-DA, GDS(1), GD*(1), GDS(packets), and GD*(packets) using the simulation library CSIM [14]. The simulator consists of 15,000 lines of C++ code. The simulation runs presented in Section 4 are performed on a dual processor Sun Enterprise 450 workstation. For details of the simulation environment see Appendix A and B.
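The aging framework of Figure 3 can be sketched compactly in Python. The following is our own illustrative implementation, not the C++ simulator used in the study; the heap-based victim selection and the pluggable value function V(p) are assumptions consistent with the definitions above.

```python
import heapq

class GreedyDualCache:
    """Sketch of the aging framework of Figure 3 with a pluggable V(p)."""

    def __init__(self, capacity_bytes, value):
        self.capacity = capacity_bytes
        self.value = value     # V(p): callable taking (size, reference count)
        self.L = 0.0           # cache age
        self.A = {}            # document -> priority A_p
        self.sizes = {}        # document -> size in bytes
        self.freq = {}         # document -> reference count f(p)
        self.used = 0
        self.heap = []         # lazy min-heap of (A_p, document)

    def request(self, doc, size):
        self.freq[doc] = self.freq.get(doc, 0) + 1
        if doc in self.A:      # hit: refresh the priority with the cache age
            self.A[doc] = self.L + self.value(size, self.freq[doc])
        else:                  # miss: evict until the new document fits
            while self.used + size > self.capacity and self.A:
                a, victim = heapq.heappop(self.heap)
                if self.A.get(victim) != a:
                    continue   # stale heap entry, skip it
                self.L = a     # cache age := priority of the evicted document
                self.used -= self.sizes.pop(victim)
                del self.A[victim]
            self.A[doc] = self.L + self.value(size, self.freq[doc])
            self.sizes[doc] = size
            self.used += size
        heapq.heappush(self.heap, (self.A[doc], doc))
```

Plugging in value = lambda s, f: f yields LFU-DA, value = lambda s, f: 1.0 / s yields GDS(1), and value = lambda s, f: (f / s) ** (1 / beta) yields GD*(1) for a given β.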


4 Workload Characterization of Traces and Workload Forecast

4.1 Characterization of Current Web Proxy Workloads


To characterize the workload of Web proxy caches, we consider five different traces. The oldest trace was collected in 1996 by DEC [18] and has already been used in previous performance studies of replacement schemes [3], [4], [14]. The most recent trace was recorded in July 2000 in the German research network by DFN [10]. For workload forecasting, we additionally consider traces collected at the Canadian CA*net II [16] and at the University of Saskatchewan [16], both of 1998, and at the University of Dortmund of July 2000. These five traces are referred to as DEC, DFN, CANARIE, Univ. Sask., and Univ. Do., respectively. The DEC and DFN traces are used for evaluating the performance of proxy cache replacement schemes under current workload conditions. The characteristics of the remaining traces are employed for deriving workload forecasts. The CANARIE and DFN traces were collected at the primary-level proxy caches in the core of the Canadian CA*net II and of the German research network, respectively. The DEC, Univ. Sask., and Univ. Do. traces were collected at institutional-level Web proxy caches functioning as secondary-level Web proxy caches.

Preprocessing the DEC and DFN traces, we excluded uncachable documents by commonly known heuristics, e.g., by looking for the string cgi or ? in the requested URL. From the remaining requests, we considered responses with the HTTP status codes 200 (OK), 203 (Non Authoritative Information), 206 (Partial Content), 300 (Multiple Choices), 301 (Moved Permanently), 302 (Found), and 304 (Not Modified) as cacheable [1], [5], [13]. Details on trace preprocessing are given in Appendix B. Table 4 summarizes the properties of the DEC and DFN traces. We break down the request stream of documents according to their content type as specified in the HTTP header. If no content type entry is specified, we guess the document class using the file extension. We omit documents which could not be classified in this way.
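The preprocessing and classification heuristics above can be sketched in Python. The cacheable status codes follow the text; the extension sets are illustrative examples for the document classes of Section 4.1.

```python
# Cacheable response status codes used in trace preprocessing (Section 4.1).
CACHEABLE_STATUS = {200, 203, 206, 300, 301, 302, 304}
# Illustrative extension sets for the three document classes.
HTML_EXT = {".html", ".htm", ".tex", ".java"}          # HTML and text files
IMAGE_EXT = {".gif", ".jpeg", ".jpg"}
MULTI_MEDIA_EXT = {".mp3", ".ram", ".mpeg", ".mov",
                   ".gz", ".zip", ".ps", ".pdf"}

def is_cacheable(url, status):
    """Drop dynamic documents and responses with non-cacheable status codes."""
    if "cgi" in url or "?" in url:
        return False           # heuristic for dynamically generated content
    return status in CACHEABLE_STATUS

def classify(url):
    """Guess the document class from the file extension; None if unknown."""
    name = url.split("?", 1)[0].rsplit("/", 1)[-1].lower()
    dot = name.rfind(".")
    if dot == -1:
        return None
    ext = name[dot:]
    if ext in HTML_EXT:
        return "HTML"
    if ext in IMAGE_EXT:
        return "image"
    if ext in MULTI_MEDIA_EXT:
        return "multi media"
    return None                # unclassified documents are omitted
```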
We distinguish between three main classes of Web documents: HTML documents (e.g., .html, .htm), image documents (e.g., .gif, .jpeg), and multi media documents (e.g., .mp3, .ram, .mpeg, .mov). Text files (e.g., .tex, .java) are added to the class of HTML documents. Typical multi media documents are static audio and video files. A few compressed binary downloads (.gz, .zip) as well as a small number of application documents (.ps and .pdf) contained in the traces are also categorized as multi media documents.

As Table 4 shows, in current workloads HTML and image documents together account for about 95% of the documents seen and of the requests received. The same has been observed in [16] for a number of other proxy traces, among them CANARIE and Univ. Sask.

The key property for the performance of Web caching is the temporal locality in the request stream. Temporal locality can be quantified by the relationship between the probability of an access to a Web document and the time passed since the last access to this document. As discussed in [14], temporal locality in the request stream is caused by two

Trace                          DEC                     DFN
Date                           1996                    2000
Classification of cache        Institutional proxy     Backbone proxy

All cachable documents
  Number of documents          1,226,350               2,841,790
  Overall size (MB)            19,420.16               39,434.24
  Mean document size (KB)      16.21                   14.55
  Variance of document size    55.45                   268.62
  Number of requests           3,643,328               6,686,409
  Requested data (MB)          45,396.99               81,337.55

HTML
  Number of documents          301,926 (24.6%)         626,418 (22.0%)
  Overall size (MB)            1,945.60 (10.0%)        8,755.20 (22.2%)
  Mean document size (KB)      6.84                    14.63
  Variance of document size    3.96                    4.28
  Number of requests           574,551 (15.7%)         1,346,231 (20.1%)
  Requested data (MB)          3,997.70 (8.8%)         17,431.93 (21.4%)

Images
  Number of documents          875,700 (71.1%)         2,063,076 (72.6%)
  Overall size (MB)            8,448.00 (43.5%)        13,957.12 (35.4%)
  Mean document size (KB)      10.22                   7.10
  Variance of document size    7.66                    7.38
  Number of requests           2,994,068 (82.2%)       5,096,117 (76.2%)
  Requested data (MB)          21,240.52 (46.8%)       25,265.22 (31.1%)

Multi media
  Number of documents          48,724 (4.0%)           152,296 (5.4%)
  Overall size (MB)            9,026.56 (46.5%)        16,721.92 (42.4%)
  Mean document size (KB)      189.71                  115.23
  Variance of document size    25,003                  43,593
  Number of requests           74,709 (2.1%)           244,061 (3.7%)
  Requested data (MB)          20,158.77 (44.4%)       38,640.31 (47.5%)

Table 4. Properties of DEC and DFN traces

different sources: the popularity of Web documents and the temporal correlation in the request stream. A popular Web document is seen often in a request stream; therefore, popular documents are referenced more often within a short time interval than less popular documents. Temporal correlation takes into account the time between two successive references to the same document: a hot Web document is requested several times within short intervals, whereas the average document is referenced just a few times. Temporal locality can be characterized by two parameters. The first parameter, denoted as the popularity index α, describes the distribution of popularity among the individual documents. The number of requests N to a


Figure 4. Breakdown of interreference times by document class in DEC trace

Figure 5. Breakdown of interreference times by document class in DFN trace

Web document is proportional to its popularity rank r to the power of −α, that is, N ~ r^(−α). The popularity index α can be determined from the slope of a log/log plot of the number of references to a Web document as a function of its popularity rank. The second parameter, denoted β, measures the temporal correlation between two successive references to the same Web document. The probability P that a document is requested again after n requests is proportional to n to the power of −β, that is, P ~ n^(−β). Temporal correlation between successive accesses to the same document can be measured by plotting the reference count as a function of the reference interarrivals, i.e., the number of requests seen in the request stream between successive accesses to one particular document. To eliminate the influence of popularity on such a plot (i.e., more popular documents are likely to be accessed after shorter periods of time), the plot is done for equally popular documents, e.g., by plotting reference interarrivals after a document has been accessed k times. For the DEC and DFN traces, the calculated values for α and β are shown in Tables 5 and 6. These values indicate that there are some extremely popular images, whereas popularity is more widespread for text documents and multi media documents. Figures 4 and 5 plot, for the DEC and DFN traces, the interreference times broken down by document class. The interreference time is given by the time elapsed between two successive accesses to a particular Web document. Note that the distribution of interreference times reflects both popularity and temporal correlation [13]. As shown in Figures 4 and 5, the degree of temporal locality in Web request streams differs between the three considered document classes.
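The slope fit described above can be sketched as follows (synthetic data with a known exponent; numpy assumed):

```python
import numpy as np

# Synthetic Zipf-like popularity data with a known exponent alpha = 0.75:
# N(r) is the number of references to the document of popularity rank r.
ranks = np.arange(1, 10001)
counts = 1.0e4 * ranks ** -0.75

# The popularity index alpha is the negative slope of the log/log plot.
slope, intercept = np.polyfit(np.log10(ranks), np.log10(counts), 1)
alpha = -slope
print(round(alpha, 2))   # prints 0.75, recovering the generating exponent
```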

4.2 Forecast of Future Web Proxy Workloads


For forecasting the workload of institutional Web proxy caches, besides the DEC trace we additionally consider the Univ. Sask. and Univ. Do. traces. Table 5 presents the characteristics for each document class. The values of the Univ. Sask. trace are derived from Table 5 and Figure 5 of [16]. From Table 5, we observe the following trends: The percentage of requests to multi media documents increases more than linearly. The popularity of multi media documents increases, that is, the parameter α of the Zipf-like distribution decreases. Furthermore, temporal correlation increases and, thus, the parameter β increases. The percentage of requests to HTML documents also increases more than linearly. Due to these clearly observable trends, linear regression is employed for determining the forecasts of the popularity index and the temporal correlation for each document class. That is, the forecasted value y3 is derived from the observed values y1 and y2 by y3 = y2 + (y2 − y1). The forecasted percentages of requests are derived using a linear regression on logarithmic values, representing the exponential growth of the Web user population. That is, the forecasted percentage value y3 is derived by

                               1996       1998          2000        2002
                               DEC        Univ. Sask.   Univ. Do.   Forecast
HTML
  Number of requests (%)       15.69      19.72         24.15       28.47
  Mean document size (KB)      6.84       8.96          7.49        7.76
  Variance of document size    3.96       -             -           5.39
  Popularity index, α          0.79       0.78          0.76        0.74
  Temporal correlation, β      0.42       ??            0.51        0.56
Image
  Number of requests (%)       82.24      77.45         72.03       66.35
  Mean document size (KB)      10.22      5.63          8.31        8.05
  Variance of document size    7.66       -             -           9.95
  Popularity index, α          0.75       0.76          0.78        0.80
  Temporal correlation, β      0.50       ??            0.54        0.56
Multi media
  Number of requests (%)       2.07       2.81          3.82        5.17
  Mean document size (KB)      189.71     49.15         124.81      121.22
  Variance of document size    25,003     -             -           19,549
  Popularity index, α          0.78       0.75          0.73        0.71
  Temporal correlation, β      0.65       ??            0.78        0.85

Table 5. Workload forecast for institutional Web proxy caches

Figure 6. Breakdown of interreference times by document class in workload forecast for institutional proxy caches


                               1998       2000        2002
                               CANARIE    DFN         Forecast
HTML
  Number of requests (%)       17.27      20.10       22.61
  Mean document size (KB)      10.30      14.29       12.29
  Variance of document size    -          4.28        2.83
  Popularity index, α          0.60       0.54        0.48
  Temporal correlation, β      0.49       0.65        0.81
Image
  Number of requests (%)       80.67      76.20       70.73
  Mean document size (KB)      6.25       6.93        6.59
  Variance of document size    -          7.10        7.47
  Popularity index, α          0.61       0.65        0.69
  Temporal correlation, β      0.51       0.60        0.69
Multi media
  Number of requests (%)       2.05       3.70        6.66
  Mean document size (KB)      122.67     115.23      118.95
  Variance of document size    -          43,593      42,943
  Popularity index, α          0.73       0.70        0.67
  Temporal correlation, β      0.58       0.71        0.84

Table 6. Workload forecast for backbone Web proxy caches

Figure 7. Breakdown of interreference times by document class in workload forecast for backbone proxy caches

ln y3 = ln y2 + (ln y2 − ln y1). Subsequently, the forecasted percentage values are normalized so that they sum up to 100%. Since no trends can be observed for the mean document sizes of the three classes of Web documents, their forecasts are determined using the method of moving averages, that is, y3 = (y1 + y2)/2. Subsequently, based on these characteristics, the reference stream of the DEC trace is altered, and the corresponding variances are determined from the modified trace data. Figure 6 plots the interreference times for the individual document classes of the workload forecast derived in this way. The different slopes show how the temporal locality in the request streams has been modified. Using the same kind of data for the CANARIE and DFN traces, the same regression methods yield a workload forecast for proxy caches located in backbone networks. Table 6 presents the corresponding characteristics for HTML, image, and multi media documents. The log/log plot of the interreference times for the individual document classes is shown in Figure 7.
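The three forecasting rules can be sketched as follows. The numbers are the image-class values and the per-class request percentages of Table 6; the normalized shares come out close to, though not exactly at, the table's rounded values.

```python
from math import exp, log

# Sketch of the forecasting rules of Section 4.2.
def linear_forecast(y1, y2):
    """Linear regression over two equally spaced observations."""
    return y2 + (y2 - y1)

def moving_average(y1, y2):
    """Two-point moving average, used where no trend is observable."""
    return (y1 + y2) / 2

def percentage_forecast(y1, y2):
    """Log-linear regression, modelling exponential growth of Web users."""
    return exp(log(y2) + (log(y2) - log(y1)))

alpha_2002 = linear_forecast(0.61, 0.65)    # popularity index, images -> 0.69
mean_kb_2002 = moving_average(6.25, 6.93)   # mean document size -> 6.59 KB

# Request percentages are forecast per class and then normalized to 100%.
raw = [percentage_forecast(y1, y2)
       for y1, y2 in [(17.27, 20.10), (80.67, 76.20), (2.05, 3.70)]]
shares = [100 * r / sum(raw) for r in raw]  # approx. [22.9, 70.5, 6.5]
```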

4.3 Deriving the Workload Forecasts from the DEC and DFN Traces
To analyze the performance of Web replacement schemes for the workload forecasts introduced in Section 4.2, synthetic workloads rather than measured data are needed as input for our simulator. These synthetic traces are based on the current traces: Forecast 1 is based on the DEC trace and Forecast 2 on the DFN trace. To obtain the characteristics specified in Tables 5 and 6, we need to modify the number of requests to the different document classes, the distribution of document sizes, and the document popularity and temporal correlation in the reference stream. The timestamps of incoming requests are kept from the original traces, avoiding the need for a model of interarrival times. The mean document sizes specified in Tables 5 and 6 can be achieved by individually scaling the document sizes of each document class by a constant factor. The number of requests to each class can be achieved by randomly selecting the document class of an outstanding request according to the probability distribution specified by the fraction of requests to this class. It remains to show how the documents within a class can be referenced according to the distributions of the two sources of temporal locality specified by α and β. The independent reference model [3], [13] considers only one source of temporal locality, i.e., popularity: it chooses a document with popularity rank r with probability P ~ r^(−α). As stated in [15], temporal locality resulting from short-term temporal correlation is important for the performance of small caches. To generate a request stream considering both sources of temporal locality, we used the method described below.

To keep the correlations between document size and request count [16], for the DEC and DFN traces we number all documents d of a document class by their popularity rank r. We calculate the relative frequency N(d) according to the distribution specified by the popularity index α, that is, N(d) = R · r(d)^(−α). Here, R is the number of references to the most popular document, scaled according to the overall reference count of the document class in the workload forecast. The documents are divided into (possibly empty) popularity classes C_j of equally popular documents, that is, C_j = { d | N(d) = j }. To generate a request, we select a popularity class C with probability P(C = C_j) ~ j · |C_j|, where |C_j| denotes the cardinality of popularity class C_j. Among the documents d' of the selected class C_i, we choose a document d' with probability P(d = d') ~ t_Ci(d')^(−β), where t_Ci(d') is the number of references to documents in class C_i since the last reference to d'. The correctness of this algorithm can be verified by plotting the distribution functions of popularity and temporal correlation on a log/log scale and fitting the slopes α and β by a least squares fit. Empirical tests show that the resulting request stream exhibits the same overall characteristics, e.g., the fraction of one-timers, as reported in [16].
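The two-stage selection can be sketched as follows. This is a simplified illustration: the recency weight counts all requests since a document's last reference rather than only those within its class, and the popularity classes are hypothetical, not trace-derived.

```python
import random

def make_generator(class_docs, beta, rng=random.Random(1)):
    """class_docs: {j: [doc, ...]} maps popularity j to its documents."""
    last_seen = {}     # doc -> total request count at its last reference
    total = 0

    def next_request():
        nonlocal total
        # Stage 1: pick a popularity class C_j with P ~ j * |C_j|.
        classes = list(class_docs)
        weights = [j * len(class_docs[j]) for j in classes]
        j = rng.choices(classes, weights=weights)[0]
        # Stage 2: within the class, favour recently referenced documents,
        # P ~ t(d)^(-beta), where t(d) counts requests since d's last use
        # (simplified to all requests, not only those within the class).
        docs = class_docs[j]
        w = [(total - last_seen.get(d, -1)) ** -beta for d in docs]
        d = rng.choices(docs, weights=w)[0]
        last_seen[d] = total
        total += 1
        return d

    return next_request
```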

5 Performance Experiments

5.1 Investigation of the Adaptability of Greedy Dual *


In a first experiment, we evaluate the ability of the GD* replacement scheme to adapt to the actual workload seen at the proxy cache. Under the constant cost model, the optimal case is that for each document class (i.e., HTML, images, and multi media) the fraction of cached documents equals the fraction of requests to this document class in the request stream. Figures 7 and 8 plot the fraction of cached image and multi media documents for GD*(1) and LRU. As workload, the DEC trace is considered. The cache size is assumed to be 1 GByte. The figures show that for each document class GD*(1) quickly reaches the optimal fraction of cached documents specified in Table 4 (i.e., 82% images and 2.1% multi media). In contrast, under LRU the fraction of cached image documents is smaller (i.e., 40%) and the fraction of multi media documents is substantially larger (i.e., 50%). Similar results have been observed for the DFN trace. These observations explain why GD*(1) achieves high hit rates: GD*(1) does not waste cache space by keeping large multi media documents that will not be requested again in the near future.


Figure 7. DEC trace: Fraction of Web cache occupied by images for LRU and GD*(1)

Figure 8. DEC trace: Fraction of Web cache occupied by multi media documents for LRU and GD*(1)


5.2 Performance for Current Workloads

In a second experiment, we provide a comparative study of the Web replacement schemes LRU, LFU-DA, GDS(1), and GD*(1) for current workloads of institutional and backbone Web proxy caches. Other replacement schemes are not considered, since it has been shown in [5] that GDS outperforms them. As performance measures, the hit rate and the byte hit rate are considered. In Figures 9 to 12, we plot the hit rate (left) and the byte hit rate (right) for increasing cache sizes. Cache sizes are set to 0.05%, 0.10%, 0.20%, 0.50%, 1%, 2%, 5%, 10%, and 20% of the overall trace sizes given in Table 4. Recall that the DEC trace was recorded at an institutional Web proxy cache, whereas the DFN trace was recorded
Figure 9. DEC trace: Breakdown of hit rates for different document classes

in a backbone network. Thus, the recorded request streams of the DEC and DFN traces contain different degrees of temporal locality. This leads to different maximal achievable hit rates and byte hit rates. For example, for the DEC trace the maximal achievable hit rate for images is about 63%, while it is only about 27% for the DFN trace. In the following, we relate our observations to the results of [14], in which GD* has been introduced. Consistent with [14], we observe that frequency-based replacement schemes outperform recency-based schemes in terms of hit rate. As shown in Figures 9 and 10, GD*(1) outperforms GDS(1) and LFU-DA outperforms LRU in terms of hit rate. This holds for each document class. It is most obvious for images, while there is only a small advantage
Figure 10. DFN trace: Breakdown of hit rates for different document classes

for HTML and multi media documents. Consistent with [14], we observe that in terms of hit rate LRU and LFU-DA perform worse than GDS(1) and GD*(1). This observation is significant for HTML and image documents because of their small document sizes, while the advantage is small for large multi media documents. It can be explained by the fact that LRU and LFU-DA do not take document sizes into account.

Opposed to [14], we do not observe in Figure 13 that GD*(1) achieves competitive performance in terms of byte hit rate. As shown in Figures 9 and 10, the byte hit rate achieved by GD*(1) is competitive for HTML and image documents. However, for multi media documents GD*(1) performs significantly worse in terms of byte hit rate than LRU and LFU-DA. Since the byte hit rate for multi media documents dominates the overall byte hit rate, this leads to a poor overall byte hit rate for GD*(1).

As a novel aspect of our study, Figures 9 and 10 plot the achieved hit rates and byte hit rates broken down by document class. Comparing the curves of Figures 9 and 10 with the corresponding curves of Figure 13 illustrates that the overall hit rate is mainly influenced by the hit rate for images, whereas the overall byte hit rate is mainly determined by the byte hit rate for multi media documents. The first result can be explained by the fact that about 70% of the requests in the DEC and DFN traces are requests for images. The second result is due to the fact that multi media documents account for nearly 50% of the requested data in the DEC and DFN traces.
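The interplay between the two measures can be made concrete with a small sketch (hypothetical request log; sizes in bytes):

```python
# Sketch of the two performance measures used throughout Section 5.
def rates(log):
    """log: list of (was_hit, size) pairs, one per request."""
    hits = sum(1 for hit, _ in log if hit)
    hit_bytes = sum(size for hit, size in log if hit)
    total_bytes = sum(size for _, size in log)
    hit_rate = hits / len(log)
    byte_hit_rate = hit_bytes / total_bytes
    return hit_rate, byte_hit_rate

# Two small documents hit, one large multi media document missed:
hr, bhr = rates([(True, 10_000), (True, 8_000), (False, 2_000_000)])
# hr = 2/3, while bhr is below 1% -- the large document dominates the bytes
```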

5.3 Performance for Future Workloads

In a third experiment, we investigate the performance of the replacement schemes on the workload forecasts specified in Tables 5 and 6. Recall that in these workload forecasts the mean file sizes for HTML, image, and multi media documents as well as the fractions of requests to the individual document classes have been modified. As a consequence, the overall size of the trace representing the forecast for institutional Web proxies is 18.7 GB instead of the 19.2 GB of the DEC trace. The overall size of the trace representing the forecast for backbone proxies is 38.5 GB instead of the 39.4 GB of the DFN trace. As before, cache sizes are set to 0.05%, 0.10%, 0.20%, 0.50%, 1%, 2%, 5%, 10%, and 20% of the overall trace sizes.

Consistent with [14], we observe in Figures 11 and 12 that LRU and LFU-DA perform significantly worse than GDS(1) and GD*(1) in terms of hit rate. As already observed for the current workloads in Section 5.2, the gap between GD*(1) and LRU is significant for HTML and images, while it diminishes for multi media documents. Opposed to [14], Figures 11 and 12 show that the gap between frequency-based and recency-based schemes in terms of hit rate vanishes for the workload forecasts. That is, LRU performs almost as well as LFU-DA, and GDS(1) performs even better than GD*(1). This can be explained by the short-term temporal correlation assumed for the future workloads. Increasing the parameter β lets

temporal correlation become a more important factor of temporal locality than document popularity. Recency-based replacement schemes exploit temporal correlation, while frequency-based replacement schemes exploit document popularity. Also opposed to [14], we observe in Figure 13 that the byte hit rate of GD*(1) is not competitive for the workload forecasts. Because of the higher temporal correlation, the byte hit rate of GD*(1) for HTML and images loses ground to the other schemes, while its byte hit rate on multi media documents stays low, resulting from the discrimination of large documents. The low byte hit rates on multi media documents have an even larger impact on the overall byte hit rate because of the higher fraction of such requests.
Figure 11. Workload forecast for institutional proxy caches: Breakdown of hit rates for different document classes

Furthermore, our studies show that the hit rates on HTML and multi media documents increase, whereas the hit rate on images decreases. This effect is due to the smaller fraction of requests to images in the workload forecasts. In terms of byte hit rate, LRU and GDS(1) perform significantly better for the workload forecasts than for the current workloads. In fact, LRU outperforms the frequency-based scheme LFU-DA, and GDS(1) outperforms GD*(1). This is due to the assumption that the temporal correlation specified by the parameter β is higher in future workloads than in current workloads.
Figure 12. Workload forecast for backbone proxy caches: Breakdown of hit rates for different document classes

Figure 13. Overall hit rates and byte hit rates for current and future workloads (DEC trace, forecast for institutional proxy caches, DFN trace, and forecast for backbone proxy caches)


[Figure 14 about here: the hit rate and byte hit rate of LRU, GD*(1), GDS(packets), and GD*(packets) are plotted over cache sizes from 0.01 GB to 10 GB.]
Figure 14. Performance of GD*(packets) and GDS(packets) for institutional proxy caches

As a last experiment, we studied the performance of GD* and GDS for the workload forecasts under the packet cost model. Figures 14 and 15 compare GD*(packets) and GDS(packets) with GD*(1) and LRU, which achieved the best hit rates and byte hit rates, respectively, in Figure 13. For our simulation runs, we used the cache sizes mentioned in Section 4.3. Opposed to [14], we observe that GD*(packets) does not outperform LRU in terms of byte hit rate. The increasing importance of temporal correlation in the request streams of the workload forecasts and the lack of size awareness of LRU close the gap in byte hit rate. As described in [14], large values of the weighting parameter of GD* put more weight on frequency decisions, which are not suited for workloads with high temporal correlation. Therefore, GD*(packets) loses ground to LRU, which is tailored to capture temporal correlation in the form of recency-based replacement decisions. In terms of hit rate, GD*(packets) outperforms LRU only for small cache sizes. This is because it takes document popularity into account in terms of the frequency of accesses to certain Web documents. Figures 14 and 15 also show that for future workloads GDS(packets) achieves the highest byte hit rates (i.e., as high as LRU) while outperforming LRU in terms of hit rate. In an integrated evaluation considering both hit rate and byte hit rate, GDS(packets) performs slightly better than GD*(packets).
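To make the cost models concrete, the following sketch shows a GreedyDual-Size cache with a pluggable cost function: GDS(1) corresponds to a constant cost of 1, and the packet cost model charges 2 + size/536 as in [5]. The class and function names are ours, not the simulator's.

```python
import heapq

def packet_cost(size_bytes):
    # Packet cost model as in [5]: one 536-byte TCP segment per packet,
    # plus 2 packets of fixed per-request overhead.
    return 2 + size_bytes / 536.0

class GDSCache:
    """GreedyDual-Size with a pluggable cost model (illustrative sketch)."""

    def __init__(self, capacity_bytes, cost_fn=lambda size: 1.0):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0   # the running inflation value L
        self.priority = {}     # doc_id -> current priority H(p)
        self.size = {}         # doc_id -> size in bytes
        self.heap = []         # (H(p), doc_id); may contain stale entries
        self.cost_fn = cost_fn

    def access(self, doc_id, size_bytes):
        hit = doc_id in self.priority
        if not hit:
            if size_bytes > self.capacity:
                return False   # document larger than the whole cache
            while self.used + size_bytes > self.capacity:
                self._evict()
            self.size[doc_id] = size_bytes
            self.used += size_bytes
        # On both hit and miss: H(p) = L + cost(p) / size(p)
        h = self.inflation + self.cost_fn(size_bytes) / size_bytes
        self.priority[doc_id] = h
        heapq.heappush(self.heap, (h, doc_id))
        return hit

    def _evict(self):
        while True:
            h, doc_id = heapq.heappop(self.heap)
            if self.priority.get(doc_id) == h:  # skip stale heap entries
                break
        self.inflation = h   # inflate L to the evicted document's priority
        self.used -= self.size.pop(doc_id)
        del self.priority[doc_id]
```

GDS(1) is then `GDSCache(capacity)` with the default constant cost, while GDS(packets) is `GDSCache(capacity, cost_fn=packet_cost)`.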
[Figure 15 about here: the hit rate and byte hit rate of LRU, GD*(1), GDS(packets), and GD*(packets) are plotted over cache sizes from 0.01 GB to 10 GB.]

Figure 15. Performance of GD*(packets) and GDS(packets) for backbone proxy caches


5.4 Partitioned Web Proxy Caches


As a last experiment, we evaluated the performance of the replacement schemes LRU and SLRU on a partitioned organization of the Web proxy cache. In this organization, the cache memory is split into several partitions of different sizes, each assigned to a certain class of Web documents. For example, there are three partitions: one for HTML documents, one for images, and one for multimedia documents. On each partition, an autonomous replacement scheme is applied; that is, for LRU, one LRU stack is maintained for HTML documents, one for images, and one for multimedia documents. The size of each partition can be determined by the fraction of requests for documents belonging to the class managed by that partition.

Partitioned Web caches have several application areas. First, statically partitioned caches can be implemented by running several cache processes on a single machine. As the results of our experiments show, this can improve the hit rate of traditional replacement schemes such as LRU, SLRU, and LFU-DA. Second, the individual partitions can be distributed over workstations connected by a local area network, each running a cache process. Small partitions, e.g., the HTML partition, can be managed by small workstations. Large partitions, e.g., the multimedia partition, can be managed by workstations with sufficient disk space. Partitions with high access frequencies, e.g., the image partition, can be managed by workstations with high processing power.

Figures 16 and 17 show that a partitioned organization of the cache can improve the hit rate for the replacement schemes LRU and SLRU. At the same time, there are small losses in byte hit rate compared to an unpartitioned organization. In the figures, we put the results in relation to the best stably implemented replacement scheme (i.e., GDS in Squid) and to the best known replacement scheme (i.e., GD*). This shows that the partitioned organization makes LRU and SLRU more competitive with GDS and GD* in terms of hit rate. At the same time, the byte hit rate of partitioned LRU and SLRU remains superior to the byte hit rate of GDS and GD*.
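As an illustration, a statically partitioned LRU cache can be sketched as follows; the class names and the fraction-based sizing are our own choices, made to mirror the description above:

```python
from collections import OrderedDict

class LRUPartition:
    """One LRU stack with its own byte budget."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.docs = OrderedDict()   # doc_id -> size, most recent last
        self.used = 0

    def access(self, doc_id, size_bytes):
        if doc_id in self.docs:
            self.docs.move_to_end(doc_id)   # a hit refreshes recency
            return True
        while self.used + size_bytes > self.capacity and self.docs:
            _, victim_size = self.docs.popitem(last=False)  # evict LRU victim
            self.used -= victim_size
        if size_bytes <= self.capacity:
            self.docs[doc_id] = size_bytes
            self.used += size_bytes
        return False

class PartitionedLRUCache:
    """Cache split by document class, sized by request fractions."""

    def __init__(self, capacity_bytes, fractions):
        # fractions: e.g. {"html": 0.15, "image": 0.70, "multimedia": 0.15}
        self.partitions = {cls: LRUPartition(int(capacity_bytes * f))
                           for cls, f in fractions.items()}

    def access(self, doc_id, size_bytes, doc_class):
        return self.partitions[doc_class].access(doc_id, size_bytes)
```

Each partition evicts independently, so a burst of large multimedia requests cannot displace popular HTML documents or images, which is the effect the experiments exploit.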
[Figure 16 about here: the cache hit rate and cache byte hit rate of LRU, partitioned LRU, GDS(1), and GD*(1) are plotted over cache sizes from 0.01 GB to 10 GB.]

Figure 16. Performance of partitioned LRU


[Figure 17 about here: the cache hit rate and cache byte hit rate of SLRU, partitioned SLRU, GDS(1), and GD*(1) are plotted over cache sizes from 0.01 GB to 10 GB.]
Figure 17. Performance of partitioned SLRU

For LFU-DA, we achieved results comparable to SLRU. It is not reasonable to implement a partitioned organization for a cache managed by the replacement schemes GDS and GD*. These replacement schemes already adapt the fraction of cached documents very well to the number of requests for the distinct classes. Our experiments have shown that a partitioned organization will always decrease the performance of GDS and GD*.

Conclusions
This report presented comprehensive performance studies of the Web cache replacement schemes LRU, LFU-DA, Greedy-Dual-Size (GDS), and Greedy-Dual* (GD*). While all commercial Web caching solutions rely solely on LRU, the newly proposed schemes LFU-DA, GDS, and GD* can be used in the Squid software Web cache. Opposed to previous studies, we considered not only current workloads based on measured trace data, but also two forecasts for future workloads seen at Web proxy caches. This workload forecasting is motivated by the rapidly increasing number of digital audio (i.e., MP3) and video (i.e., MPEG) documents on the Web. To derive synthetic workloads for the forecasts, we introduced an effective method for modifying given trace data so that the modified request stream represents the workload characteristics of the forecast. To understand how Web cache replacement schemes deal with different Web document classes, we presented curves plotting hit rates and byte hit rates broken down for HTML, images, and multimedia documents. The investigation of the adaptability of GD*(1) presented in Section 5.1 clearly shows that GD*(1) does not waste cache space by keeping large multimedia documents that are unlikely to be referenced in the near future. This observation explains why GD*(1) almost always achieves the highest hit rate. The breakdown of hit rates and byte hit rates per document class shows that the overall hit rate is mainly influenced by the hit rate on images, while the overall byte hit rate is mainly influenced by the byte hit rate on multimedia documents. Recall that current workloads consist of about 70% images and only about 2% multimedia documents. As a consequence, GD*(1) performs significantly better than the other schemes in terms of hit rate. For small proxy caches, GD*(1) also stays competitive with LRU and LFU-DA in terms of byte hit rate. In an overall evaluation considering both hit rates and byte hit rates, the software Web caching solution Squid with the replacement scheme GD*(1) should be the choice for current workloads. Recall that our workload forecasts are based on the assumptions that the fraction of multimedia documents significantly increases, that the popularity of some multimedia documents also increases, and that the time between two successive references to the same Web document decreases. These assumptions are motivated by the trends derived from five traces measured in 1996, 1998, and 2000. For future workloads, GDS(1) achieves the same performance as GD*(1) in terms of hit rate. Furthermore, the difference in hit rate achieved by GD*(1) over LRU and LFU-DA becomes considerably smaller. On the other hand, the disadvantage of GD*(1) compared to LRU and LFU-DA in terms of byte hit rate clearly becomes significant. In an overall evaluation considering both hit rates and byte hit rates, the software Web caching solution Squid with the replacement scheme GDS(1) should be the choice for future workloads.

References
[1] M. F. Arlitt, R. Friedrich, and T. Jin: Performance Evaluation of Web Proxy Cache Replacement Policies, Performance Evaluation, 39, pp. 149-164, 2000.
[2] M. F. Arlitt and C. Williamson: Internet Web Servers: Workload Characterization and Performance Implications, IEEE/ACM Trans. on Networking, 5, pp. 631-645, 1997.
[3] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker: Web Caching and Zipf-like Distributions: Evidence and Implications, Proc. 21st Annual Conf. of the IEEE Computer and Communication Societies (IEEE Infocom), New York, pp. 126-134, 1999.
[4] CacheFlow, Inc., http://www.cacheflow.com/
[5] P. Cao and S. Irani: Cost-Aware WWW Proxy Caching Algorithms, Proc. 1st USENIX Symp. on Internet Technologies and Systems, Monterey, California, pp. 193-206, 1997.
[6] Cisco Systems, Inc., http://www.cisco.com/
[7] I. Cooper, I. Melve, and G. Tomlinson: Internet Web Replication and Caching Taxonomy, IETF Internet draft, http://www.ietf.org/internet-drafts/draft-ietf-wrec-taxonomy-05.txt
[8] M. Crovella: Performance Characteristics of the World Wide Web, in: G. Haring, C. Lindemann, M. Reiser (Eds.), Performance Evaluation: Origins and Directions, LNCS Vol. 1769, pp. 219-232, Springer, 2000.
[9] D. L. Eager, M. C. Ferris, and M. K. Vernon: Optimized Caching in Systems with Heterogeneous Client Populations, Performance Evaluation, 42, pp. 163-185, 2000.
[10] C. Grimm, H. Pralle, and J. Vöckler: The DFN Cache Mesh, http://www.cache.dfn.de/
[11] Inktomi, http://www.inktomi.com/
[12] InfoLibria, http://www.infolibria.com/
[13] S. Jin and A. Bestavros: Temporal Locality in Web Request Streams: Sources, Characteristics, and Caching Implications, Technical Report 1999-014, CS Department, Boston University, October 1999.
[14] S. Jin and A. Bestavros: Greedy-Dual* Web Caching Algorithm: Exploiting the Two Sources of Temporal Locality in Web Request Streams, Proc. 5th Int. Workshop on Web Caching and Content Delivery, Lisbon, Portugal, 2000.
[15] A. Mahanti, D. Eager, and C. Williamson: Temporal Locality and its Impact on Web Proxy Cache Performance, Performance Evaluation, 42, pp. 187-203, 2000.
[16] A. Mahanti and C. Williamson: Web Proxy Workload Characterization, Technical Report, Department of Computer Science, University of Saskatchewan, February 1999, http://www.cs.usask.ca/faculty/carey/papers/workloadsudy.ps
[17] K. Mehlhorn and S. Näher: The LEDA Platform of Combinatorial and Geometric Computing, Cambridge University Press, 1999.
[18] J. Mogul: Digital's Web Proxy Traces, Digital Equipment Corporation, ftp://ftp.digital.com/pub/DEC/traces/proxy/tracelistv1.2.HTML
[19] Network Appliance, http://www.netapp.com/
[20] Novell, http://www.novell.com/
[21] H. Schwetman: Object-Oriented Simulation Modeling with C++/CSIM17, Proc. 1995 Winter Simulation Conference, pp. 529-533, 1995, http://www.mesquite.com/
[22] The Squid Web Proxy Cache, http://www.squid-cache.org/


Appendix A: Description of the Simulation Environment

A.1 Simulator Design

Our simulator represents an institutional-level or backbone proxy cache. The simulation model consists of three main components, which are shown in Figure A.1. The first component is the client component, which is responsible for reading requests from trace files and passing them to the cache. The client component parses the input file line by line and puts the corresponding requests into the request queue of the cache component. The second component is the cache component. It consists of the sub-components request queue, request processing unit, type filter, and replacement scheme. The cache component receives requests for Web documents from the client component through the request queue. In the current implementation, the request queue has a constant length because we assume a constant rate of request arrivals that equals the service rate of the request processing unit. In an advanced implementation, the simulator could be extended to consider end-user latency as a performance measure; in that case, the arrival rate is not constant, and the length of the arrival queue would be an important factor for overall performance. The request processing unit takes the next outstanding request from the request queue and is responsible for passing it to the other sub-modules of the cache component. First, a request is passed to the type filter. The type filter checks the content type of the request for statistical purposes. It can be configured to discard requests for documents with certain content types. In this way, the simulated cache can be configured to store only certain types of Web documents, e.g., images or multimedia documents.
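The interplay of the components can be sketched as a simple driver loop. The trace record format and all names below are our own, not the simulator's actual interfaces; the loop serves one request per arrival, matching the constant-rate assumption described above.

```python
from collections import deque

def run_simulation(trace_lines, cache, type_filter=None):
    """Drive a cache object with requests parsed from a trace (sketch).

    trace_lines yields 'timestamp doc_id size content_type' records;
    this record format is an assumption for illustration.
    """
    request_queue = deque()
    hits = requests = 0
    for line in trace_lines:
        ts, doc_id, size, ctype = line.split()
        request_queue.append((int(ts), doc_id, int(size), ctype))
        # Request processing unit: serve the next outstanding request.
        _, doc_id, size, ctype = request_queue.popleft()
        if type_filter and ctype not in type_filter:
            continue   # the type filter discards this request
        requests += 1
        if cache.access(doc_id, size):
            hits += 1
    return hits / requests if requests else 0.0
```

The `cache` argument is any object exposing an `access(doc_id, size)` method returning True on a hit, so different replacement schemes can be plugged in unchanged.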
[Figure A.1 about here: the client component sends requests into the request queue of the cache component; the request processing unit consults the type filter and the replacement scheme and reports hits and misses to the statistic class.]

Figure A.1. Software design of the simulator

If a request has not been discarded by the type filter, it is passed to the replacement scheme in the second step. The replacement scheme is a generic component which can be specialized to implement several replacement schemes, e.g., the schemes described in Section 3. The generic tasks of the replacement scheme are:

- Check whether a document is stored in the cache. This task must be implemented in the replacement scheme because sophisticated replacement algorithms might use specially designed data structures to keep track of cached objects.
- Evict documents if replacement is necessary. This is a natural task of the replacement scheme, because the implemented policy decides which documents are worth keeping and which documents can be evicted.
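These two generic tasks can be captured in a small interface, with a hypothetical LRU specialization showing how a concrete scheme plugs in. All names are ours, and the real simulator is written in C++ rather than Python:

```python
from abc import ABC, abstractmethod
from collections import OrderedDict

class ReplacementScheme(ABC):
    """Generic interface the cache component calls into (sketch)."""

    @abstractmethod
    def contains(self, doc_id):
        """Check whether a document is currently cached."""

    @abstractmethod
    def insert(self, doc_id, size_bytes):
        """Admit a document, evicting victims first if necessary."""

class LRUScheme(ReplacementScheme):
    """Specialization: recency-based eviction via an ordered map."""

    def __init__(self, capacity_bytes):
        self.capacity, self.used = capacity_bytes, 0
        self.stack = OrderedDict()   # LRU order: least recent first

    def contains(self, doc_id):
        if doc_id in self.stack:
            self.stack.move_to_end(doc_id)   # a hit refreshes recency
            return True
        return False

    def insert(self, doc_id, size_bytes):
        while self.used + size_bytes > self.capacity and self.stack:
            _, victim_size = self.stack.popitem(last=False)   # evict LRU
            self.used -= victim_size
        if size_bytes <= self.capacity:
            self.stack[doc_id] = size_bytes
            self.used += size_bytes
```

A GDS- or GD*-style specialization would replace the ordered map with a priority queue keyed on the documents' priority values.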

The third component of our simulation model is the statistic class, which keeps track of the performance measures, i.e., hit rate and byte hit rate. The statistic class stores the requested documents and their file sizes. It is notified whether a request resulted in a cache hit or a cache miss, and on every request the hit rate and byte hit rate counters are updated. The simulator can be configured to use multiple instances of the statistic class according to the type of the requested documents, e.g., one statistic class each for HTML, images, and multimedia documents.
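A minimal sketch of such a statistic class, with names of our own choosing:

```python
class Statistics:
    """Per-class hit-rate / byte-hit-rate bookkeeping (sketch)."""

    def __init__(self):
        self.requests = self.hits = 0
        self.bytes_requested = self.bytes_hit = 0

    def record(self, size_bytes, hit):
        # Called once per request with the outcome reported by the cache.
        self.requests += 1
        self.bytes_requested += size_bytes
        if hit:
            self.hits += 1
            self.bytes_hit += size_bytes

    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0

    def byte_hit_rate(self):
        return self.bytes_hit / self.bytes_requested if self.bytes_requested else 0.0
```

Keeping one instance per document class directly yields the per-class breakdowns plotted in Figure 12.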

A.2 Implementation Issues

The simulator has been implemented in the object-oriented programming language C++. The design focused on extensibility to future replacement schemes; this is an important aspect because the design of replacement schemes is an active area of research. The implementation can also be extended to consider end-user latency as a performance measure. The components of the simulator and the statistic tables have been implemented using the C++ version of the discrete event simulation library CSIM18 [21]. CSIM offers functionality for the virtually parallel execution of processes (e.g., the client and cache components), events (e.g., the arrival of a request), interprocess communication via mailboxes (used, e.g., for the arrival queue), and statistical counters, which provide basic functionality such as mean values, standard deviation, variance, and confidence intervals. For the effective implementation of sophisticated replacement schemes such as Greedy-Dual*, highly efficient data structures are needed: GD* maintains documents in a priority queue that typically contains about 1,500,000 entries. To achieve reasonable simulation times, efficient querying and updating of these data structures must be provided. For this reason, data structures from the Library of Efficient Data Structures and Algorithms (LEDA) [17] are adopted. LEDA offers highly efficient algorithms and data structures in an object-oriented template library; all algorithms are state-of-the-art in terms of computational complexity. The use of LEDA priority queues and hash tables yields a highly efficient simulation environment.


A.3 Parameters of a Simulation Run

Simulation experiments are specified by the following parameters.

- Cache Size: Maximal volume of cached content.
- Replacement Scheme: One of the replacement schemes LRU, LFU-DA, SLRU, GDS, or GD*. For GDS and GD*, either the option 1 (constant cost) or packets (packet cost) can be selected. For a detailed description of the replacement schemes, see Section 3.
- Warm-up: Specifies the fraction of requests used to fill the cache initially. The warm-up has a big influence on the performance measures hit rate and byte hit rate: if the warm-up is not sufficiently long to fill the cache, a huge fraction of requests is counted as misses (so-called cold misses). In our simulation studies we used 20% of the requests as warm-up, which filled the cache for all examined cache sizes. The calculation of the performance measures starts after the warm-up phase.
- Content Type: The type of documents to be stored in the cache for the actual simulation run. Valid content types are all, HTML, images, and multimedia.
- Trace File: File name of the input trace file.

All parameters can be adjusted from the command line when starting the simulator.
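The warm-up handling can be sketched as follows; the function and the stub interfaces are assumptions of ours (a cache with an `access()` method and a statistic object with a `record()` method), and the 20% default mirrors the warm-up fraction used in the studies above:

```python
def simulate_with_warmup(requests, cache, stats, warmup_fraction=0.20):
    """Fill the cache during a warm-up prefix, then start measuring.

    `requests` is a sequence of (doc_id, size) pairs.
    """
    warmup_end = int(len(requests) * warmup_fraction)
    for i, (doc_id, size) in enumerate(requests):
        hit = cache.access(doc_id, size)
        if i >= warmup_end:
            # Requests before this point only populate the cache;
            # their (cold) misses are deliberately not counted.
            stats.record(size, hit)
```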


Appendix B: Trace Format

B.1 DEC Trace

The DEC trace files are stored in a compact binary format. An entry of the DEC trace files consists of 17 fields; each entry represents a unique request. The file format of the DEC trace is shown in Table B.1.

Field  Type            Meaning
1      u_4bytes        Request duration in ms
2      u_4bytes        Server duration in ms
3      u_4bytes        Last modified time stamp as specified
4      u_4bytes        Time of request, lower bytes
5      u_4bytes        Time of request, upper bytes
6      u_4bytes        Client
7      u_4bytes        Server
8      u_4bytes        TCP port
9      u_4bytes        Path on server
10     u_4bytes        Query
11     u_4bytes        Size of requested object
12     u_4bytes        URL
13     unsigned short  Status Code
14     unsigned char   Content Type
15     unsigned char   Flags
16     method_t        HTTP Method
17     protocol_t      Protocol

Table B.1. File format of the DEC trace

The request duration is the time from when the proxy first accepts a connection from the client to when the proxy successfully closes that connection. The server duration field contains the time the proxy is connected to the Web server; this duration can take the unsigned equivalent of -1 in the case that no connection to the server was made. Both duration fields are measured in microseconds. The time of request fields represent the time at which the proxy accepted a connection from the client, in microseconds since the UNIX epoch. The fields client, server, path, and query are all unique ID numbers. These numbers are sequential from 1 to the last unique value for that field. The path is the portion of the requested URL following the server name, up to the end of the URL or the first ?, if one appears. If the URL contains no path, the default path ID for / is used and a flag in the field flags is set. If a ? appears in the URL, the string used for the query is everything after the first ? in the URL. The port is the TCP port used for the connection. The size is the size of the object in bytes. The URL number is a unique number for each distinct (host, path, query) combination. The status code is the HTTP status code returned from the origin server. The values of the field flags describe the query specified by the user agent; for instance, they specify whether the requested URL contains ? or cgi-bin, or whether a file extension is specified. Any extension provided at the end of the path is used to determine an object type as specified in the type field.
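Assuming little-endian byte order and one-byte `method_t` and `protocol_t` enums (the actual widths and byte order are not stated above, so this layout is an assumption), a record of this shape could be unpacked as follows:

```python
import struct

# Assumed layout: twelve u_4bytes, an unsigned short, two unsigned
# chars, and one byte each for method_t and protocol_t.
DEC_RECORD = struct.Struct("<12IH4B")

FIELD_NAMES = (
    "request_duration", "server_duration", "last_modified",
    "time_lo", "time_hi", "client", "server", "tcp_port",
    "path", "query", "size", "url",
    "status_code", "content_type", "flags", "method", "protocol",
)

def parse_dec_record(raw_bytes):
    """Unpack one (assumed) DEC trace entry into a field dict."""
    return dict(zip(FIELD_NAMES, DEC_RECORD.unpack(raw_bytes)))
```

Under these assumptions each record occupies `DEC_RECORD.size` bytes, so a trace file can be read in fixed-size chunks.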

B.2 DFN Trace

The DFN trace files are stored in a plain text format. A line of the DFN trace files consists of 13 fields; each line represents a unique request. The file format of the DFN trace files is illustrated in Table B.2.

Field  Type    Meaning
1      double  Time stamp
2      int     Duration
3      ipv4    Client address
4      string  Status tag / code
5      int     Size
6      string  HTTP Method
7      hex     Server
8      string  Domain
9      hex     URL
10     string  Suffix
11     string  Ident result
12     string  Hierarchy code
13     string  MIME Type

Table B.2. File format of the DFN trace

The field time stamp contains the time the request was received by the cache, in milliseconds since the UNIX epoch. The duration contains the time from accepting the TCP connection by the cache until closing the TCP connection to the client. The client address is the IPv4 address of the client, anonymized by an MD5 sum. The status tag / code represents Squid's status code and the HTTP status code of the response sent to the client. The field size contains the size of the reply sent to the client, including the size of the HTTP header. The server is the anonymized server name as specified in the requested URL. The domain field contains the target domain as specified in the request; if no domain extension was found, the value of the field is -. The URL is the anonymized target URL as specified in the request by the client. The suffix is the file extension of the requested file as specified by the user agent. The ident result is always -. The hierarchy code is the result of request processing returned by Squid; for DIRECT, the target address is anonymized. The MIME type is the content type of the requested document as specified by the origin server.
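Since each line holds 13 whitespace-separated fields, a line can be split into a field dictionary; the field names and type conversions below are our own choices, and the sample line in the test is invented for illustration:

```python
DFN_FIELDS = (
    "timestamp", "duration", "client", "status", "size", "method",
    "server", "domain", "url", "suffix", "ident", "hierarchy", "mime_type",
)

def parse_dfn_line(line):
    """Split one DFN trace line into its 13 fields (sketch).

    Numeric fields are converted; everything else stays a string.
    """
    parts = line.split()
    if len(parts) != 13:
        raise ValueError("expected 13 fields, got %d" % len(parts))
    rec = dict(zip(DFN_FIELDS, parts))
    rec["timestamp"] = float(rec["timestamp"])
    rec["duration"] = int(rec["duration"])
    rec["size"] = int(rec["size"])
    return rec
```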

B.3 Trace Preprocessing

To achieve independence of the different trace file formats, we generate two new files during trace preprocessing. The first is the object file, which contains information about the distinct documents referenced in the original trace. Documents specified in requests containing the substrings ? or cgi-bin in the target URL are omitted. Every line of the object file contains information about a unique document. To save space, URLs are omitted; a document can be uniquely identified by its line number in the object file, denoted as the document ID. A line of the object file contains the document size, the MIME type of the document, the top-level domain of the document's origin server, and the reference count of the document in the trace file. The object file is read into main memory at the beginning of the simulation and can be used to look up the characteristics of a document at run time. Some example lines of an object file are shown in Figure B.1.

[Figure B.1 about here: example lines of an object file, each giving a document size in bytes, a MIME type such as image/gif, a top-level domain such as .de, and a reference count.]

Figure B.1. An example of an object file

The second input file is the request file, which contains every valid request from the original trace file. Requests that have been answered with status codes other than 200 (OK), 203 (Non-Authoritative Information), 206 (Partial Content), 300 (Multiple Choices), 301 (Moved Permanently), 302 (Found), and 304 (Not Modified) are discarded [1], [5], [13]. If all requests to a document are answered with status 304 (which means we are not able to determine the correct document size in the DFN trace), these requests are also discarded. The request file contains a time stamp for each valid request, i.e., the time elapsed since the first request in the trace file in milliseconds (for future use). Further entries are the document ID corresponding to the line number of the requested document in the object file, and a modification flag. This flag indicates whether the file size of the response has changed by less than 5% since the last delivery of the document, which means that the document has been modified on the origin server [3]. A request for which the modification flag is set is counted as a cache miss, regardless of whether the requested document is found in the cache. Some example lines of a request file are shown in Figure B.2. The request file is read by the client component during a simulation run and processed as described in Appendix A.
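The request filter applied during preprocessing can be sketched directly from the rules above; the function name is ours:

```python
# Status codes accepted during preprocessing, as listed above.
CACHEABLE_STATUS = {200, 203, 206, 300, 301, 302, 304}

def is_valid_request(status_code, url):
    """Apply the preprocessing filters described above (sketch)."""
    if "?" in url or "cgi-bin" in url:
        return False   # requests for dynamic content are omitted
    return status_code in CACHEABLE_STATUS
```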

[Figure B.2 about here: example lines of a request file, each giving a time stamp, a document ID, and an optional modification flag m.]

Figure B.2. An example of a request file
