
Web Caching

Technical report submitted in partial fulfillment of the course

CS625: Advanced Computer Networks Instructor: Dr. Dheeraj Sanghi


Amit Kumar Manjhi

1 Introduction
2 HTTP
  2.1 Expiration Model
  2.2 Validation Model
  2.3 Origin server dictates what can be cached
  2.4 Modification of basic expiration mechanisms
  2.5 Cache Revalidation and Reload Controls
  2.6 No-Transform Directive
3 General aspects of caching
  3.1 What is cacheable?
  3.2 Desirable properties of a caching system
  3.3 Advantages of caching
  3.4 Problems due to caching
4 Caching by whom?
5 Caching architecture
  5.1 Hierarchical caching architecture
  5.2 Cooperative caching architecture
    5.2.1 Internet Cache Protocol (ICP) [2], [3]
    5.2.2 Cache Array Routing Protocol (CARP) [7]
    5.2.3 Other schemes
  5.3 Hybrid caching architecture
6 Cache coherency
  6.1 Strong Cache Consistency
  6.2 Weak cache consistency
7 Prefetching
8 Caching dynamic content
9 Cache Replacement
10 Issues not discussed in detail
  10.1 Replication
11 Conclusion
References
Appendix


1 Introduction
The World Wide Web (WWW, or web) can be considered a large distributed information system that provides access to shared objects. It is currently one of the most popular applications running on the Internet, and its usage is expected to grow further. In just over five years, the number of static web pages has grown from a measly 125 million to a staggering 1 billion. The main attraction of the World Wide Web, and the reason for its exponential growth, is that it allows people to access vast amounts of information from geographically distributed sites, and faster than is possible by other means. Moreover, the WWW holds documents of such diverse nature that everyone can find information to his/her liking. But this scorching rate of growth has put a heavy load on the Internet's communication channels, and the situation is likely to continue in the foreseeable future as more and more information services move onto the web. The result is increased access latency for users.

Access latency, the time interval between the user issuing a request and its actual completion, can have many causes. Servers can be flooded with more requests than they can optimally handle. The network path between the user and the server can become congested due to increased traffic on some or all of the constituent links. Caching popular objects close to the users provides an opportunity to combat this latency by allowing users to fetch data from a nearby cache rather than from a distant server. Caching essentially exploits the high level of redundancy in data transfer over the Internet (apparently, the top 10-20% of popular sites contribute the majority of Internet traffic).

The remainder of the paper is organized as follows. Section 2 provides an overview of the relevant portion of the HTTP protocol, the WWW's application-layer protocol.
Section 3 discusses the general aspects of caching. Section 4 provides an overview of who needs to do caching and their reasons for doing so. Section 5 looks at various caching architectures in detail. Section 6 is an overview of cache coherency mechanisms. Section 7 discusses prefetching - fetching documents before users actually request them. Section 8 discusses a technique for caching dynamic content. Section 9 looks at some cache replacement policies that have proved to be effective. Section 10 touches on aspects of caching that are not discussed in detail in this paper. Finally, Section 11 summarizes the paper by identifying the research frontiers in this field. An appendix at the end gives informal definitions of the technical jargon used in the paper.

2 HTTP

HTTP (Hyper Text Transfer Protocol) [4] is an application-level protocol over which all web traffic flows. It defines how clients (browsers, spiders etc.) request web pages from web servers and how the web servers transfer web pages back to the clients. All HTTP traffic takes place over TCP - a reliable, transport-layer protocol. Each HTTP message is either a request or a response.

HTTP/1.1 [5], the latest version of HTTP, uses persistent TCP connections with pipelining. This makes it better than its previous versions in the following ways. Previous versions of HTTP used non-persistent connections, in which a separate connection was required for each object referenced in the HTML base page. Thus, each object suffered a minimum delay of two round-trip times (RTTs), so the minimum total delay for accessing a page that referenced ten inline images was twenty RTTs. The problem was partially alleviated by using multiple parallel connections. TCP slow start further compounded the delay, which assumed even more importance in view of the small average size of web objects (roughly 4 KB). HTTP/1.1 is currently the protocol of choice for web transactions.

HTTP/1.1 includes a number of elements intended to make caching work as well as possible. The goal is to eliminate the need to send requests in many cases by using an expiration mechanism, and to minimize the need to send full responses in many other cases by using validation. The basic cache mechanisms in HTTP/1.1 (server-specified expiration times and validators) are implicit directives to caches. In addition, the server or client uses the Cache-Control header to provide explicit directives to HTTP caches.

2.1 Expiration Model

HTTP caching works best if every document has a freshness time during which it can be used by the cache to satisfy users' requests. There are two basic models of specifying expiration:

Server-specified expiration: specified using the Expires header or the max-age directive of the Cache-Control header. A fallout of this is that the server can force the cache to revalidate every request by assigning its documents an explicit expiration time in the past; it can achieve the same effect by including a must-revalidate Cache-Control directive.

Heuristic expiration: may be used when an explicit server-specified expiration is absent. The normal heuristic is to assign the object a freshness time proportional to its age, where a document's age is the time for which it has remained unchanged (calculated using the Last-Modified date returned with every response).
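The heuristic can be sketched as follows. The 10% factor and the one-day cap are illustrative conventions (popular caches use similar defaults), not values mandated by HTTP/1.1:

```python
def heuristic_freshness(now, last_modified, factor=0.1, cap=86400):
    """Assign a freshness lifetime proportional to the document's age.

    All times are in seconds. `factor` and `cap` are illustrative
    defaults, not values mandated by the protocol.
    """
    age = now - last_modified  # time the document has remained unchanged
    return min(max(age * factor, 0), cap)
```

Under these defaults, a document unchanged for ten days receives the full one-day cap of freshness, while a document that changed an hour ago is considered fresh for only six minutes.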

2.2 Validation Model

The validation mechanism lets the client/cache check the validity of the data stored with it, and thus possibly avoid retransmission of the full response (if the stored object is still valid). Since it is desirable that the object be retransmitted in the same response if it has changed, conditional requests are used. To support conditional methods, an origin server attaches some sort of validator to every full response it sends, and this validator is kept in the cache entry. When a client wants to check the validity of a document stored with it, it sends a conditional request along with the associated validator to the origin server. The server then checks that validator against the current validator for the entity. If they match, it

sends an appropriate message with a special status code and no entity-body. Otherwise, it returns a full response. The validators used are:

Last-Modified dates: a cache entry is considered valid if the entity has not been modified since the Last-Modified value.

Entity tags: entity headers carry meta-information about the requested resource. An entity tag is basically a string sent in the ETag response-header field. It allows more reliable validation in situations where it is inconvenient to store modification dates, where the one-second resolution of HTTP date values is not sufficient, or where the origin server wishes to avoid certain paradoxes that might arise from the use of modification dates.

For finer control over caching, the protocol defines two types of validators: strong validators (which change with every change in the document) and weak validators (which change only when the semantics of the document changes).

A GET request can be made conditional by including the following headers:

If-Match: a client that has one or more entities previously obtained from the resource can verify that one of those entities is current by including a list of their associated entity tags in the If-Match header field. The purpose of this feature is to allow efficient updates of cached information with a minimum amount of transaction overhead.

If-Modified-Since: if the requested variant has not been modified since the time specified in this field, an entity will not be returned from the server; instead, a Not Modified response will be returned without any message-body.

If-None-Match: the exact opposite of the If-Match condition. In addition to the uses If-Match can be put to, this header can be used to prevent a method (e.g. PUT) from inadvertently modifying an existing resource when the client believes that the resource does not exist.

If-Range: if a client has a partial copy of an entity in its cache, and wishes to have an up-to-date copy of the entire entity, it can use this header together with the Range header to fetch the missing part of the document.

These headers are used in conjunction with the validators to get the desired cache consistency.
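The server-side decision between a full response and a Not Modified reply can be sketched as below. This is a simplified illustration, not a complete HTTP implementation: header values are passed in a plain dictionary and modification dates are represented as numeric timestamps.

```python
def validate(headers, current_etag, current_mtime):
    """Return 304 (Not Modified) if the client's validator still matches,
    else 200 (full response). Simplified sketch: `headers` is a dict and
    `current_mtime` / If-Modified-Since are numeric timestamps."""
    if_none_match = headers.get("If-None-Match")
    if if_none_match is not None:
        # Entity-tag comparison takes precedence over date comparison.
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if (current_etag in tags or "*" in tags) else 200
    if_modified_since = headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Unchanged since the client's copy: no entity-body needed.
        return 304 if current_mtime <= if_modified_since else 200
    return 200  # unconditional request: always send the full response
```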

2.3 Origin server dictates what can be cached

Cache-Control directives can be used to specify what is cacheable:

private: indicates that all or part of the response message is intended for a single user and must not be cached by a shared cache.

public: indicates that any cache may cache the response.

no-cache: if the no-cache directive does not specify a field-name, then a cache must not use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests.

2.4 Modification of basic expiration mechanisms

Using Cache-Control directives, the basic expiration mechanisms can also be modified. The corresponding directives are:

s-maxage: overrides the max-age directive and Expires header for a shared cache.

max-age: if set by the client in a request, it indicates that the client is willing to accept a response whose age is no greater than the specified time in seconds.

min-fresh: indicates that the client wants a response that will still be fresh for at least the specified number of seconds.

max-stale: indicates that the client is willing to accept a response that has exceeded its expiration time.

Current versions of the popular browsers do not allow users to configure the above parameters. In the near future this should become possible, since the user ought to be allowed to trade access latency against freshness of the document (i.e. a user could choose to see a somewhat stale document fetched from a nearby cache because he can access it faster).

2.5 Cache Revalidation and Reload Controls

Sometimes a user agent wants or needs to insist that a cache revalidate its cache entry with the origin server (and not just with the next cache along the path to the origin server), or reload its cache entry from the origin server. The directives controlling this are:

max-age: with a value of zero, this directive causes the requested object to be fetched from the origin server.

Pragma: no-cache: similar to max-age=0.

only-if-cached: the client uses this when it wants a cache to return only those responses it currently has stored, and not to reload or revalidate with the origin server. A client may opt for this header in times of poor network connectivity, or when it wants to access the document only if it can be retrieved quickly.

must-revalidate: a fail-safe mechanism for the origin server, used when it requires its document to be revalidated whenever it is used after its explicit expiration. It is needed because a cache may be configured to ignore a server's specified expiration time, and a client request may include a max-stale directive (which has a similar effect), making the previous two mechanisms inadequate for the purpose.

proxy-revalidate: the proxy-revalidate directive has the same meaning as the must-revalidate directive, except that it does not apply to non-shared user agent caches. This feature can be used to store a response to an authenticated request (i.e. the user does not need to authenticate himself twice to get the same response).

2.6 No-Transform Directive

Implementers of intermediate caches (proxies) have found it useful to convert the media type of certain entity bodies to reduce bandwidth consumption. Serious operational problems occur, however, when these transformations are applied to entity bodies intended for certain kinds of applications, such as medical imaging and scientific data analysis. The no-transform directive lets the sender forbid such transformations.

3 General aspects of caching

In this section, we focus on general aspects of caching: which objects are cacheable, the desirable characteristics of a caching system, and the pros and cons of using such a system.

3.1 What is cacheable?

Normally, the objects best suited for caching are those that have a long freshness time, are small in size, and have high access latency when fetched from the origin server. Conversely, the following objects normally cannot be cached, and caching them effectively is an important challenge in this field:

- objects that are password-protected
- URLs containing /cgi-bin/ or other pre-configured patterns (for example, a "?" in a URL indicates that the URL is calling a program, with the portion after the "?" as its argument)
- any file exceeding a pre-defined size limit
- SSL requests, which are tunneled through and not cached

3.2 Desirable properties of a caching system

The higher the hit rate of a caching system, the better the system. So, we should inspect the causes of cache misses and try to reduce their percentage. Cache misses can be classified as follows.

Compulsory misses: objects being accessed for the first time necessarily result in a cache miss. Such misses can be reduced only by:

Prefetching: a speculation is made on what documents a user will access in future, based on his past usage pattern, the usage patterns of similar users, or the usage patterns of users who have accessed the same document. Such documents can then be brought

into the cache before the user actually requests them. Then, when the user actually requests a document, the request can be fulfilled from the cache, resulting in fewer compulsory misses. This method is known as prefetching; we look at it in detail in Section 7.

Shared caches: if many users share a cache, there is a probability that a document requested by a user for the first time has already been accessed by some other user and is thus available in the cache. Compulsory misses are thereby reduced.

Thus, an ideal caching system aiming to reduce compulsory misses should be shared and should have provisions for prefetching.

Capacity misses: these are cache misses for objects that were present in the cache at some point but had to be replaced due to capacity constraints. Since the amount of memory available to a cache (or group of caches) is finite, the cache replacement algorithm should be good, i.e. it should evict those objects that have the least probability of being accessed in future. We look at some aspects of this in Section 9.

Communication misses: objects that have changed on the origin server since they were last fetched and stored in the cache have to be re-fetched from the origin server. Cache misses falling in this category are called communication misses. Servicing them depends on which consistency (cache coherence) algorithm the cache follows; we look at this in detail in Section 6.

Unreachable/error misses: these occur when the communication channel between the client and the cache is disrupted. Normally nothing can be done about such misses, except making the network more robust.

Even after these measures have been deployed optimally, the cache system will still suffer a significant number of misses; in today's environment, a hit rate of 40-50% is considered good.
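Capacity misses make the replacement policy matter. As a baseline illustration (Section 9 covers policies actually tuned for the web, which also weigh object size and fetch cost), a classic least-recently-used cache can be sketched as:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used replacement sketch. One of many
    possible policies; web caches often also weight by size and cost."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, url):
        if url not in self.store:
            return None                     # cache miss
        self.store.move_to_end(url)         # mark as most recently used
        return self.store[url]

    def put(self, url, doc):
        if url in self.store:
            self.store.move_to_end(url)
        self.store[url] = doc
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
```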
Thus, another key design principle, in addition to measures that boost hit rates, is that cache systems should not slow down misses. This means that caching has to be done closer to the end-user and there should not be too many caches in the hierarchy (too many caches in a hierarchy would increase the total delay significantly). In addition to the above, a cache system should have:

Robustness: robustness, to a user, implies availability of the service. Its main aspects are: a few cache crashes should not bring down the system; there should be no single point of failure; the caching system should fall back gracefully in case of errors; and it should be easy to recover from failures.

Transparency: it is desirable that the web caching system be transparent. This would reduce

problems for novices, who then need not know that their browser has to point to a certain proxy server. Transparency is normally achieved using L4 switches, which trap all traffic destined to port 80.

Scalability: an ideal caching system, if it is to be widely deployed in the Internet of today and the future, should be scalable. The caching scheme should scale well with the increasing size and density of the network.

Efficiency: the caching system should impose minimal additional burden on the network, counting both control packets and extra data packets incurred by using the caching system.

Adaptivity: the caching system should adapt to the dynamics of changing user demand and network environment.

Load balancing: it is desirable that the caching system distribute the load evenly through the entire network.

Stability: the schemes used in a web caching system shouldn't introduce instabilities into the network.

Simplicity: simple schemes are easier to deploy and manage. Thus, we would like an ideal caching system to be simple.

3.3 Advantages of caching

There are several incentives for having a caching system in a network:

- Web caching reduces bandwidth consumption, thereby decreasing network traffic and lessening network congestion.
- It reduces access latency, for two reasons: frequently accessed documents are present in a nearby cache and can thus be retrieved faster (transmission delay is minimized); and because network traffic is reduced, the load on origin servers drops, so even documents that are not cached can be retrieved relatively faster.
- Web caching reduces the workload of the web server.
- If the remote server is unavailable due to a crash or network partitioning, the client can obtain a cached copy from the proxy. Thus, the robustness of the web service is enhanced.
- It allows information to be distributed more widely at low cost (as cheaper servers can be installed).
- A side effect of web caching is that it gives us a chance to analyze an organization's access patterns.

3.4 Problems due to caching

- A client may be looking at stale data due to lack of proper proxy updating.
- Access latency may increase in the case of a cache miss, due to the extra proxy processing.
- Origin servers want to record the exact number of times their pages are viewed, since their advertisement revenues are proportional to it. They may therefore decide not to allow their documents to be cached (known as cache busting).
- A single proxy is a single point of failure.
- An upper bound has to be placed on the number of users a proxy can serve. This bound can be calculated from the requirement that the average time to access a document in the presence of the proxy should be no more than the time required without it.

4 Caching by whom?
Caching is done at many levels, mainly to facilitate faster access to documents and to cut down on Internet costs (which are typically proportional to the bandwidth used). At the lowest level are the users' browsers. Users frequently browse back and forth among a set of documents using the "BACK" and "FORWARD" buttons, so it makes sense to cache at this level, and almost all popular products - significant examples being Netscape Navigator and Internet Explorer - do so.

At the next level, we have proxy servers. Proxy servers are special HTTP servers run by institutions on their firewall machines for security reasons. A proxy server typically processes requests from within a firewall, makes requests to origin servers on behalf of the clients, intercepts the responses, and sends the replies back to the clients. Clients of an institution generally have common interests and are therefore likely to have similar access patterns. All clients within the firewall typically share a single proxy server, so it is an effective place to do caching.

At a higher level, we have regional ISPs, who cache because they typically pay in terms of bandwidth usage. For them, investing in cache servers is a one-time investment that results in significant savings every year due to lower bandwidth utilization, so caching makes economic sense. Still higher in the hierarchy, we have national-level ISPs, who can cache for similar reasons; caching also helps them reduce the access latency that can be significant when fetching documents across trans-oceanic links. Thus, we have a roughly hierarchical structure, and hierarchical caching becomes important.

5 Caching architecture
This section discusses how cache proxies should be organized: hierarchically, cooperatively (distributed), or in a hybrid fashion. The larger the user community serviced by a cache, the higher its hit rate (Section 3.2, Shared caches). A chain of caches trusting each other may assist one another to increase the hit rate. A caching architecture should provide the paradigm for proxies to cooperate efficiently with each other.

5.1 Hierarchical caching architecture

As we noted in Section 4, the topology of the Internet is loosely hierarchical, and it makes sense to do caching at all levels of the hierarchy. Thus, we could have a tree structure in which every node points to its parent: if a cache miss occurs at a cache, it forwards the request to its parent, and this goes on until the root is reached; the root then contacts the origin server if it is unable to satisfy the request. When the document is found, either at a cache or at the origin server, it travels in the opposite direction, leaving a copy at each of the intermediate caches along its path. In a hierarchical structure, nodes higher in the hierarchy have:

- larger user populations
- higher hit rates

This structure is consistent with the present Internet, with ISPs at each level, and it serves to diffuse popular documents towards the demand. However, the architecture suffers from the following problems:

- Each hierarchy level introduces additional delay in processing every request.
- A cache hit at a server high up in the hierarchy may not benefit the end-user much, because that cache server may be located far away from the end-user.
- There is redundancy in storage, as documents are replicated at each level.
- High-level caches may become bottlenecks and have long queuing delays.
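The request path described above can be sketched as a recursive lookup. The node class and its fields are illustrative; `origin` stands in for fetching from the origin server:

```python
class CacheNode:
    """One level of a hierarchical cache. The root holds a reference to
    the origin (modelled here as a dict); every other node has a parent."""

    def __init__(self, parent=None, origin=None):
        self.store, self.parent, self.origin = {}, parent, origin

    def lookup(self, url):
        if url in self.store:
            return self.store[url]         # hit at this level
        if self.parent is not None:
            doc = self.parent.lookup(url)  # forward the miss upward
        else:
            doc = self.origin[url]         # root contacts the origin server
        self.store[url] = doc              # leave a copy on the way down
        return doc
```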

5.2 Cooperative caching architecture

Here we have an architecture in which there are caches only at the lowest level, and a cache can obtain an object from neighboring caches. It effectively tackles many of the drawbacks of hierarchical caching. There are several approaches.

5.2.1 Internet Cache Protocol (ICP) [2], [3]

ICP is a lightweight protocol for quickly querying neighboring caches (caches at a cache-hop of one) about whether they hold a copy of a web document. The protocol can be used in any arrangement of caches, and all ICP communication is done over UDP. When an ICP cache can't fulfill a request from its own store, it:

- queries all its neighbors;
- obtains the object from the first neighbor to respond with a HIT message, caches a copy, and returns a copy to the requesting client.

When one cache queries another, there are three possible outcomes:

- an ICP HIT message is returned, indicating that the object is present in the queried cache;
- an ICP MISS message is returned;
- no response arrives (indicating an overloaded cache server or congestion along the path).

If no HIT message appears among the responses, or if a time-out occurs, the cache forwards the request to its parent or to the origin server. ICP has the following options for added functionality:

The requester can ask the sender to send the object in the reply if it is small enough.

The requester can also ask the cache to return information about the RTT from the source to the origin server, so that it can choose the parent with the least RTT for retrieval of the object (a cache may have multiple parents, based, say, on URL partitioning).

Thus, ICP is a fast way to:

- find out which neighboring cache has the object;
- gauge the relative speed of neighboring caches (allowing some form of load balancing);
- determine network conditions.

Top-level choking can be prevented by defining complex relationships among caches. Lower-level caches should forward requests only for cacheable data, and a cache can also be configured so that, for nearby origin servers, neighboring caches are not queried. Squid and Harvest currently use ICP for inter-cache communication. But ICP also suffers from the following drawbacks:

- The ICP message overhead has to be borne by all requests (particularly important in the case of a miss).
- There is replication of objects, which wastes disk resources.
- There is no security arrangement, so all communication is prone to attack.
- The message structure of ICP is not as rich as that of HTTP/1.1. A cache has no way of querying for an object with a max-stale or min-fresh parameter, and it cannot say which languages or file formats are acceptable to it. Different documents may be returned depending on the user agent at the end-user; ICP does not support such queries.

Mainly to reduce these drawbacks, HTCP was introduced. HTCP permits full request and response headers to be used in cache management, and takes care of security by having an authentication header.

5.2.2 Cache Array Routing Protocol (CARP) [7]

CARP divides the URL-space among an array of loosely coupled caches and lets each cache store only the documents that are hashed to it.
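This division of the URL space works by scoring each (URL, proxy) pair and picking the proxy with the highest score, an idea now known as highest-random-weight or rendezvous hashing. A minimal sketch follows; the use of MD5 here is illustrative, not the hash function CARP itself specifies:

```python
import hashlib

def carp_choose(url, proxies):
    """Pick a proxy by highest-random-weight hashing: the score depends
    on both the URL and the proxy, so removing one proxy only remaps the
    URLs that hashed to it (~1/N of the URL space)."""
    def score(proxy):
        # Illustrative hash; CARP defines its own scoring function.
        return int(hashlib.md5((proxy + url).encode()).hexdigest(), 16)
    return max(proxies, key=score)
```

Because each remaining proxy's score for a URL is unchanged when another proxy leaves the array, only the departed proxy's URLs move, which is exactly the low disruption coefficient described below.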

CARP describes a distributed caching protocol based on:

- a known membership of loosely coupled proxies;
- a hash function for dividing the URL space among those proxies.

It supports proxies with different HTTP processing and caching capabilities. The hash value is a function of both the URL and the proxy. Such a hash value is calculated for each proxy, and the request is sent to the proxy with the highest score. If that proxy is down, the request is sent to the proxy with the next highest score, and so on. CARP thus avoids the problem of a very high disruption coefficient (the percentage of documents in the wrong cache after a cache has been removed from or added to the existing proxies). The disruption coefficient of this scheme is approximately 1/N, where N is the number of caches in the array. This is a vast improvement over the simple hashing scheme (in which the hash value depends only on the URL), which disturbs the placement of nearly the entire URL space whenever a proxy is added or removed.

When a cache gets a request for a particular document, it first looks in its local cache. If a copy is not present, it fetches the document from the origin server, stores a copy, and sends the document back to the client. The queries are done over HTTP, so CARP can take advantage of the rich set of HTTP/1.1 headers. Also, there is less replication of web objects, and there is no explosion of ICP messages. But this scheme is suitable only when all the proxies trust each other completely, i.e. are under a common administrative control.

5.2.3 Other schemes

In these schemes, metadata about the contents of neighboring caches is maintained so that the overhead of ICP messages is not incurred. In Summary Cache [11] and Cache Digest [14], caches exchange messages indicating their content and keep local directories to facilitate finding documents in other caches.
To reduce the metadata size, both schemes use MD5 signatures of the URLs and Bloom filters, which are essentially a lossy compression of the set of cache keys. Both schemes suffer from the following drawbacks:

- There is additional memory required for storing the metadata, and additional messages for exchanging it.
- Not all data is represented correctly, because a cache may have a stale copy of the metadata about its neighbor, and the compression is lossy.

Due to the above, the following errors have to be tolerated. Suppose A and B share caches, and A receives a request for URL r that misses in A:

- False misses: r is cached at B, but A did not know. Effect: lower cache hit ratio.
- False hits: r is not cached at B, but A thought it was, based on its information about B. Effect: wasted query messages.
- Stale hits: r is cached at B, but B's copy is stale. Effect: wasted query messages.

The two schemes differ in the following aspects:

- Cache Digest has a pull mechanism for disseminating metadata, while Summary Cache follows a push mechanism.
- They handle deletions differently: Summary Cache maintains a special table of reference counts per bucket to facilitate deletions from the Bloom filter, whereas Cache Digest does not.

Tewari et al. [15] have proposed a similar scheme in which directory servers replace upper-level caches and contain location hints about the documents kept at every cache. A metadata hierarchy is used to make the distribution of these location hints more efficient and scalable. The scheme also implements push-based data replication for further performance enhancement over a traditional 3-level cache hierarchy. The performance enhancements recorded were a speedup of 1.3-2.3 using only the hint hierarchy, and an additional speedup of 1.12-1.25 from push-based data replication.

The data location service is constructed using a scalable hint hierarchy in which each node tracks the nearest location of each object (URLs are hashed to minimize the size of updates). A hint is an (objectId, nodeId) pair, where nodeId identifies the closest cache that has a copy of objectId. The hierarchy prunes updates so that they are propagated only to the affected nodes. The scheme implements an adaptation of Plaxton's algorithm. The actual steps are as follows: pseudo-random ids are assigned to all nodes (proxies) and objects, based on their IP addresses and URLs respectively.
The system then logically constructs a tree for each object, in which the root node matches the object in the most bits of any node and lower layers match in successively fewer bits. This approach balances load, because each node acts as a leaf (low-level) node for many objects (but sees relatively few requests for them) and acts as a root (high-level) node for few objects (which attract more requests). Clients and low-level nodes can choose nearby parents.

Higher-level nodes, however, may end up somewhat far from the root. Hints are propagated through the hierarchy as follows: whenever a cache loads a new object (or discards a previously cached one), it informs its parent; the parent propagates the change to its own parent, its children, or both, using limited flooding (nodes only propagate changes relating to the nearest copies of the data).

The push scheme is limited to pushing objects whose copies are already present in the system. Like all push-based schemes, it trades bandwidth for access latency. More aggressive techniques that predict data not cached anywhere in the hierarchy were not deployed (i.e. no external directives about future accesses, such as hoard lists or server hints, are used). The scheme consists of two algorithms:

push-on-update: when an object is modified, the proxies caching the old version are good candidates to reference the new version. Thus, in the implementation, nodes track which objects they have supplied to other caches, and when a node loads a new version of an object, it forwards the new version to all such nodes.

push-shared: this is based on the intuition that if two subtrees of a node in the metadata hierarchy access an item, it is likely that many subtrees in the hierarchy will access it. Thus, when one proxy fetches data from another, it also pushes the data to a subset of proxies that share a common metadata ancestor.

An obvious optimization, when a hit is recorded high up in the hierarchy, is to send the object directly back to the client, and then push it to the client's proxy in the background.

CRISP [20] follows a central directory approach, with a central mapping service that ties together a number of caches.
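The lossy summaries that Summary Cache and Cache Digest exchange can be sketched as a Bloom filter over the cached URLs. The parameters below (8192 bits, four hash positions derived from one MD5 digest) are illustrative choices, not the values used by either system:

```python
import hashlib

class BloomFilter:
    """Lossy set of cache keys. Membership tests can yield false
    positives (the 'false hits' above) but never false negatives for
    keys that were actually inserted."""

    def __init__(self, bits=8192, hashes=4):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, url):
        # Derive `hashes` bit positions from slices of one MD5 digest.
        digest = hashlib.md5(url.encode()).digest()
        for i in range(self.hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits

    def add(self, url):
        for pos in self._positions(url):
            self.array[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url):
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))
```

A cache periodically sends this compact bit array to its neighbors instead of a full URL list; a neighbor then queries the filter before sending any message at all.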

5.3 Hybrid caching architecture

As an architecture that offers the best of both worlds, a hybrid caching architecture can be deployed in which each ISP runs multiple sibling caches; using ICP, these caches cooperate with one another or fetch the document from a higher level in the hierarchy.
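The decision logic of such an ICP-style lookup can be sketched as follows. RFC 2186 defines the real wire format (ICP_OP_QUERY, ICP_OP_HIT, ICP_OP_MISS); this sketch only models the resolution order, and `parent_fetch` stands in for forwarding the request up the hierarchy or to the origin server.

```python
def resolve(url, local, siblings, parent_fetch):
    """Return (document, outcome) for an ICP-style cache lookup."""
    if url in local:
        return local[url], "local-hit"
    for sib in siblings:                   # query each sibling cache
        if url in sib:                     # sibling would reply with a HIT
            local[url] = sib[url]          # fetch from sibling, cache locally
            return local[url], "sibling-hit"
    doc = parent_fetch(url)                # all siblings MISS: go higher up
    local[url] = doc
    return doc, "miss"
```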

6 Cache coherency
Current cache coherency mechanisms provide two types of consistency.

6.1 Strong Cache Consistency

There are two approaches to it.
1. Client validation (polling-every-time, PET): The proxy treats cached resources as potentially out-of-date on each hit and sends an If-Modified-Since header every time. If objects do not change for long periods, which is the common case for cached objects, this leads to many redundant "not modified" responses from the server. PET therefore puts excessive load on the network and is not a feasible scheme.
2. Server invalidation: Upon detecting a resource change, the server sends invalidation messages to all clients that have recently accessed, and potentially cached, the resource. This requires maintaining server-side state: the server has to remember every client that has obtained the document from it. This scheme, and ways to make it scalable, are discussed in [9]. Its overheads are:
- storage needed to keep track of clients;
- CPU overhead to search and update the client lists;
- time to send invalidation messages.
Lease-based invalidation is used to make the scheme scalable by bounding the client list that the server must maintain. The details are as follows: the server attaches leases to its replies, promising to send invalidation messages if the file changes before the lease expires; clients, in turn, promise to send a revalidation request if they want to use the document after the lease has expired. Further scalability follows from noting that a server need remember only those clients that frequently access its documents. Following this principle, the authors implemented a two-tier scheme that assigns non-zero leases only to proxies and to clients that are accessing a document for the second time (detected by tracking conditional GETs). In the future, servers could assign variable leases, with the lease period governed by:
- the expected frequency of hits from the client;
- the expected lifetime of the documents the server is sending;
- the current length of the client list the server is maintaining (if the list has become too long, the server could issue documents with a shorter lease period).
The authors argue that weak consistency methods save network bandwidth mostly at the expense of returning stale documents to users. Their implementation compares three methods, PET, invalidation, and adaptive TTL (with a 50% TTL value), and concludes that invalidation costs no more than adaptive TTL, while PET has a high overhead compared with invalidation. They therefore suggest that strong cache consistency should be provided in the future, and that such a scheme should be based on invalidation.
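The server-side bookkeeping for lease-based invalidation can be sketched as follows. This is only an illustration of the protocol's state machine, not the implementation from [9]; the class name, method names, and the fixed 60-second lease are all assumptions.

```python
class LeaseServer:
    """Minimal sketch of lease-based server invalidation."""

    def __init__(self, lease_duration=60):
        self.lease_duration = lease_duration
        self.leases = {}                     # doc -> {client: lease expiry}

    def fetch(self, doc, client, now):
        """Client fetches doc; server records the client and grants a lease."""
        expiry = now + self.lease_duration
        self.leases.setdefault(doc, {})[client] = expiry
        return expiry                        # expiry is returned with the reply

    def modify(self, doc, now):
        """On modification, invalidate every client whose lease is still live.

        Clients with expired leases need no message: they promised to
        revalidate before reusing the document, so the list stays bounded.
        """
        live = [c for c, exp in self.leases.get(doc, {}).items() if exp > now]
        self.leases[doc] = {}                # old client list can be dropped
        return live                          # clients to send invalidations to
```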

6.2 Weak cache consistency

It is guaranteed by adaptive TTL (time-to-live). The adaptive TTL mechanism (also called the Alex protocol) handles the problem by adjusting a document's time-to-live based on observations of its lifetime. It exploits the fact that if a file has not been modified for a long time, it tends to stay unchanged: the time-to-live attribute of a document is assigned to be a percentage of the document's current age, where the age is the current time minus the last-modified time of the document. Studies have shown that adaptive TTL can keep the probability of stale documents within reasonable bounds (<5%), with 50% as the recommended percentage, and most proxy servers use this mechanism. However, this expiration-based coherence has several problems:
- Users must wait for expiration checks to occur even when they are tolerant of the staleness of the requested page.
- If a user is not satisfied with the staleness of a returned document, the only recourse is a Pragma: no-cache request, which reloads the entire document from its home site.
- The mechanism provides no strong guarantee on document staleness, and users cannot specify the degree of staleness they are willing to tolerate.
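The adaptive TTL computation itself is a one-liner. The 50% fraction matches the studies cited above; the floor and cap bounds are illustrative additions that real proxies typically apply to avoid degenerate TTLs.

```python
def adaptive_ttl(last_modified, now, fraction=0.5, floor=60, cap=86400):
    """Alex-protocol heuristic: TTL is a fraction of the document's age.

    All times are in seconds. A long-unmodified document gets a long TTL,
    bounded below by `floor` and above by `cap` (both illustrative).
    """
    age = max(0, now - last_modified)        # seconds since last modification
    return min(cap, max(floor, fraction * age))
```

For example, a document last modified two hours ago would be considered fresh for one hour before the proxy revalidates it.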

7 Prefetching
Here we investigate ways of hiding retrieval latency from the user rather than actually reducing it. Since cache hit rates cannot normally be increased beyond 40-50%, prefetching is an effective complement: it can significantly reduce the average access time, at the cost of an increase in network traffic by a similar fraction. This can be particularly profitable over non-shared (dial-up) links and high-bandwidth (satellite) links. Users usually browse the web by following hyperlinks on a page, and most of these links refer to pages stored on the same server. Typically, there is a pause after each page is loaded, during which the user reads the page. The persistent TCP connection that was opened to fetch the currently displayed object could be utilized during this pause to get some of the pages it references that reside on the same server. In the proposal of [13], the server computes the likelihood that a particular web page will be accessed next and conveys this information to the clients; the client program or cache then decides whether or not to prefetch the page. The prediction is done by a Prediction by Partial Match (PPM) model: a dependency graph is constructed that depicts the pattern of accesses to the different files stored at the server, with a node for every file that has ever been accessed.

Another scheme that utilizes the server's global knowledge of usage patterns is Geographical Push Caching [19]. Here, using its global knowledge of access patterns and a derived network topology, the server sends frequently accessed documents to the caches that are closest to its clients. This scheme reduces the latency due to long transmission times. The study focuses on deriving reasonably accurate network-topology information and using it to select caches.
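A first-order simplification of the dependency-graph predictor can be sketched as follows: count how often page B is requested immediately after page A, and recommend prefetching B when the conditional probability exceeds a threshold. (Full PPM also uses higher-order contexts; the class name and the 0.5 threshold here are illustrative, not values from [13].)

```python
from collections import defaultdict

class DependencyGraph:
    """First-order sketch of access-pattern-based prefetch prediction."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.follows = defaultdict(lambda: defaultdict(int))  # A -> {B: count}
        self.total = defaultdict(int)                         # accesses after A

    def record(self, prev, cur):
        """Record that `cur` was requested immediately after `prev`."""
        self.follows[prev][cur] += 1
        self.total[prev] += 1

    def predict(self, page):
        """Pages whose conditional access probability makes them worth prefetching."""
        return [b for b, n in self.follows[page].items()
                if n / self.total[page] >= self.threshold]
```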

8 Caching dynamic content

As we mentioned in Section 3.1, one of the main challenges in this field is caching dynamic content. Such documents form an increasingly large percentage of total web traffic. Studies analyzing web traffic have shown that:
- Almost one third of requests carry cookies. Cookies are widely used to store users' preferences and past choices on their machines, and also appear in shopping-cart software; advertising companies (who manage advertisements on behalf of many companies) use them to ensure that a particular user does not see the same advertisement more than once.
- 23% of requests carry a ? (query). These URLs refer to a program whose parameters are contained in the part of the URL after the ?.
- Many web servers return pages based on information supplied in the HTTP request (such as the client's user agent or the languages in which it is willing to accept the document).
The current approach taken by web servers that do not want their documents cached is to use HTTP headers indicating that the document is not cacheable (cache busting); this can easily be done with the HTTP/1.1 headers described in Section 2.3. There are two approaches to caching such content, depending on whether the CGI program takes long to run or the output transmission time is high:
- If the CGI program takes long to run, it is a good idea to cache its response [18]. This is based on the heuristic that many CGI programs are invoked with the same arguments. In this paradigm, only weak cache consistency can be guaranteed, and not much can be done about CGI outputs that differ per user, since caching all possible responses would increase cache sizes significantly.
- If the network transmission time is high, schemes can be employed that shift the processing of the programs that generate the web pages to the cache server.
Any such scheme would have to enforce these constraints:
- The cache server has to invoke the program on every cache hit, and should return a cached document only if the program's output indicates it should do so. The cache server should also have the option of directing the request to the origin server, or to a different cache, if it lacks the resources to run the program.
- The supplied executable should run in a platform-independent environment, because the cache server's machine may have an architecture different from the origin server's.
- It should be able to contact the origin server, since most such programs rely heavily on source files available at the origin server for their output.
One such scheme is Active Cache [10]. Active Cache supports caching of dynamic documents at web proxies by allowing servers to supply cache applets attached to documents, and requiring proxies to invoke these applets on cache hits. This allows resource-management flexibility at the proxies: a cache can process the request itself if it has sufficient resources, or send the request to the origin server if running the applet would consume too many resources. The applets are written in Java for practical reasons: the JVM provides a platform-independent environment, and Java's security features can also be used. The scheme can thus yield significant network-bandwidth savings at the expense of CPU cost. An applet can create a special log object that encapsulates information to be returned to the user; using this, the applet can perform authentication, log user accesses, create client-specific pages, or rotate ad banners. An obvious disadvantage of this scheme is that it can increase user latency if the execution environment at the cache server is slow and the applet has to access files on the origin server. Another approach in this area is the web-server accelerator [1], which we do not discuss in detail here; it resides in front of one or more web servers to speed up user accesses, and provides an API that allows application programs to explicitly add, delete, and update cached data.
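The first approach above, caching CGI responses keyed on the program and its full argument list with weak (TTL-based) consistency, can be sketched as follows. `run_program` and the 300-second default TTL are illustrative stand-ins, not part of the scheme in [18].

```python
import time

def make_cgi_cache(run_program, ttl=300):
    """Memoize CGI output, keyed on (program, sorted arguments)."""
    cache = {}   # key -> (output, time cached)

    def get(program, args, now=None):
        now = time.time() if now is None else now
        key = (program, tuple(sorted(args.items())))
        hit = cache.get(key)
        if hit is not None and now - hit[1] < ttl:
            return hit[0]                  # fresh enough: serve cached output
        out = run_program(program, args)   # otherwise run the CGI program
        cache[key] = (out, now)
        return out

    return get
```

Because entries are served until the TTL expires, only weak consistency is guaranteed, exactly as the text above notes.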

9 Cache Replacement
A cache, no matter how big, will fill up after a finite amount of time; a choice then has to be made about which document to evict. This replacement algorithm is a key factor in a cache's performance. A number of cache replacement algorithms have been proposed that attempt to optimize various cost metrics, such as hit rate, byte hit rate, average latency, and total cost. They can be classified into the following three categories.
Traditional replacement policies and their direct extensions:
- Least Recently Used (LRU) evicts the object that was requested least recently.
- Least Frequently Used (LFU) evicts the object that was requested least frequently.
Key-based replacement policies (policies in this category evict objects based on a primary key, with ties broken by a secondary key, a tertiary key, and so on):
- Size evicts the largest object.
- LRU-MIN is biased in favor of smaller objects. If the incoming object has size S and there are objects in the cache of size at least S, LRU-MIN evicts the least recently used such object. If there is no object of size at least S, LRU-MIN evicts objects of size at least S/2 in LRU order, and so on. That is, the object with the largest log(size) that is least recently used among all objects with the same log(size) will be evicted first.
- LRU-Threshold is the same as LRU, except that objects larger than a certain threshold are never cached.
- Lowest Latency First minimizes average latency by evicting the document with the lowest download latency first.
Cost-based replacement policies (policies in this category employ a potential cost function derived from factors such as the time since last access, the entry time of the object into the cache, the transfer-time cost, and the object's expiration time):
- Greedy Dual-Size (GD-Size) [8] associates a cost with each object and evicts the object with the least cost/size value. This is a generalization of LRU to the case where each object has a different fetch cost; the motivation is that objects with large fetch costs should stay in the cache longer. The algorithm maintains a value for each object currently stored in the cache. When an object is fetched into the cache, its value is set to its fetch cost. On a cache miss, the object with the minimum value is evicted, and the values of all other objects in the cache are reduced by this minimum value. If an object in the cache is accessed, its value is restored to its fetch cost. A further implementation optimization is to note that only relative values matter in this algorithm: instead of subtracting a fixed quantity from the value of every cached entry, that quantity can be added to the value of the new object, with the same effect.
- Hierarchical Greedy-Dual (Hierarchical GD) [17] does object placement and replacement cooperatively in a hierarchy. Cooperative placement helps utilize a nearby idle cache, and hit rates in the cache hierarchy are increased by the placement of unique objects. It is the same as GD-Size, with the added advantage of cooperative caching. In this scheme, when an object is evicted from one of the child clusters, it is sent to its parent cluster. The parent first checks whether it has another copy of the object among its caches. If not, it picks the minimum-valued object among all its cached objects; of the two objects, it retains the one that was used more recently and recursively propagates the other to its parent.
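The GD-Size bookkeeping, including the add-instead-of-subtract optimization described above (a running offset L replaces the per-eviction subtraction), can be sketched as follows. Class and method names are illustrative; the heap uses lazy deletion of stale entries.

```python
import heapq

class GDSize:
    """Sketch of Greedy Dual-Size with the value-inflation optimization."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.L = 0.0         # inflation offset (value of last evicted object)
        self.entries = {}    # key -> (value, size, cost)
        self.heap = []       # min-heap of (value, key); may hold stale entries

    def _evict(self):
        while True:
            v, key = heapq.heappop(self.heap)
            if key in self.entries and self.entries[key][0] == v:  # not stale
                self.L = v                    # raise offset instead of
                self.used -= self.entries[key][1]  # decrementing every entry
                del self.entries[key]
                return

    def access(self, key, size, cost):
        """Return True on a hit; on a miss, evict as needed and insert."""
        if key in self.entries:
            _, s, c = self.entries[key]
            v = self.L + c / s                # restore (inflated) value
            self.entries[key] = (v, s, c)
            heapq.heappush(self.heap, (v, key))
            return True
        while self.used + size > self.capacity and self.entries:
            self._evict()
        v = self.L + cost / size              # initial value = offset + cost/size
        self.entries[key] = (v, size, cost)
        heapq.heappush(self.heap, (v, key))
        self.used += size
        return False
```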

10 Issues not discussed in detail


10.1 Replication
Replication is a technique in which more than one server can fulfill an HTTP request: a carbon copy of the data on the origin server is placed on each replica so that the load on any one server is reduced. When a document changes on the main server, the change is communicated to all the replica servers so that consistency is maintained across them. [16] presents a new approach to web replication in which each replica resides in a different part of the network. This is better than replication architectures involving a cluster of servers at a single site, since such architectures do not address performance and availability problems in the network and serve only to share load between replicas. The suggested architecture includes three alternatives for automatically directing a user's browser to the best replica.
- The HTTP redirect method: This method is implemented at the application level using web server-side programming and the HTTP redirect facility. It is a simple way to redirect a connecting client to the best overall server, but it cannot take dynamic network load conditions into account. The model also has a single point of failure, since a central server must redirect all HTTP requests to the other servers. Another significant disadvantage is that a user may bookmark one of the child servers rather than the central server, which defeats any optimization applied at the central server.
- The DNS (Domain Name System) round-trip method: This method is implemented at the DNS level, using standard DNS properties to determine which server is closest to the client. The authors suggest a change to the DNS: different name servers return different authoritative IP addresses in reply to the same query, so the web server to which a client is directed depends on which name server it queried. It is assumed that clients will choose the name server closest to them (by comparing RTTs), and that each name server will carry the IP address of the replica closest to it.
- The shared IP address method: This method is implemented at the network routing level, using standard Internet routing. Multiple servers share one IP address, and the routers must be intelligent enough to redirect an IP packet destined for that address to the closest server.
These methods present a new approach to implementing web replication in which each replica resides in a different part of the network. Actually implementing the second and third methods may prove difficult, since they require major changes to DNS and IP routing. However, if these methods are employed, significant benefits will result both for the client (faster response time, higher availability) and for the network (lower overall traffic).
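The HTTP redirect method reduces to a small decision step at the central server: pick the replica with the best metric for this client and answer with a 302 redirect. The RTT table and hostnames below are illustrative assumptions; a real deployment would measure or estimate client-to-replica proximity.

```python
def redirect(client, replicas, rtt):
    """Pick the replica with the lowest RTT to the client; reply with a 302.

    `rtt` maps (client, replica) pairs to a measured latency in ms.
    """
    best = min(replicas, key=lambda r: rtt[(client, r)])
    return 302, {"Location": f"http://{best}/"}
```

Note that this static table is exactly why the text above says the method cannot take dynamic load conditions into account.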

11 Conclusion
In this report, we gave an overview of fundamental issues and problems in web caching, together with a survey of some recent web caching schemes. In the near future, the WWW is expected to maintain its current exponential growth rate, and the underlying networks over which it operates are becoming more and more diverse. No single caching policy is best suited to all such environments, and caching policies must be dynamic so that they can automatically adapt to the changing configuration of the network; this remains a challenging problem in the field. More work also has to be done so that dynamic content can be cached more effectively. Another related area that needs attention is the caching of multimedia files: in the future, these files will form a substantial portion of network traffic, and since this traffic is currently not sent over HTTP, ways have to be found to cache such files effectively. Finally, the age-old questions of better cache replacement policies and better caching architectures are still pertinent today, and efforts have to be made both to improve these techniques in general and to fine-tune them for different network environments.

References

[1] J. Wang, "A survey of web caching schemes for the Internet."
[2] "Internet Cache Protocol (ICP), version 2," RFC 2186.
[3] "Application of Internet Cache Protocol (ICP)," RFC 2187.
[4] "Hypertext Transfer Protocol -- HTTP/1.0," RFC 1945.
[5] "Hypertext Transfer Protocol -- HTTP/1.1," RFC 2616.
[6] "Hyper Text Caching Protocol (HTCP/0.0)," RFC 2756.
[7] V. Valloppillil and K. W. Ross, "Cache Array Routing Protocol v1.0," Internet Draft <draft-vinod-carp-v1-03.txt>.
[8] P. Cao and S. Irani, "Cost-aware WWW proxy caching algorithms."
[9] P. Cao and C. Liu, "Maintaining strong cache consistency in the World Wide Web."
[10] P. Cao, J. Zhang, and K. Beach, "Active Cache: caching dynamic contents on the Web."
[11] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, "Summary Cache: a scalable wide-area Web cache sharing protocol."
[12] "Internet Web replication and caching taxonomy," Internet Draft.
[13] V. N. Padmanabhan and J. C. Mogul, "Using predictive prefetching to improve World Wide Web latency."
[14] A. Rousskov and D. Wessels, "Cache Digests."
[15] R. Tewari, M. Dahlin, H. Vin, and J. Kay, "Beyond hierarchies: design considerations for distributed caching on the Internet."
[16] Y. Amir, A. Peterson, and D. Shaw, "Seamlessly selecting the best copy from Internet-wide replicated web servers."
[17] M. R. Korupolu and M. Dahlin, "Coordinated placement and replacement for large-scale distributed caches."
[18] V. Holmedahl, B. Smith, and T. Yang, "Cooperative caching of dynamic content on a distributed web server."
[19] J. Gwertzman and M. Seltzer, "An analysis of geographical push-caching."
[20] S. Gadde, M. Rabinovich, and J. Chase, "Reduce, reuse, recycle: an approach to building large Internet caches."

Appendix

Access latency: Time taken to access an object.
Byte hit probability: Hits weighted by the size of an object.
Cache applet: Mobile code attached to a Web page. It runs in a platform-independent environment so that it can be executed both at the server and at the client.
Cache busting: Techniques used by origin servers (e.g., Pragma: no-cache or Cache-Control: max-age=0) in the responses they send, so that caches do not store the responses.
Cache coherency (consistency): Maintaining data consistency, i.e., determining whether a cache entry is an equivalent copy of an entity.
Cache server (or cache): An intermediary server that stores responses from origin servers. It acts both as a server (to the web browsers that point to it) and as a client (when making requests to origin servers or to caches higher in the hierarchy).
Cacheable: A response is cacheable if a cache is allowed to store a copy of the response message for use in answering subsequent requests.
Dynamic content: Content produced as the output of a CGI script or a search result.
Entity: The information transferred as the payload of an HTTP request or response. It consists of metainformation (information about the entity itself) in the form of entity-header fields, and content in the form of an entity-body.
Explicit expiration time: The time at which the origin server intends that a cache should no longer return an entity without further validation.
Hit probability: Fraction of requests that can be satisfied by a cache.
Heuristic expiration time: An expiration time assigned by a cache when no explicit expiration time is available.
Object: A file addressable by a single URL.
Origin server: The server on which a given object resides or is to be created.
User agent: The client (e.g., a web browser, spider, or other end-user tool) that initiates a request.
Proxy server: A special kind of HTTP server, typically run by institutions on their firewall machines. It processes requests from within the firewall, makes requests to origin servers on behalf of the clients, intercepts the replies, and sends the replies back to the clients.
Server: An application program that accepts connections in order to service requests by sending back responses.
Validator: A protocol element (e.g., an entity tag or a Last-Modified time) used for enforcing cache coherency.