By
Amit Kumar Manjhi
WEB CACHING
1 Introduction
2 HTTP
2.1 Expiration Model
2.2 Validation Model
2.3 Origin server dictates what can be cached
2.4 Modification of basic expiration mechanisms
2.5 Cache Revalidation and Reload Controls
2.6 No-Transform Directive
4 Caching by whom?
5 Caching architecture
5.1 Hierarchical caching architecture
5.2 Cooperative caching architecture
5.2.1 Internet Cache Protocol (ICP) [2],[3]
5.2.2 Cache Array Routing Protocol (CARP) [7]
5.2.3 Other schemes
5.3 Hybrid caching architecture
6 Cache coherency
6.1 Strong Cache Consistency
6.2 Weak cache consistency
7 Prefetching
9 Cache Replacement
11 Conclusion
References
Appendix
Web Caching
1 Introduction
The World Wide Web (WWW or web) can be considered a large distributed information
system that provides access to shared objects. It is currently one of the most popular applications
running on the Internet, and its usage is expected to grow further. In just over 5 years,
the number of static web pages has grown from a measly 125 million to a staggering 1 billion.
The main attraction of the World Wide Web, and the reason for its exponential growth, is that it
allows people to access vast amounts of information from geographically distributed sites. In
addition, the information can be accessed faster than is possible by other means.
Also, the WWW carries documents of a diverse nature, so everyone can find information
to his/her liking. But this scorching rate of growth has put a heavy load on the Internet's
communication channels. This situation is likely to continue in the foreseeable future, as more
and more information services move onto the web. The result of all this is increased access latency
for the users. Access latency, the time interval between a user issuing a request and
its actual completion, can have many causes. Servers can get flooded with more
requests than they can optimally handle. The network path between the user and the server can
become congested due to increased traffic on some of its constituent links.
Caching popular objects close to the users provides an opportunity to combat this latency by
allowing users to fetch data from a nearby cache rather than from a distant server. Caching
basically tries to explore the high level of redundancy in data transfer over the Internet
(Apparently, the top 10-20% of popular sites contribute to the majority of Internet traffic). The
remainder of the paper is organized as follows. Section 2 provides an overview of the relevant
portion of the HTTP protocol. HTTP is WWW 's application layer protocol. Section 3 discusses
the general aspects of caching. Section 4 provides an overview of who needs to do caching and
their reasons for doing so. Section 5 looks at various caching architectures in detail. Section 6 is
an overview of the cache coherency mechanisms. Section 7 discusses prefetching - fetching users'
documents before they actually require them. Section 8 discusses a technique for caching dynamic
content. Section 9 looks at some cache replacement policies that have proved to be effective.
Section 10 discusses features of caching that are not covered in detail in this paper. Finally,
Section 11 summarizes the paper by identifying the research frontiers in this field. An appendix
at the end gives informal definitions of the technical jargon used in the paper.
2 HTTP
HTTP (Hyper Text Transfer Protocol) [4] is an application level protocol over which all web
traffic flows. It defines how clients (browsers, spiders etc.) request the web servers for web pages
and how the web servers transfer web pages to the clients. All HTTP traffic takes place over TCP
- a reliable, transport layer protocol. Each HTTP message is either a request or a response.
HTTP/1.1 [5], the latest version of HTTP, uses persistent TCP connections with pipelining. This
makes it better than its previous versions in the following ways:
• Earlier versions opened a new TCP connection per request, so TCP slow start further
compounded the delay. The delay introduced mattered all the more given the small average
size of web objects (roughly 4KB).
HTTP/1.1 is currently the protocol of choice for web transactions. HTTP/1.1 includes a number
of elements intended to make caching work as well as possible. The goal is to eliminate the need
to send requests in many cases by using an expiration mechanism, and to minimize the need to
send full responses in many other cases by using validations. The basic cache mechanisms in
HTTP/1.1 (server-specified expiration times and validators) are implicit directives to caches. In
addition, the server or client use Cache-Control header when they want to provide explicit
directives to the HTTP caches.
sends an appropriate message with a special status code and no entity-body. Otherwise, it returns
a full response. The validators that are used are:
• Last-Modified Dates: A cache entry is considered to be valid if the entity has not been
modified since the Last-Modified value.
• Entity Tag Cache Validators: Entity headers carry meta-information about the requested
resource. An entity tag is basically a string sent in the ETag response-header field. It allows more
reliable validation in situations where it is inconvenient to store modification dates, where
the one-second resolution of HTTP date values is not sufficient, or where the origin server
wishes to avoid certain paradoxes that might arise from the use of modification dates.
For finer control over the caching aspects, the protocol defines two types of validators:
• Strong validators (that change whenever the entity itself changes)
• Weak validators (that change only when the semantics of the document change)
• If-match: A client that has one or more entities previously obtained from the resource can
verify that one of those entities is current by including a list of their associated entity tags in
the If-Match header field. The purpose of this feature is to allow efficient updates of cached
information with a minimum amount of transaction overhead.
• If-modified-since: If the requested variant has not been modified since the time specified in
this field, an entity will not be returned from the server; instead, a “not modified” response
will be returned without any message-body.
• If-none-match: It is the exact opposite of If-match condition. In addition to the uses If-match
can be put to, this header can be used to prevent a method (e.g. PUT) from inadvertently
modifying an existing resource when the client believes that the resource does not exist.
• If-range: If a client has a partial copy of an entity in its cache, and wishes to have an up-to-
date copy of the entire entity in its cache, it can use this header with the range header to fetch
the missing part of the document.
These directives are used in conjunction with the validators to get the desired cache consistency.
• Private: Indicates that all or part of the response message is intended for a single user and
must not be cached by a shared cache.
• max-age: If set by the client in a request, it indicates that the client is willing to accept a
response whose age is no greater than the specified time in seconds.
• min-fresh: Indicates that the client wants a response that will still be fresh for at least the
specified number of seconds.
• max-stale: Indicates that the client is willing to accept a response that has exceeded its
expiration time.
Current versions of the popular browsers do not allow users to configure the above
parameters. In the near future this should change, as the user must be allowed to trade off
access latency against freshness of the document (i.e. a user could choose to see a somewhat stale
document fetched from a nearby cache because it can be accessed faster).
• max-age: This directive when having a value zero causes the requested object to be fetched
from the origin server.
• only-if-cached: The client uses this when it wants a cache to return only those responses it
currently has stored, and not to reload or revalidate with the origin server. A client may opt
for this header in times of poor network connectivity, or when it wants to access the
document only if it can be retrieved quickly.
• must-revalidate: This is a fail-safe mechanism for the origin server, when it requires its
document to be revalidated whenever it is used after its explicit expiration. A cache may be
configured to ignore a server's specified expiration time and a client request may include a
max-stale directive (which has a similar effect) and hence the previous two mechanisms are
inadequate for the purpose.
• proxy-revalidate: The proxy-revalidate directive has the same meaning as the must-
revalidate directive, except that it does not apply to non-shared user agent caches. This
feature can be used to store a response to an authenticated request (i.e. the user does not need
to authenticate himself twice to get the same response).
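The freshness directives above (max-age, min-fresh, max-stale) combine into a single serve-or-revalidate decision at the cache. The sketch below illustrates that combination; the function name, argument defaults, and the exact precedence are assumptions rather than HTTP/1.1's normative algorithm:

```python
# Minimal freshness check combining the server-granted lifetime with the
# client's Cache-Control request directives. All arguments are in seconds.
def can_serve_from_cache(age, freshness_lifetime,
                         max_age=None, min_fresh=0, max_stale=0):
    """age                -- how long the response has been cached
    freshness_lifetime -- server-granted lifetime (from max-age/Expires)
    max_age            -- client cap on acceptable age
    min_fresh          -- response must stay fresh at least this much longer
    max_stale          -- staleness the client tolerates past expiry
    """
    if max_age is not None and age > max_age:
        return False                      # client refuses anything older
    remaining = freshness_lifetime - age  # >0 fresh, <0 stale
    if remaining >= min_fresh:
        return True                       # fresh enough for this client
    # Not fresh enough: only acceptable as tolerated staleness.
    return min_fresh == 0 and -remaining <= max_stale
```

Note how max-stale widens what the cache may return while min-fresh narrows it, which is why a max-stale request can defeat a server's expiration time unless must-revalidate is set.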
• any URLs with /cgi-bin/ or any other pre-configured patterns (for example, if a URL contains
‘?’, it is indicative that the URL is calling a program with the portion after the ‘?’ as its
argument.)
• Compulsory misses: Objects, on being accessed for the first time, result in a cache miss.
Such misses can be avoided only by:
• Prefetching: Documents can be brought into the cache before the user actually makes a
request for them. Then, when the user actually requests a document, the request can be
fulfilled from the cache, resulting in fewer compulsory misses. We look at prefetching in
detail in Section 7.
• Shared caches: If many users share a cache, there is a chance that a document
requested by a user for the first time has already been accessed by some other user
and is therefore available in the cache. Thus, compulsory misses are reduced.
Thus, an ideal caching system aiming to reduce compulsory misses should be shared and should
have provisions for prefetching.
• Capacity misses: These refer to cache misses for objects that were present in the cache at
some point in time but had to be replaced due to capacity constraints. Since the
amount of memory available to a cache (or group of caches) is finite, the cache replacement
algorithm should be good (i.e. it should evict those objects that have the least probability
of being accessed in the future). We look at some aspects of this in Section 9.
• Communication misses: Objects that have changed on the origin server since they were
last fetched and stored in the cache have to be re-fetched from the origin server. Cache
misses falling in this category are called communication misses. Servicing these misses
depends on which consistency (cache coherence) algorithm the cache follows. We look at
this in detail in Section 6.
• Unreachable/error: These misses occur when the communication channel between the
client and cache is disrupted. Normally, nothing can be done about such misses except that
the network has to be made more robust.
Even after these measures have been deployed optimally, a cache system will still suffer a
significant number of cache misses; in today's environment, a hit rate of 40-50% is considered
good. Thus, another key design principle, in addition to measures that boost hit rates, should be:
"cache systems should not slow down misses". This means that caching
has to be done close to the end-user and there should not be too many caches in the hierarchy
(because having too many caches in a hierarchy would increase the total delay significantly).
In addition to the above features, a cache system should have:
• Robustness: Robustness to a user implies availability of the service. Its main aspects are:
• A few cache crashes should not bring down the system. There should not be a single
point of failure.
• Transparency: It is desirable that the web caching system be transparent. This reduces
problems for novices, who then need not know that their browser has to point to a certain
proxy server. Transparency is normally achieved using L4 switches, which trap all traffic
destined to port 80.
• Scalability: Any ideal caching system, if it has to be widely deployed in the Internet of today
and future, should be scalable. The caching scheme should scale well along the increasing
size and density of the network.
• Efficiency: The caching system should impose minimal additional burden on the network.
This includes both control packets and extra data packets incurred by using a caching system.
• Adaptivity: The caching system should adapt to the dynamics of changing user demand and
network environment.
• Load Balancing: It is desirable that the caching system distributes the load evenly through
the entire network.
• Stability: The schemes used in Web caching system shouldn’t introduce instabilities into the
network.
• Simplicity: Simple schemes are easier to deploy and manage. Thus, we would like an ideal
caching system to be simple.
• Web caching reduces bandwidth consumption, thereby decreasing network traffic and
lessening network congestion.
♦ Frequently accessed documents are present in one of the nearby caches and thus can be
retrieved faster (transmission delay is minimized).
♦ Due to the previous reason, network traffic is reduced and the load on origin servers gets
reduced. Thus, documents not cached can also be retrieved relatively faster.
• If the remote server is unavailable due to a crash or a network partition, the
client can obtain a cached copy from the proxy. Thus, the robustness of the web service is
enhanced.
• It allows information to be distributed more widely at a low cost (as cheaper servers can be
installed).
• The access latency may increase in the case of a cache miss due to extra proxy processing.
• Origin servers want to record the exact number of times their pages are viewed, since their
advertisement revenues are proportional to it. They may therefore decide not to allow their
documents to be cached (known as cache busting).
• An upper bound has to be placed on the number of users a proxy can serve. This bound can
be calculated because the average time required to access a document in presence of a proxy
should not be more than the time required otherwise.
4 Caching by whom?
Caching is done at many levels mainly to facilitate faster access of documents and to cut down
on Internet costs (which is typically proportional to the bandwidth used). At the lowest levels are
the user's browsers. Users frequently tend to browse back and forth among a set of documents
using the "BACK" and "FORWARD" buttons. Thus, it makes sense to do caching at this level,
and almost all popular products - significant examples being Netscape Navigator and Internet
Explorer - do it. At the next level, we have proxy servers. Proxy servers are special HTTP servers
run by institutions on their firewall machines for security reasons. A proxy server typically
processes requests from within a firewall, makes requests to origin servers on the clients'
behalf, intercepts the responses and sends the replies back to the clients. Clients of an institution generally
have common interests and are therefore likely to have similar access patterns. All clients within
the firewall typically share a single proxy server and so, it is an effective place to do caching. At
a higher level, we have regional ISPs, who would do caching because they typically have to pay
in terms of bandwidth usage. For them, investing in cache servers is a one-time investment that
results in significant savings every year due to less bandwidth utilization. Thus it makes
economic sense for them to do caching. Further higher in the hierarchy, we have national-level
ISPs who can cache for similar reasons. Also, it helps them to reduce access latency that could
be significant in fetching documents across trans-oceanic links. Thus, we have a roughly
hierarchical structure and hierarchical caching becomes important.
5 Caching architecture
This section discusses how cache proxies should be organized: hierarchically, cooperatively
(distributed), or in a hybrid fashion. The larger the user community served by a cache, the higher
its hit rate (Section 3.2, Shared Caches). A chain of caches trusting each other may assist one
another to increase the hit rate. A caching architecture should provide the paradigm for proxies
to cooperate efficiently with each other.
5.1 Hierarchical caching architecture
As we noted in section 4, the topology of the Internet is loosely hierarchical and it makes sense
to do caching at all levels of the hierarchy. Thus, we could have a tree structure in which every
node points to its parent (i.e. if a cache miss occurs at a cache, it forwards the request to its
parent. This goes on till we reach the root. The root then contacts the origin server if it is unable
to satisfy the request.). When the document is found, either at a cache or at the original server, it
travels in the opposite direction, leaving a copy at each of the intermediate caches along its path.
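The parent-forwarding lookup just described can be sketched in a few lines. This is an in-memory toy (class and field names are my own); a real hierarchy would run over HTTP between separate machines:

```python
# Sketch of hierarchical cache lookup: on a miss each cache asks its
# parent, the root asks the origin server, and the response leaves a
# copy at every cache on the way back down.
class HierCache:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.store = name, parent, {}

    def get(self, url, origin):
        if url in self.store:                      # hit at this level
            return self.store[url], self.name
        if self.parent is not None:                # miss: forward to parent
            body, hit_at = self.parent.get(url, origin)
        else:                                      # root: contact the origin
            body, hit_at = origin[url], "origin"
        self.store[url] = body                     # copy left along the path
        return body, hit_at
```

A second client under a different low-level proxy then finds the document at the shared ancestor rather than at the origin, which is how popular documents diffuse toward the demand.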
In a hierarchical structure, nodes higher in the hierarchy have:
• larger user populations
This structure is consistent with the present Internet with ISPs at each level. It serves to diffuse
the popular documents towards the demand. However, this architecture suffers from the
following problems:
• A cache hit at a cache server high up in the hierarchy may not prove to be beneficial for the
end-user because the cache server may be located far away from the end-user.
• There is redundancy in the storage of documents as documents are replicated at each level.
• High level caches may become bottlenecks and have long queuing delays.
• Obtains the object from first neighbor to respond with a HIT message, caches a copy and
returns a copy to the requesting client.
When one cache queries another, there are three possible outcomes:
• ICP HIT message returned, indicating that the object is present in the cache.
If a HIT message is absent from all the responses, or if a time-out occurs, the cache forwards the
request to its parent or to the origin server.
ICP has the following options for added functionality:
• The requester can ask the sender to send the object in the reply if it is small enough.
• The requester can also ask the cache to return information about the RTT of source to the
origin server so that it can choose the parent (it is possible that a cache has multiple parents,
based on say, URL partitioning) with the least RTT for retrieval of the object.
Thus, ICP is a fast way to:
• Top-level choking can be prevented by defining complex relationships among caches.
• Lower level caches should forward requests for only cacheable data.
• It can also be configured so that for nearby origin servers, neighboring caches are not
queried.
Squid and Harvest currently use ICP for inter-cache communication.
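The ICP resolution flow (query siblings, take the first HIT, otherwise fall back to the parent or origin) can be illustrated with an in-memory stand-in. Real ICP exchanges UDP datagrams with opcodes like ICP_OP_QUERY/HIT/MISS; the function below only sketches the decision logic, and its names are assumptions:

```python
# Simplified ICP-style resolution. `siblings` stands in for the replies
# to an ICP query fan-out; `fetch_from_parent` is the fallback used when
# every sibling reports a MISS (or the replies time out).
def resolve(url, siblings, fetch_from_parent):
    """siblings: mapping sibling-name -> set of URLs cached there."""
    for name, contents in siblings.items():    # fan out the query
        if url in contents:                    # first HIT wins
            return ("sibling", name)
    return ("parent", fetch_from_parent(url))  # all MISS / timed out
```

The cost structure is visible even in this sketch: the query round is paid on every request, and it is pure overhead whenever the outcome is a miss.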
But, ICP also suffers from the following drawbacks:
• ICP message overhead has to be borne by all requests (this is particularly costly in the case
of a miss).
• There is no security arrangement, so all communication is vulnerable to attack.
• A cache has no way of querying for an object with a max-stale or min-fresh
parameter.
• A cache cannot indicate which languages are acceptable to it or which file formats it wants.
• Depending on the user agent at the end-user, different documents may be returned; ICP
does not support such a query.
Mainly to reduce some of these drawbacks, HTCP was introduced. HTCP permits full request
and response headers to be used in cache management. It also takes care of security by having an
authentication header.
CARP describes a distributed caching protocol based on:
It supports proxies with different HTTP processing and caching capabilities. The hash value
is a function of both the URL and the proxy. Such a hash value is calculated for each proxy and
then the request is sent to the proxy having the highest score. If that proxy cache is down, the
request is then sent to the proxy with the next highest score and so on. It thus avoids the problem
of a very high disruption coefficient (the percentage of documents in the wrong cache after a
cache has been removed or added to the existing proxies). The disruption coefficient of this
scheme is approximately 1/N where N is the number of caches in the array. This is a vast
improvement over the simple hashing scheme (in which the hash value is dependent only on the
URL) which has a disruption coefficient of ½. When a cache gets a request for a particular
document, it first looks in its local cache to retrieve the document. If a copy is not present, it
fetches the document from the origin server, stores a copy and sends the document back to the
client.
The queries are done over HTTP so that they can take advantage of the rich set of HTTP/1.1 headers.
Also, there is less replication of web objects and there is no explosion of ICP messages. But, this
scheme is suitable only when all the proxies trust each other completely – i.e. are under a
common administrative control.
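The highest-score routing rule is easy to sketch. The hash below is SHA-1 over the concatenated URL and proxy name, an assumption for illustration; the CARP draft specifies its own hash and combination function:

```python
# CARP-style request routing: score every proxy with a hash of
# (URL, proxy) and route to the highest score. Removing a proxy only
# reassigns the URLs it "owned" (about 1/N of them), because the
# scores of the surviving proxies are unchanged.
import hashlib

def carp_route(url, proxies):
    def score(proxy):
        digest = hashlib.sha1((url + proxy).encode()).hexdigest()
        return int(digest, 16)
    return max(proxies, key=score)
```

The 1/N disruption property falls out directly: a URL whose winning proxy survives a membership change keeps the same winner, since every remaining score is unchanged.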
• Additional memory requirement for storing meta-data and additional messages for
exchanging this meta-data
• A cache may have a stale copy of the meta-data about its neighbor.
Suppose A and B share caches, and A has a request for URL r that misses in A. Two kinds of
errors can occur:
• False misses: r is cached at B, but A did not know.
• False hits: r is not cached at B, but A thought it was, based on its (stale) information about B.
• Cache Digest uses a pull mechanism for disseminating meta-data, while Summary Cache
uses a push mechanism.
• They handle deletions differently. Summary Cache maintains a special table of reference
counts per bucket to facilitate deletions from the Bloom filter, whereas Cache Digest does
not.
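The counting-Bloom-filter idea behind Summary Cache's deletion support can be sketched directly: keep small counters instead of bits, so removals are possible, and advertise the "counter > 0" bit vector to neighbours. Sizes and the SHA-1-based slot derivation below are illustrative assumptions:

```python
# Sketch of a counting Bloom filter as used by Summary-Cache-style
# meta-data. Counters (not bits) make deletion possible; membership
# tests can give false positives but never false negatives.
import hashlib

class CountingBloom:
    def __init__(self, m=1024, k=4):
        self.m, self.k, self.counts = m, k, [0] * m

    def _slots(self, url):
        # Derive k slot indices from one SHA-1 digest of the URL.
        digest = hashlib.sha1(url.encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def add(self, url):
        for s in self._slots(url):
            self.counts[s] += 1

    def remove(self, url):              # possible only because of counters
        for s in self._slots(url):
            self.counts[s] -= 1

    def might_contain(self, url):
        return all(self.counts[s] > 0 for s in self._slots(url))
```

A false hit in the shared-cache protocol corresponds exactly to a false positive here, which is why the error is bounded but not zero.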
Tewari et al [15] have proposed a similar scheme in which directory servers replace upper level
caches and contain location hints about the documents kept at every cache. A metadata
hierarchy is used to make the distribution of these location hints more efficient and scalable.
The scheme also suggests implementing push-based data replication for further performance
enhancement over a traditional 3-level cache hierarchy. The performance enhancement recorded
was:
The data location service is constructed using a scalable hint hierarchy in which each node tracks
the nearest location of each object (hashing of URLs is done to minimize the size of updates). A
hint is an (objectId, nodeId) pair where nodeId identifies the closest cache that has a copy of
objectId. The hierarchy prunes updates so that updates are propagated only to the affected nodes.
The scheme implements an adaptation of Plaxton's algorithm. The actual steps are:
• Pseudo-random IDs are assigned to all nodes (proxies) and objects, based on their IP address
and URL respectively.
• The system then logically constructs a tree for each object in which the root node matches the
object in the most bits of any node and lower layers match in successively fewer bits. This
approach balances load because each node acts as a leaf/low-level node for many objects (but
has relatively few requests for such objects) and acts as a root/high-level node for few objects
(more requests for these).
• Higher level nodes somewhat far from the root.
• Whenever a cache loads a new object (or discards a previously cached one), it informs its
parent. The parent sends it to its parents or children or both using limited flooding (nodes
only propagate changes relating to the nearest copies of the data).
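The ID-assignment and root-selection steps above can be sketched as follows. The 16-bit IDs and the SHA-1 hashing are illustrative assumptions; the point is only that the root of an object's tree is the node whose pseudo-random ID shares the longest bit prefix with the object's ID:

```python
# Sketch of Plaxton-style root selection: hash addresses/URLs into
# fixed-width pseudo-random ids, then pick as root the node whose id
# matches the object's id in the most leading bits.
import hashlib

def pid(key, bits=16):
    """Pseudo-random id of a node address or object URL, as a bit string."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return format(h % (1 << bits), f"0{bits}b")

def common_prefix(a, b):
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def root_for(obj_url, node_addrs, bits=16):
    oid = pid(obj_url, bits)
    return max(node_addrs, key=lambda n: common_prefix(pid(n, bits), oid))
```

Because the IDs are pseudo-random, each node wins the longest-prefix contest for roughly its fair share of objects, which is the load-balancing property the text describes.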
The push scheme was limited to pushing objects whose copies were already present in the
system. Like all push-based schemes, this scheme also trades bandwidth for access latency. More
aggressive techniques that predict data not cached anywhere in the hierarchy were not deployed
(i.e. no external directives about future access such as hoard lists or server hits are used). The
scheme consisted of two algorithms:
• push-on-update: When an object is modified, the proxies caching the old version of the
object are a good list of candidates to reference the new version. Thus, in the
implementation, nodes track which objects they have supplied to other caches; when a node
loads a new version of an object, it forwards the new version to all such nodes.
• push-shared: This is based on the intuition: “if 2 subtrees of a node in the metadata hierarchy
access an item, it is likely that many subtrees in the hierarchy will access the item”. Thus,
when one proxy fetches data from another, it also pushes data to a subset of proxies that
share a common metadata ancestor.
An obvious optimization when a hit is recorded high up in the hierarchy is to send the object
directly back to the client, and then in the background push it to the client's proxy.
CRISP [20] follows a central directory approach and has a central mapping service that ties
together a number of caches.
6 Cache coherency
Current cache coherency mechanisms provide two types of consistency.
6.1 Strong Cache Consistency
Strong consistency can be provided in two ways:
1. Client validation (polling-every-time, PET): The client treats every cached resource as
potentially stale and sends a conditional request (e.g. If-Modified-Since) to the origin server
before each use.
2. Server Invalidation: Upon detecting a resource change, the server sends invalidation
messages to all clients that have recently accessed and potentially cached the resource. This
requires maintaining server-side state: the server has to remember every client that has
obtained the document from it. This scheme, and ways to make it scalable, are discussed in
[9]. The overheads of this scheme are
Lease-based invalidation is used to make the scheme scalable. This gives a bound on the
client list that has to be maintained by the server. The actual details are:
• Server attaches leases to replies, promises to send invalidation messages if file changes
before lease expires
• Clients promise to send revalidation request if they want to use the document after the
lease has expired
Further scalability can be achieved by taking note of the fact that: “A server should
remember only those clients which frequently access its documents.” Following these
principles, the authors have implemented a 2-tier scheme that assigns non-zero leases only to
proxies and to clients that are accessing a document for at least the second time (detected by
tracking conditional GETs). In the future, servers could assign variable leases, the lease period
of which is governed by:
• The current length of the client list the server is maintaining. If the list has become too
long, the server could issue documents with a shorter lease period.
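The lease mechanics described above (server invalidates only clients whose leases are live; everyone else has promised to revalidate) can be sketched with plain numeric timestamps. Class and method names are my own:

```python
# Sketch of lease-based server invalidation. The server records a
# client only for the duration of its lease, bounding the client list
# it must maintain.
class LeaseServer:
    def __init__(self, lease_len=60):
        self.lease_len = lease_len
        self.leases = {}            # url -> {client: lease_expiry}

    def serve(self, url, client, now):
        expiry = now + self.lease_len
        self.leases.setdefault(url, {})[client] = expiry
        return expiry               # client revalidates after this time

    def modify(self, url, now):
        """On a change, invalidate every client whose lease is still
        live; expired clients revalidate on their own."""
        holders = self.leases.pop(url, {})
        return [c for c, exp in holders.items() if exp > now]
```

The bound is visible in `modify`: no matter how many clients ever fetched the document, only those served within the last `lease_len` seconds can require an invalidation message.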
The authors argue that weak consistency methods save network bandwidth mostly at the expense
of returning stale documents to the users. Their implementation compares three methods - PET
(polling-every-time), Invalidation, and adaptive TTL (with a 50% TTL value) - and concludes that
invalidation costs no more than adaptive TTL, while PET has a high overhead compared to
Invalidation. Thus, they suggest
that in future, strong cache consistency should be provided and such a scheme should be based
on Invalidation.
6.2 Weak cache consistency
Weak consistency is provided by adaptive TTL (time to live). The adaptive TTL mechanism
(also called the Alex protocol) handles the problem by adjusting a document's time-to-live based
on observations of its lifetime. Adaptive TTL is based on the observation that if a file has not
been modified for a long time, it tends to stay unchanged. Thus, the time-to-live attribute of a
document is assigned to be a percentage of the document's current "age", which is the current
time minus the last-modified time of the document. Studies have shown that adaptive TTL can
keep the probability of stale documents within reasonable bounds (<5%) when this percentage is
set to 50%. Most proxy servers use this mechanism. However, there are several problems with
this expiration-based coherence:
• Users must wait for expiration checks to occur even when they are tolerant of the
staleness of the requested page.
• If a user is not satisfied with the staleness of the returned document, they have no
choice but to use a Pragma: no-cache request to load the entire document from its
home site.
• Users cannot specify the degree of staleness they are willing to tolerate.
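The adaptive-TTL rule described above reduces to one line of arithmetic; the sketch below uses the 50% fraction the text cites, with plain numeric timestamps for clarity:

```python
# The adaptive-TTL (Alex protocol) rule: a document's time-to-live is a
# fixed fraction of its current age, i.e. of (now - last_modified).
def adaptive_ttl(now, last_modified, fraction=0.5):
    age = now - last_modified
    return max(0.0, fraction * age)

# A document unchanged for 10,000 seconds gets a 5,000-second TTL; one
# modified 60 seconds ago is rechecked after 30 seconds.
```

This is why long-stable documents are revalidated rarely while freshly changed ones are rechecked often, at the price that a stable document which suddenly changes can be served stale for up to half its age.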
7 Prefetching
Here we investigate ways of hiding retrieval latency from the user rather than actually reducing
it. Since cache hit rates cannot normally be pushed beyond 40-50%, prefetching is an effective
complement: it can significantly reduce the average access time, at the cost of increasing network
traffic by a similar fraction. It can be particularly profitable over non-shared (dial-up) links and
high-bandwidth (satellite) links.
Users usually browse the web by following hyperlinks present on a page. Most of these links
refer to pages stored in the same server. Typically there is a pause after each page is loaded
during which the user reads the web page. The persistent TCP connection that was opened
to fetch the currently displayed object could be utilized during this pause to get some of the
pages referred to by this page that reside on the same server.
In the proposal, the server computes the likelihood that a particular web page will be accessed
next and conveys this information to the clients. The client program/cache then decides whether
or not to prefetch the page.
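The client-side half of this proposal is a simple thresholding decision over the probabilities the server supplies. The sketch below is an assumption about how such a decision could look; the threshold and budget values are not from the paper:

```python
# Sketch of server-assisted prefetching: the server piggybacks access
# probabilities for the links on a page; the client prefetches only the
# most likely links, and only if they clear a probability threshold.
def links_to_prefetch(link_probs, threshold=0.25, budget=3):
    """link_probs: {url: probability that url is the next request}."""
    ranked = sorted(link_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [url for url, p in ranked[:budget] if p >= threshold]
```

The threshold is the knob trading latency for bandwidth: lowering it hides more retrieval latency but wastes more traffic on pages the user never visits.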
8 Caching dynamic content
• One-third of requests carry cookies. Cookies are widely used to store users' preferences and
past choices on their machines, and they also find use in shopping-cart software. Advertising
companies (who manage advertisements on behalf of many companies) use them as well, to
ensure that a particular user does not see the same advertisement more than once.
• 23% of requests carry “?” (queries). These URLs refer to a program whose parameters are
contained in the part of URL after the “?”.
• Many web-servers return webpages based on the information supplied in the HTTP request
(like the client’s user agent, language in which he is willing to accept the document,
The current approach of web servers that do not want their documents to be cached is to use
HTTP headers to indicate that their documents are not cacheable (cache busting). This can
easily be done using the HTTP/1.1 headers described in Section 2.3.
There are two approaches to this problem, depending on whether the CGI program takes long
to run or the output transmission time is high:
• It is a good idea to cache the response of a CGI program [18] if the program takes too
long to run. This is based on the heuristic that many CGI programs are invoked with the
same arguments. In this paradigm, only weak cache consistency can be guaranteed. Not
much can be done about CGI outputs that differ per user; caching all possible responses
would increase the size of caches significantly.
• If the network transmission time is high, schemes that shift the processing of the programs
that generate the web pages to the cache server could be employed. Any such scheme would
have to enforce these constraints:
• The cache server has to invoke the program on every cache hit. It should return a cached
document only if the program’s output indicates it to do so. The cache server would also
have the option of directing the request to the origin server or, to a different cache, if it
feels that it does not have the necessary resources to run the program.
• The program should be platform-independent, because the cache server's machine may be
of an architecture different from the origin server's architecture.
• It should be able to contact the origin server since most such programs, for their output
rely heavily on source files available at the origin server.
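The first approach, caching CGI output keyed by its arguments with only weak consistency, can be sketched as follows. The class, the TTL value, and the `run_program` callable are illustrative assumptions standing in for real (possibly slow) CGI execution:

```python
# Sketch of caching CGI responses keyed by (program, arguments), with a
# short TTL since only weak consistency can be guaranteed.
class CgiCache:
    def __init__(self, ttl=300):
        self.ttl, self.store = ttl, {}   # (prog, args) -> (output, timestamp)

    def get(self, prog, args, now, run_program):
        key = (prog, args)
        if key in self.store:
            output, stamp = self.store[key]
            if now - stamp < self.ttl:   # weakly consistent hit
                return output, True
        output = run_program(prog, args) # miss or expired: run the program
        self.store[key] = (output, now)
        return output, False
```

The heuristic that many CGI programs are invoked with the same arguments is exactly what makes the `(prog, args)` key profitable; per-user outputs defeat it, as the text notes.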
One such scheme is Active Cache [10]. Active Cache supports caching of dynamic documents at
web proxies by allowing servers to supply cache applets attached to documents, and
requiring proxies to invoke these applets on cache hits. This allows resource-management
flexibility at the proxies (i.e. the cache can either process the request itself if it has sufficient
resources, or send the request to the origin server if running the applet would consume too
many resources). The applets are written in Java for practical reasons - the JVM
provides a platform-independent environment and Java security features can also be used. This
scheme thus can result in significant network bandwidth saving at the expense of CPU cost. The
applet can create a special “log” object, which encapsulates information to be returned to the
user. Using this, the applet can do authentication, log user accesses, create client specific pages
or rotate ad banners.
An obvious disadvantage of this scheme is that it can increase the user latency if the execution
environment at the cache server is slow and the applet has to access files on the origin server.
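A minimal sketch of the active-cache control flow in Python (the real system runs Java applets at the proxy; the `CacheApplet` class, the request dictionary, and the log format here are all illustrative assumptions):

```python
class CacheApplet:
    """Stand-in for a server-supplied cache applet. Returning None
    signals the proxy to forward the request to the origin server."""

    def run(self, request, cached_doc, log):
        log.append(("access", request["user"]))   # e.g. log user accesses
        if request.get("user") == "anonymous":
            return None                           # decline: go to origin
        # e.g. build a client-specific page from the cached template
        return cached_doc.replace("$USER", request["user"])

def serve(request, cache, origin_fetch):
    """On a hit the proxy must invoke the applet; on a miss, or when
    the applet declines, it contacts the origin server instead."""
    log = []
    entry = cache.get(request["url"])
    if entry is None:
        return origin_fetch(request["url"]), log
    doc, applet = entry
    result = applet.run(request, doc, log)
    return (result if result is not None else origin_fetch(request["url"])), log
```

The applet’s ability to return `None` is what gives the proxy its resource-management flexibility: the decision to serve locally or forward stays with the cache on every hit.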
Another approach in this area is to use an accelerator [1], which we do not discuss in detail
here. An accelerator resides in front of one or more web servers to speed up user accesses, and
provides an API that allows application programs to explicitly add, delete, and update cached data.
9 Cache Replacement
A cache, no matter how big, will fill up after a finite amount of time; a choice then has to
be made about which document to evict. The replacement algorithm is a key factor in determining
a cache’s performance. A number of cache replacement algorithms have been proposed, which
attempt to optimize various metrics, such as hit rate, byte hit rate, average latency, and total
cost. They can be classified into the following three categories.
• Traditional replacement policies and their direct extensions:
♦ Least recently used (LRU) evicts the object that was requested least recently.
♦ Least frequently used (LFU) evicts the object that was requested least frequently.
• Key-based replacement policies: the replacement policies in this category evict objects based
upon a primary key, with ties broken by a secondary key, a tertiary key, and so on.
♦ LRU-MIN is biased in favor of smaller objects. Let S be the size of the incoming object.
If there are objects in the cache of size at least S, LRU-MIN evicts the least recently
used such object. If there is no object of size at least S, LRU-MIN evicts objects of size
at least S/2 in LRU order, then objects of size at least S/4, and so on. That is, the
object with the largest log(size) that is the least recently used among all objects with
the same log(size) will be evicted first.
♦ LRU-Threshold is the same as LRU, except that objects larger than a certain threshold are
never cached.
♦ Lowest Latency First minimizes average latency by evicting the document with the
lowest download latency first.
• Cost-based replacement policies: the replacement policies in this category employ a potential
cost function derived from various factors such as time since last access, entry time of the
object into the cache, transfer time cost, object expiration time, and so on.
♦ Greedy Dual Size (GD-Size) [8] associates a cost with each object and evicts the object
with the least cost/size. This is a generalization of the LRU algorithm to the case where
each object has a different fetch cost; the motivation behind GD-Size is that objects with
large fetch costs should stay in the cache longer.
The algorithm maintains a value for each object that is currently stored in the cache.
When an object is fetched into the cache, its value is set to its fetch cost. When a cache
miss occurs, the object with the minimum value is evicted from the cache, and the values
of all other objects in the cache are reduced by this minimum value. If an object in the
cache is accessed, its value is restored to its fetch cost. A further implementation
optimization is to note that it is only the relative value that matters in this algorithm. So,
instead of deleting a fixed quantity from the value of each cached entry, the fixed
quantity could be added to the value of the new object, and the effect would remain the
same.
♦ Hierarchical Greedy Dual (Hierarchical GD) [17] does object placement and
replacement cooperatively in a cache hierarchy. Cooperative placement helps utilize nearby
idle caches, and the hit rates in the hierarchy are increased by the placement of unique
objects. It is the same as GD-Size, with the added advantage of cooperative caching. In
this scheme, when an object is evicted from one of the child clusters, it is sent to its
parent cluster. The parent first checks whether it has another copy of the object among its
caches. If not, it picks the minimum valued object among all its cached objects. Out of
the two objects, it retains the one that was used more recently. It propagates the other
object recursively to its parent.
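The GD-Size bookkeeping, including the relative-value (“inflation”) optimization described above, can be sketched as follows. This version initializes each object’s value to cost/size and uses a lazy-deletion heap; both are implementation choices assumed here, not dictated by the text.

```python
import heapq

class GDSizeCache:
    """Sketch of Greedy-Dual-Size with the inflation optimization:
    instead of subtracting the evicted minimum from every entry, keep
    a running baseline L and add it to each inserted/refreshed value."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.L = 0.0          # inflation baseline (last evicted value)
        self.values = {}      # key -> current value H
        self.sizes = {}       # key -> object size
        self.heap = []        # (H, key), stale entries skipped lazily

    def access(self, key, size, cost):
        if key not in self.values:
            # make room for the new object, evicting minimum-value entries
            while self.used + size > self.capacity and self.values:
                self._evict()
            self.sizes[key] = size
            self.used += size
        h = self.L + cost / size   # on a hit, value is restored (inflated)
        self.values[key] = h
        heapq.heappush(self.heap, (h, key))

    def _evict(self):
        while self.heap:
            h, key = heapq.heappop(self.heap)
            if self.values.get(key) == h:   # ignore stale heap entries
                self.L = h                  # raise baseline instead of
                del self.values[key]        # decrementing every object
                self.used -= self.sizes.pop(key)
                return
```

Because only relative values matter, raising `L` on each eviction has exactly the effect of reducing all remaining values, at O(1) cost per eviction.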
10.1 Replication
Replication is a technique in which more than one server can fulfill an HTTP request. A carbon
copy of the data present on the origin server is placed on each of the replicas, so that the load
on any single server is reduced. When a document changes on the main server, the change is
communicated to all the replica servers so that consistency is maintained across the servers.
[16] presents a new approach to web replication where each of the replicas reside in different
parts of the network. This is better than replication architectures involving a cluster of servers
that reside at the same site (such architectures do not address the performance and availability
problems in the network and serve only to share the load between different replicas). The
suggested architecture includes three alternatives to automatically direct the user’s browser to the
best replica.
• The HTTP redirect method: This method is implemented at the application level, using web
server-side programming and the HTTP redirect facility. It is a simple way to redirect a
connecting client to the best overall server, but it cannot take into account dynamic load
conditions of the network. The model has a single point of failure, since a central server
must redirect all HTTP requests to the other servers. Another significant disadvantage is
that a user may bookmark one of the child servers rather than the central server, which
defeats any optimization applied at the central server.
• The DNS (Domain Name Server) round-trip method: This method is implemented at the DNS level,
using standard properties of the DNS to determine which server is closest to the client. The
authors suggest a change in the DNS: different name servers return different authoritative IP
addresses in reply to the same query, so the web server to which a client is redirected
depends on which name server it queried. It is assumed that clients choose the name server
closest to them (by comparing round-trip times) and that each name server returns the IP
address of the web server closest to it.
• The shared IP address method: This method is implemented at the network routing level, using
standard Internet routing. Multiple servers share one IP address, and the routers must be
intelligent enough to redirect an IP packet destined for that address to the closest server.
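The HTTP redirect method can be sketched as a central server that answers every request with a 302 pointing at the chosen replica. The replica list and the hash-based selection policy below are placeholder assumptions; the paper's central server could instead use geography or measured server load.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical replica table; a real deployment would track load or RTT.
REPLICAS = ["http://replica1.example.com", "http://replica2.example.com"]

def pick_best_replica(client_ip):
    """Placeholder policy: hash the client address over the replica list
    so each client is consistently sent to the same replica."""
    return REPLICAS[hash(client_ip) % len(REPLICAS)]

class RedirectHandler(BaseHTTPRequestHandler):
    """Central server answering every GET with an HTTP 302 to the chosen
    replica, as in the HTTP redirect method."""

    def do_GET(self):
        target = pick_best_replica(self.client_address[0]) + self.path
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()
```

Note how the sketch exhibits both drawbacks from the text: the central server is a single point of failure, and a client that bookmarks the `Location` it was redirected to bypasses the selection logic entirely.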
The above methods present a new approach to implement web replication where each of the
replicas resides in a different part of the network. Actually implementing the second and third
methods may prove difficult, since they require major changes to the DNS and to IP routing.
However, if these methods are employed, significant benefits will result, both for the client
(faster response time, higher availability) and for the network (lower overall traffic).
11 Conclusion
In this report, we gave an overview of fundamental issues and problems in web caching, along
with some recent web caching schemes. In the near future, the WWW is expected to maintain its
current exponential growth rate, and the underlying networks over which it operates are becoming
more and more diverse. No single caching policy is best suited to all such environments; caching
policies also have to be dynamic, so that they can automatically adapt to the changing
configuration of the network. This remains a challenging problem in the field. More work also has
to be done so that dynamic content can be cached effectively. Another related area that needs
attention is the caching of multimedia files, which will form a substantial portion of network
traffic in the future; currently this traffic is not sent over HTTP, and ways have to be found to
cache such files effectively. Finally, the age-old questions of better cache replacement policies
and better caching architectures are still pertinent today, and efforts have to be made both to
improve these techniques in general and to fine-tune them for different network environments.
References
[1] Jia Wang, A Survey of Web Caching Schemes for the Internet.
[2] Internet Cache Protocol (ICP), version 2, RFC 2186.
[16] Yair Amir, Alec Peterson and David Shaw, Seamlessly Selecting the Best Copy from
Internet-wide Replicated Web Servers.
[17] M. R. Korupolu and M. Dahlin, Coordinated Placement and Replacement for Large-Scale
Distributed Caches.
[18] V. Holmedahl, B. Smith and T. Yang, Cooperative Caching of Dynamic Content on a
Distributed Web Server.
[19] J. Gwertzman and M. Seltzer, An Analysis of Geographical Push Caching.
[20] S. Gadde, M. Rabinovich and J. Chase, Reduce, Reuse, Recycle: An Approach to Building
Large Internet Caches.
Appendix
Access latency: Time taken to access an object.
Byte hit probability: Hits weighed by the size of an object.
Cache applet: Mobile code that is attached to a Web page. It runs in a platform independent
environment so that it can be executed both at the server and the client.
Cache busting: Techniques used by origin servers in the responses they send (e.g., Pragma:
no-cache or Cache-Control: max-age=0) so that caches do not store the responses.
Cache coherency (consistency): Maintaining data consistency (i.e. finding out whether a cache
entry is an equivalent copy of an entity).
Cache server (or, caches): An intermediary server that stores responses from origin servers. It
acts both as a server (to the web browsers that point to it) and as a client (while making requests
to origin servers or caches higher in the hierarchy).
Cacheable: A response is cacheable if a cache is allowed to store a copy of the response message
for use in answering subsequent requests.
Dynamic Content: Content that is produced as the output of a CGI script or a search result.
Entity: The information transferred as the payload of an HTTP request or response. It
consists of metainformation (information about the entity itself) in the form of entity-header
fields and content in the form of an entity-body.
Explicit expiration time: The time after which, according to the origin server, a cache should
no longer return an entity without further validation.
Proxy server: A special kind of HTTP server run by institutions on their firewall machines. It
typically processes requests from within the firewall, makes requests to origin servers on the
clients’ behalf, intercepts the replies, and sends them back to the clients.
Server: An application program that accepts connections in order to service requests by sending
back responses.
Validator: A protocol element (e.g., an entity tag or a Last-Modified time) for enforcing cache
coherency.