
INTERNET

The Internet traces its roots to a project of the U.S. Department of Defense's Advanced Research Projects Agency (ARPA), as it was then named. The ARPANET project was intended to support Department of Defense (DoD) research on computer networking. The ARPANET computer network was launched in 1969 and by year's end consisted of four computers at four sites running four different operating systems. ARPANET grew steadily, but because it was restricted to DoD-funded organizations and was a research project, it was never large.

It wasn't long before other networks were being built, both internationally and regionally within the United States. The regional U.S. networks were often cooperative efforts between universities. As one example, SURAnet (Southeastern Universities Research Association Network) was organized by the University of Maryland beginning in 1982 and eventually included essentially all of the major universities and research institutions in the southeastern United States. Another of these networks, CSNET (Computer Science Network), was partially funded by the U.S. National Science Foundation (NSF) to aid scientists at universities without ARPANET access, laying the groundwork for the later network developments described below.

While these other networks were springing up, the ARPANET project continued to fund research on networking. Several of the most widely used Internet protocols, including the File Transfer Protocol (FTP) and Simple Mail Transfer Protocol (SMTP), which underlie many of the Internet's file transfer and e-mail operations, respectively, were initially developed under ARPANET. But perhaps most crucial to the emergence of the Internet as we know it was the development of the TCP/IP (Transmission Control Protocol/Internet Protocol) communication protocol. TCP/IP was designed to be used for host-to-host communication both within local area networks (that is, networks of computers that are typically in close proximity to one another, such as within a building) and between networks.
ARPANET switched from using an earlier protocol to TCP/IP during 1982. At around the same time, an ARPA Internet was created, allowing computers on some outside networks such as CSNET to communicate via TCP/IP with computers on the ARPANET. A "connection" from CSNET to the ARPA Internet often meant that a modem connection was made from one computer to another for the purpose of sending along an e-mail message. This form of communication was asynchronous. That is, the e-mail might be delayed some time before it was actually delivered, which precluded (that is, ruled out) interactive communication of any type.

In 1985, the NSF began work on a new network based on TCP/IP, called NSFNET. One of the primary goals of this network was to connect the NSF's new regional supercomputing centers. But it was also decided that regional networks should be able to connect to NSFNET, so that the NSFNET would provide a backbone through which other networks could interconnect synchronously.

The original backbone operated at only 56 kbit/s, the maximum speed of a home dial-up line today. But at the time the primary network traffic was still textual, so this was a reasonable starting point. The backbone rate was later upgraded to 1.5 Mbit/s (T1) in 1988 and then to 45 Mbit/s (T3) in 1991. Furthermore, the backbone was expanded to directly include several research networks in addition to the supercomputer centers, making it that much easier for sites near these research networks to connect to the NSFNET. In 1988, networks in Canada and France were connected to NSFNET; in each of the remaining seven years of NSFNET's existence, networks from 10 or more new countries were added.

NSFNET quickly supplanted (that is, took the place of) ARPANET, which was officially decommissioned in 1990. At this point, NSFNET was at the center of the Internet, that is, the collection of computer networks connected via the public backbone and communicating across networks using TCP/IP. This same year, commercial Internet dial-up access was first offered.

In summary, the Internet is the collection of computers that can communicate with one another using TCP/IP over an open, global communications network.

BASIC INTERNET PROTOCOLS

A computer communication protocol is a detailed specification of how communication between two computers will be carried out in order to serve some purpose. A protocol specifies both the high-level behavior of software implementing the protocol and low-level details such as the specific fields of information that will be contained in a communication message, the order in which these fields will appear, the number of bits in each field, and how these bits should be interpreted.

TCP/IP

TCP and IP are actually two different protocols. The reason that they are often treated as one is that the bulk of the services we associate with the Internet (e-mail, Web browsing, file downloads, accessing remote databases) are built on top of both the TCP and IP protocols. But in reality, only one of these protocols, IP (the Internet Protocol), is fundamental to the definition of the Internet. So we'll begin our study of Internet protocols with IP.

A key element of IP is the IP address, which in IP version 4 is simply a 32-bit number. At any given moment, each device on the Internet has one or more IP addresses associated with it. IP addresses are normally written as a sequence of four decimal numbers separated by periods (called "dots"), as in 192.0.34.166. Each decimal number represents one byte of the IP address.
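The relationship between the dotted notation and the underlying 32-bit number can be sketched in a few lines of Python. This is only an illustration: the helper name `dotted_to_int` is made up for this example, and the standard-library `ipaddress` module is used just to cross-check the result.

```python
import ipaddress

def dotted_to_int(address: str) -> int:
    """Combine the four decimal bytes of a dotted address into one 32-bit integer."""
    value = 0
    for part in address.split("."):
        # Shift the bytes collected so far left by 8 bits, then append the next byte.
        value = (value << 8) | int(part)
    return value

addr = "192.0.34.166"
as_int = dotted_to_int(addr)
print(as_int)             # the address as a single number
print(f"{as_int:032b}")   # the same number as 32 bits

# Cross-check against the standard library's own conversion.
assert as_int == int(ipaddress.IPv4Address(addr))
```

Each of the four decimal numbers occupies exactly 8 of the 32 bits, which is why no component can exceed 255.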

The function of IP software is to transfer data from one computer (the source) to another computer (the destination). When an application on the source computer wants to send information to a destination, the application calls IP software on the source machine and provides it with data to be transferred along with an IP address for each of the source and destination computers. The IP software running on the source creates a packet, which is a sequence of bits representing the data to be transferred along with the source and destination IP addresses and some other header information, such as the length of the data.

If the destination computer is on the same local network as the source, then the IP software will send the packet to the destination directly via this network. If the destination is on another network, the IP software will send the packet to a gateway (router), which is a device that is connected to the source computer's network as well as to at least one other network. The gateway will select a computer on one of the other networks to which it is attached and send the packet on to that computer. This process will continue, with the packet going through perhaps a dozen or more hops, until the packet reaches the destination computer. IP software on that computer will receive the packet and pass its data up to an application that is waiting for the data. The sequence of computers that a packet travels through from source to destination is known as its route. A separate protocol, BGP (Border Gateway Protocol), is used to pass network connectivity information between gateways so that each can choose a good next hop for each packet it receives. IP software also adds some error detection information (a checksum) to each packet it creates, so that if a packet is corrupted during transmission, this can usually be detected by the recipient.

TCP, the Transmission Control Protocol, is a higher-level protocol that extends IP to provide additional functionality, including reliable communication based on the concept of a connection. A connection is established between TCP software running on two machines by one of the machines (let's call it A) sending a connection-request message via IP to the other (B). That is, the IP message contains a message conforming to the TCP protocol and representing a TCP connection request. If the connection is accepted by B, then B returns a message to A requesting a connection in the other direction. If A responds affirmatively, then the connection is established. Notice that this means that A and B can both send messages to one another at the same time; this is known as full duplex communication. When A and B are both done sending messages to one another (or at least done for the time being), a similar set of three messages is used to close the connection.


Once a connection has been established, TCP provides reliable data transmission by demanding an acknowledgment for each packet it sends via IP. Essentially, the sending software sets a timer after sending each packet. The TCP software on the receiving side sends a packet containing an acknowledgment for every TCP-based packet it receives that passes the checksum test. If the TCP software sending a packet does not receive an acknowledgment packet before its timer expires, then it resends the packet and restarts the timer.

Another important feature that TCP adds to IP is the concept of a port. The port concept allows TCP to be used to communicate with many different applications on a machine. For example, a machine connected to the Internet may run a mail server for users on its local network, a file download server, and also a server that allows users to log in to the machine and execute commands from remote locations. As illustrated in the figure (which ignores connections and acknowledgments for simplicity), such a server application will make a call to the TCP software on its system to request that any incoming TCP connection requests that specify a certain port number as part of the TCP/IP message be sent to the application. For example, a mail server conforming with SMTP will typically ask TCP to listen for requests to port 25. If at a later time an IP message is received by the machine running the mail server application and that IP message contains a TCP message with port 25 indicated in its header, then the data contained within the TCP message will be passed to the mail server application. Such an IP message could be generated by a mail client calling on TCP software on another system, as illustrated on the right side of the figure.
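The port mechanism described above can be tried out on a single machine. The following is a minimal sketch, not a real mail server: a toy server asks TCP for a port, and a client connects to that same port on the loopback address. The function names and the upper-casing behavior are invented for the illustration, and port 0 is used so that the operating system picks any free port instead of a reserved one such as 25.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Accept one TCP connection and send back whatever arrives, upper-cased."""
    conn, _peer = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def demo() -> bytes:
    # The "server application" asks TCP to deliver connection requests for a port.
    # Port 0 means "any free port"; the system tells us which one it chose.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()

    # The "client application" names the same port when it connects.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello")
        reply = client.recv(1024)

    worker.join()
    listener.close()
    return reply

if __name__ == "__main__":
    print(demo())
```

The connection setup, acknowledgments, and retransmission discussed earlier all happen inside the operating system's TCP software; the application only sees `connect`, `send`, and `recv`.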

Though the connection between port numbers and applications is managed individually by every machine on the Internet, certain broadly useful applications (such as e-mail over SMTP) have had port numbers assigned to them by the Internet Assigned Numbers Authority (IANA) [IANA-PORTS]. These port numbers, in the range 0 to 1023, can usually be requested only by applications that are run by the system at boot-up or that are run by a user with administrative permissions on the system. Other possible port numbers, from 1024 to 65535, can generally be used by the first application on a system that requests the port.

UDP, DNS, AND DOMAIN NAMES

UDP (User Datagram Protocol) is an alternative protocol to TCP that also builds on IP. The main feature that UDP adds to IP is the port concept that we have just seen in TCP. However, it does not provide the two-way connection or guaranteed delivery of TCP. Its advantage over TCP is speed for simple tasks.

One Internet application that is often run using UDP rather than TCP is the Domain Name System (DNS). While every device on the Internet has an IP address such as 192.0.34.166, humans generally find it easier to refer to machines by names, such as www.example.org. DNS provides a mechanism for mapping back and forth between IP addresses and host names. Basically, there are a number of DNS servers on the Internet, each listening through UDP software to a port (port 53 if the server is following the current IANA assignment). When a computer on the Internet needs DNS services, for example to convert a host name such as www.example.org to a corresponding IP address, it uses the UDP software running on its system to send a UDP message to one of these DNS servers, requesting the IP address. If all goes well, this server will then send back a UDP message containing the IP address.

Internet host names consist of a sequence of labels separated by dots. The final label in a host name is a top-level domain. There are two standard types of top-level domain: generic (such as .com, .edu, .org, and .biz) and country-code (such as .de, .il, and .mx). The top-level domain names are assigned by the Internet Corporation for Assigned Names and Numbers (ICANN), a private nonprofit organization formed to take over technical Internet functions that were originally funded by the U.S. government.

Each top-level domain is divided into subdomains (second-level domains), which may in turn be further divided, and so on. The assignment of second-level domains within each top-level domain is performed (for a fee) by a registry operator selected by ICANN. The owner of a second-level domain can then further divide that domain into subdomains, and so on. Ultimately, the subdomains of a domain are individual computers. Such a subdomain, consisting of a local host name followed by a domain name (typically consisting of at least two labels), is sometimes called a fully qualified domain name for the computer. For example, www.example.org is a fully qualified domain name for a host with local name www that belongs to the example second-level domain of the org top-level domain.

Some user-level tools are available that allow you to query the Internet DNS. For example, on most systems the nslookup command can be typed at a command prompt.

    C:\>nslookup www.example.org
    Server:  slave9.dns.stargate.net
    Address:  209.166.161.121

    Name:    www.example.org
    Address:  192.0.34.166

    C:\>nslookup 192.0.34.166
    Server:  slave9.dns.stargate.net
    Address:  209.166.161.121

    Name:    www.example.com
    Address:  192.0.34.166

The second command is a reverse lookup, from an IP address to a host name. Even if multiple fully qualified names are associated with an IP address, only one of the names will be returned by a reverse lookup. This is known as the canonical name for the host; all other names are considered aliases.
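The label structure of a host name, and the two lookup directions, can also be explored from a program. The sketch below is illustrative only: `split_host_name` and `lookup_both_ways` are names invented here, and the lookup helper (which relies on the standard `socket.gethostbyname` and `socket.gethostbyaddr` calls) needs network access and will return whatever the local resolver reports, not necessarily the addresses in the transcript.

```python
import socket

def split_host_name(fqdn: str) -> dict:
    """Break a fully qualified domain name into its dot-separated labels."""
    labels = fqdn.rstrip(".").split(".")
    return {
        "labels": labels,
        "top_level_domain": labels[-1],
        "second_level_domain": labels[-2] if len(labels) >= 2 else None,
        "local_name": labels[0] if len(labels) >= 3 else None,
    }

def lookup_both_ways(host_name: str) -> tuple:
    """Forward lookup (name to address), then reverse lookup (address to name).

    Requires network access; not called automatically in this sketch.
    """
    address = socket.gethostbyname(host_name)
    canonical_name, aliases, _addresses = socket.gethostbyaddr(address)
    return address, canonical_name, aliases

print(split_host_name("www.example.org"))
```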

Higher Level Protocols

Just as DNS is built on top of UDP, a variety of higher-level protocols are used to communicate once a TCP connection has been established. SMTP and FTP, mentioned earlier, are two examples of widely used higher-level protocols that are used to communicate over TCP connections. SMTP supports transfer of e-mail between different e-mail servers, while FTP is used for transferring files between machines. Another higher-level TCP protocol, Telnet, is used to execute commands typed into one computer on a remote computer. As we will see, Telnet can also be used to communicate directly (via keyboard entries) with some TCP-based applications. As described earlier, which protocol will be used to communicate over a TCP connection is normally determined by the port number used to establish the connection.

The primary TCP-based protocol used for communication between web servers and browsers is called the Hypertext Transfer Protocol (HTTP). In some sense, just as IP is a key component in the definition of the Internet, HTTP is a key component in the definition of the World Wide Web.

World Wide Web

Some of the more popular information management technologies in the early 1990s were Gopher information servers, which provided a simple hierarchical view of documents; the Wide Area Information Servers (WAIS) system for indexing and retrieving information; and the Archie tool for searching online information archives accessible via FTP.

The World Wide Web consists of two types of software: server and client. An Internet-connected computer that wishes to provide information to other Internet systems must run server software, and a system that wishes to access the information provided by servers must run client software (for the Web, the client software is normally a web browser). The server and client applications communicate over the Internet by following a communication protocol built on top of TCP/IP.

A bigger advantage for the Web over these earlier technologies is the type of information communicated. Most web pages are written using the Hypertext Markup Language, HTML, which along with HTTP is a fundamental web technology. HTML pages can contain the familiar web links (technically called hyperlinks) to other documents on the Web. While certain Gopher pages could also contain links, normal Gopher documents were plain text. WAIS and Archie provided no direct support for links. In addition to hyperlinks, modern versions of HTML also provide extensive page layout facilities, including support for inline graphics, which (as you might guess) has added significantly to the commercial appeal of the Web.

Hyper Text Transfer Protocol

HTTP is a form of communication protocol, in particular a detailed specification of how web clients and servers should communicate. The basic structure of HTTP communication follows what is known as a request-response model. When a user types an address such as www.example.org into a browser and presses the Enter key, the browser creates a message conforming to the HTTP protocol, uses DNS to obtain an IP address for www.example.org, creates a TCP connection with the machine at the IP address obtained, sends the HTTP message over this TCP connection, and receives back a message containing the information that is then displayed in the client area of the browser.

An HTTP request message consists of a start line followed by a message header and optionally a message body.

HTTP Request Message Overall Structure

Every HTTP request message has the same basic structure:

    Start line
    Header field(s) (one or more)
    Blank line
    Message body (optional)
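The four-part structure above can be made concrete by assembling a request message by hand. This is a minimal sketch under stated assumptions: `build_request` is a name invented for the example, the header values are placeholders, and lines are joined with the carriage-return/line-feed pair that HTTP uses as its line terminator.

```python
def build_request(method: str, request_uri: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble start line, header fields, blank line, and optional body."""
    lines = [f"{method} {request_uri} HTTP/1.1"]            # start line
    if body:
        # Tell the receiver how many bytes of body follow the blank line.
        headers = {**headers, "content-length": str(len(body))}
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header fields
    head = "\r\n".join(lines) + "\r\n\r\n"                  # the extra CRLF is the blank line
    return head.encode("ascii") + body                      # optional message body

message = build_request("POST", "/", {"host": "www.example.org"}, b"doit=Click+me")
print(message.decode("ascii"))
```

A request without a body (for example a GET) simply ends after the blank line.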

Start Line

Example of a start line:

    GET / HTTP/1.1

Every start line consists of three parts:

1. Request method
2. Request-URI portion of web address
3. HTTP version
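Since the three parts of a start line are separated by single spaces, splitting one apart is straightforward. The following is a small sketch; the `StartLine` type and `parse_start_line` function are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple

class StartLine(NamedTuple):
    method: str
    request_uri: str
    version: str

def parse_start_line(line: str) -> StartLine:
    """Split a request start line into method, Request-URI, and HTTP version."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected three space-separated parts, got {len(parts)}")
    method, request_uri, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP version: {version!r}")
    return StartLine(method, request_uri, version)

print(parse_start_line("GET / HTTP/1.1"))
```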

Request Method

GET. Returns the resource specified by the Request-URI as the body of a response message.

POST. Passes the body of this request message on as data to be processed by the resource specified by the Request-URI.

HEAD. Returns the same HTTP header fields that would be returned if a GET method were used, but does not return the message body that would be returned to a GET (this provides information about a resource without the communication overhead of transmitting the body of the response, which may be quite large).

OPTIONS. Returns a list of HTTP methods that may be used to access the resource specified by the Request-URI.

PUT. Stores the body of this message on the server and assigns the specified Request-URI to the data stored so that future GET request messages containing this Request-URI will receive this data in their response.

DELETE. Responds to future HTTP request messages that contain the specified Request-URI with a response indicating that there is no resource associated with this Request-URI.

TRACE. Returns a copy of the complete HTTP request message, including start line, header fields, and body, received by the server. Used primarily for test purposes.

Requested URI

The concatenation of the string http://, the value of the Host header field (www.example.org, in this example), and the Request-URI (/ in this example) forms a string known as a Uniform Resource Identifier (URI). A URI is an identifier that is intended to be associated with a particular resource (such as a web page or graphics image) on the World Wide Web. Every URI consists of two parts: the scheme, which appears before the colon (:), and another part that depends on the scheme. Web addresses, for the most part, use the http scheme (the scheme name in URIs is case insensitive, but is generally written in lowercase letters). In this scheme, the URI represents the location of a resource on the Web. A URI of this type is said to be a Uniform Resource Locator (URL). Therefore, URIs using the http scheme are both URIs and URLs.

In addition to the URL type of URI, there is one other type, called a Uniform Resource Name (URN). While not as common as URLs, URNs are sometimes used in web development. A URN is designed to be a unique name for a resource rather than specifying a location at which to find the resource. For example, an edition of War and Peace has an ISBN (International Standard Book Number) of 0-1404-4417-3 associated with it, and this is the only book worldwide with this number.

Example:

    urn:ISBN:0-1404-4417-3

The URI for a URN always consists of three colon-separated parts, as illustrated here. The first part is the scheme name, which is always urn for a URN-type URI. The second part is the namespace identifier, which in this example is ISBN. The third part is the namespace-specific string. The exact format and meaning of this string varies with the namespace. In this example it represents the ISBN of a book.
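Both constructions in this section are simple string operations, as the following sketch suggests. The function names are invented for the illustration, and the code handles only the simple cases discussed in the text.

```python
def build_http_uri(host: str, request_uri: str) -> str:
    """Concatenate 'http://', the Host header value, and the Request-URI."""
    return "http://" + host + request_uri

def parse_urn(uri: str) -> tuple:
    """Split a URN into its namespace identifier and namespace-specific string."""
    # maxsplit=2 keeps any further colons inside the namespace-specific string.
    scheme, namespace_id, specific = uri.split(":", 2)
    if scheme.lower() != "urn":          # scheme names are case insensitive
        raise ValueError(f"not a URN: {uri!r}")
    return namespace_id, specific

print(build_http_uri("www.example.org", "/"))
print(parse_urn("urn:ISBN:0-1404-4417-3"))
```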

HTTP Version

The third part of the start line specifies the version of HTTP that is used for sending the request to the server. The example start line shown earlier uses HTTP version 1.1.

Header Fields and MIME Types

Example of header fields:

    host: www.example.org:56789
    user-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624
    accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
    accept-language: en-us,en;q=0.5
    accept-encoding: gzip,deflate
    accept-charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    connection: keep-alive
    keep-alive: 300
    content-type: application/x-www-form-urlencoded
    content-length: 13

    doit=Click+me

Each header field begins with a field name, such as host, followed by a colon and then a field value.

Host. Specifies the host server to which the HTTP request is being sent.

User-Agent. Identifies the type of browser that is sending the HTTP request.

Accept. Specifies the types of data (MIME types) that the browser will accept in the response.

Accept-Language. Specifies the languages that the browser prefers for the response.

Accept-Encoding. Specifies the content encodings (typically compression methods such as gzip and deflate) that the browser will accept in the response.

Accept-Charset. Specifies the character sets that the browser will accept for representing the characters in the response.

Connection. Used when the client wants to keep the connection alive.

Keep-Alive. Specifies the time (in seconds) for which the connection should be kept alive.

Content-Type. Specifies the type of content used for the message body, if there is one. A message body is normally included in an HTTP request only with the PUT or POST method.

Content-Length. Specifies the length (in bytes) of the message body.

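The q parameter in the Accept-Language field (and in the other Accept fields) is a quality value that gives the client's relative preference for the item it follows. Quality values range from 0 to 1, where values closer to 0 indicate lower preference and values closer to 1 indicate higher preference. An item with no q parameter has the default quality value 1.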
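To see how quality values order the alternatives in a field such as accept-language, the field value can be split on commas and each item's q parameter read off. The sketch below is a simplified illustration, not a complete parser: `parse_accept` is a hypothetical helper, and it ignores parameters other than q.

```python
from typing import List, Tuple

def parse_accept(value: str) -> List[Tuple[str, float]]:
    """Return (item, quality) pairs from an Accept-style field, most preferred first."""
    items = []
    for entry in value.split(","):
        parts = [p.strip() for p in entry.split(";")]
        name, quality = parts[0], 1.0          # a missing q means quality 1
        for param in parts[1:]:
            if param.startswith("q="):
                quality = float(param[2:])
        if name:
            items.append((name, quality))
    # sorted() is stable, so items with equal quality keep their original order.
    return sorted(items, key=lambda item: item[1], reverse=True)

print(parse_accept("en-us,en;q=0.5"))
print(parse_accept("ISO-8859-1,utf-8;q=0.7,*;q=0.7"))
```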

HTTP Response

An HTTP response message consists of a status line, header fields, and the body of the response, in the following format:

    Status line
    Header field(s) (one or more)
    Blank line
    Message body (optional)

Response Status Line

Example:

    HTTP/1.1 200 OK

The status line consists of three fields: the HTTP version, a numeric status code indicating the type of response, and a text string (the reason phrase) that describes the meaning of the status code. In this example the status code is 200 and the reason phrase is OK.

All status codes are three-digit decimal numbers. The first digit represents the general class of status code. There are five classes of HTTP/1.1 status codes. The last two digits of a status code define the specific status within the specified class.

Digit  Class          Standard Use
1      Informational  Provides information to client before request processing has been completed.
2      Success        Request has been successfully processed.
3      Redirection    Client needs to use a different resource to fulfill request.
4      Client Error   Client's request is invalid.
5      Server Error   An error occurred during server processing of a valid request.
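Because the class is carried entirely by the first digit, a program can classify any status code with integer division. The following sketch uses invented helper names and the class names from the table above.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def parse_status_line(line: str) -> tuple:
    """Split a status line into HTTP version, numeric status code, and reason phrase."""
    # maxsplit=2 because the reason phrase may itself contain spaces.
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason

def status_class(code: int) -> str:
    """Look up the class of a status code from its first digit."""
    return STATUS_CLASSES[code // 100]

version, code, reason = parse_status_line("HTTP/1.1 200 OK")
print(code, reason, "->", status_class(code))
```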

Common Status Codes

Status Code  Recommended Reason Phrase  Usual Meaning
200          OK                         Request processed normally.
301          Moved Permanently          URI for the requested resource has changed.
307          Temporary Redirect         URI for the requested resource has changed, at least temporarily.
401          Unauthorized               The resource is password protected.
403          Forbidden                  The resource is present on the server but is read protected.
404          Not Found                  No resource corresponding to the given Request-URI was found.
500          Internal Server Error      Server software detected an internal failure.

Header Fields

Date. Time at which the response was generated.

Server. Information identifying the server software generating this response.

Last-Modified. Time at which the resource returned by the request was last modified.

Expires. Time after which the client should check with the server before retrieving the returned resource from the client's cache.

ETag. An identifier (typically a hash code) for the version of the resource returned. If the resource remains unchanged on subsequent requests, then the ETag value will also remain unchanged.

Accept-Ranges. Clients can request that only a portion (range) of a resource be returned by using the Range header field. This might be used if the resource is, say, a large PDF file and only a single page is currently needed. Accept-Ranges specifies the units that may be used by the client in a range request, or none if range requests are not accepted by this server for this resource.

Location. Used in responses with a redirect status code to specify the new URI for the requested resource.

Date, Last-Modified, Expires, and ETag are the header fields that are used in cache control.

Cache Control

In computer systems, a cache is a repository for copies of information that originates elsewhere. A copy of information is placed in a cache in order to improve system performance. Most web browsers automatically cache on the client machine many of the resources that they request from servers via HTTP. For example, if an image such as a button icon is included in a web page, a copy of the image obtained from the server will typically be cached in the client's file system. Then if another page at the same site uses the same image, the image can be retrieved from the client file system rather than sending another HTTP request to the server and waiting for the server's response containing the image. HTTP caching, when successful, generally leads to quicker display by the browser, reduced network communication, and reduced load on the web server.

There is a key drawback to using a cache: information in a cache can become invalid. For example, if the button image in the preceding example is modified on the server, but a client accesses its cached copy of the older version of the image, then the client will display an invalid version of the image. A cached copy is validated using header fields such as Date, Last-Modified, Expires, and ETag.
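Two of the checks a cache might make can be sketched as follows. This is a simplified illustration rather than a description of how any particular browser works: the function names, the sample date, and the sample ETag strings are all made up, and the standard-library `email.utils.parsedate_to_datetime` function is used to read the date format that HTTP header fields carry.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def is_fresh(expires_header: str, now: datetime) -> bool:
    """True while the cached copy may still be used without asking the server."""
    return now < parsedate_to_datetime(expires_header)

def has_changed(cached_etag: str, server_etag: str) -> bool:
    """True if the server's current ETag differs from the one stored with the cached copy."""
    return cached_etag != server_etag

expires = "Thu, 01 Dec 1994 16:00:00 GMT"   # hypothetical Expires value
before = datetime(1994, 12, 1, 15, 0, tzinfo=timezone.utc)
after = datetime(1994, 12, 1, 17, 0, tzinfo=timezone.utc)
print(is_fresh(expires, before))   # still within the expiry time
print(is_fresh(expires, after))    # past the expiry time
```

In this sketch, an expired copy is not necessarily invalid; it only means the client should check with the server, for example by comparing ETag values, before reusing it.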

Character Set

Characters are represented by integer values within a computer. A character set defines the mapping between these integers, or code points, and characters. For example, US-ASCII [RFC-1345] is the character set used to represent the characters used in HTTP header field names, and is also used in key portions of many other Internet protocols. Each US-ASCII character can be represented by a 7-bit integer, which is convenient in part because the messages transmitted by the Internet Protocol are viewed as streams of 8-bit bytes, and therefore each character can be represented by a single byte.

However, many characters in common use in modern languages are not contained in the US-ASCII character set. Over the years, a wide variety of other character sets have been defined for use with languages other than U.S. English and also for representing characters that are not associated with human language representation, such as mathematical and graphical symbols.

For web pages, which are meant to be viewed throughout the world, it is vital that a single worldwide character set be used. So, as in the Java™ programming language, the underlying character set used internally by web browsers is defined by the Unicode™ Standard [UNICODE]. The Unicode Standard is an attempt to provide a single character set that encompasses every human language representation as well as all other commonly used symbols. The Unicode Standard's Basic Multilingual Plane (BMP), which covers most of the commonly used characters in every modern language, uses 16-bit character codes, and the full character code space of the Unicode Standard extends to 21-bit integers.

A character encoding is a bit string that must be decoded into a code-point integer that is then mapped to a character according to the definition provided by some character set. A character encoding often represents characters using variable-length bit strings, with common characters represented using shorter strings and less-common characters using longer strings. For example, UTF-8 and UTF-16 are encodings of the Unicode character set that use variable numbers of 8-bit and 16-bit values, respectively, to encode all possible Unicode Standard characters.

The Accept-Charset header field is used by a client to tell a server the character sets and character encodings that it will accept as well as its preferred character sets or encodings, if more than one is available for the requested document. In our example, the header field

    accept-charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

indicates that the client would prefer to receive documents using the ISO-8859-1 character set (which has the default quality value 1), but that it would also accept the UTF-8 encoding of the characters in Unicode or any other valid Internet character set/encoding (each with quality value 0.7). The US-ASCII character set is a subset of both the ISO-8859-1 character set and the UTF-8 character encoding, so the charset parameter is set to one of these two values for many US-ASCII documents in order to ensure international compatibility.
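The variable-length behavior of UTF-8 and UTF-16 is easy to observe by encoding a few characters. The sample characters and the helper name `encoded_sizes` below are chosen only for illustration; the "utf-16-be" codec is used so that no byte-order mark is added to the output.

```python
def encoded_sizes(ch: str) -> tuple:
    """Return (code point, number of UTF-8 bytes, number of 16-bit UTF-16 units)."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")     # big-endian variant, no byte-order mark
    return ord(ch), len(utf8), len(utf16) // 2

samples = [
    "A",            # US-ASCII letter
    "\u00e9",       # e with acute accent, also in ISO-8859-1
    "\u20ac",       # euro sign, inside the Basic Multilingual Plane
    "\U0001F600",   # an emoji, outside the Basic Multilingual Plane
]
for ch in samples:
    code_point, utf8_bytes, utf16_units = encoded_sizes(ch)
    print(f"U+{code_point:04X}: {utf8_bytes} UTF-8 byte(s), {utf16_units} UTF-16 unit(s)")

# A US-ASCII character has the same one-byte form in all three of these encodings.
assert "A".encode("us-ascii") == "A".encode("iso-8859-1") == "A".encode("utf-8")
```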

WEB CLIENTS

A web client is software that accesses a web server by sending an HTTP request message and processes the resulting HTTP response. Early web browsers generally either were text-based or ran on specialized platforms, such as computers from Sun Microsystems or the now-defunct NeXT Systems. The Mosaic™ browser, developed at the National Center for Supercomputing Applications (NCSA) in 1993, was the starting point for bringing graphical web browsing to the general public. The developers of Mosaic founded Netscape Communications Corporation, which dedicated a large team to developing and marketing a series of Netscape Navigator browsers based on Mosaic. Microsoft soon followed with the Microsoft Internet Explorer (IE) browser, which was originally based on Mosaic.

Netscape soon found itself at a disadvantage, however, as Microsoft began bundling IE with its popular Windows operating system. The resulting browser war soon ended, and Microsoft was victorious. Netscape, acquired by America Online (at the time primarily an Internet service provider), chose to make its source code public and launched the Mozilla project as an open-source approach to developing new core functionality for the Netscape browser. In particular, Netscape browser releases starting with version 6.0 have been based on software developed as part of the Mozilla project.

Despite the diversity of browsers, all of the major modern browsers support a common set of basic user features and provide similar support for HTTP communication.

Basic Browser Functions

The window of a typical modern browser is split into several rectangular regions, most of which are known as bars.

Client Area. The primary region, where the document is displayed.

Title Bar. Displays a title assigned by the document author to the document currently displayed within the client area.

Menu Bar. Contains a set of dropdown menus, much like most other applications that incorporate a graphical user interface (GUI).

Navigation Toolbar. Contains standard push-button controls that allow the user to return to a previously viewed web page (Back), reverse the effect of pressing Back (Forward), ask the server for an updated version of the page currently viewed (Reload), halt a page download currently in progress (Stop), and print the client area of the window (Print).

Location Bar. Where a user can enter a URL and press the Enter key in order to request that the browser display the document located at the specified URL.

Status Bar. Displays messages and icons related to the status of the browser. For example, the two icons in the right portion of the status bar in the figure show that the browser is online (left icon) and that the browser is communicating with the server over an insecure communication channel.

The browser must perform a number of tasks:
1. Reformat the URL entered as a valid HTTP request message.
2. If the server is specified using a host name (rather than an IP address), use DNS to convert this name to the appropriate IP address.
3. Establish a TCP connection using the IP address of the specified web server.
4. Send the HTTP request over the TCP connection and wait for the server's response.
5. Display the document contained in the response.
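The first four tasks can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names build_request and fetch are hypothetical, the request carries only a Host and a Connection header, and redirects, errors, and rendering (task 5) are ignored.

```python
import socket
from urllib.parse import urlsplit


def build_request(url):
    """Task 1: turn an http-scheme URL into (host, port, request bytes)."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80              # 80 is the default port for http
    target = parts.path or "/"           # an empty path is requested as "/"
    if parts.query:
        target += "?" + parts.query      # the fragment is never sent to the server
    host_header = host if port == 80 else f"{host}:{port}"
    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return host, port, request.encode("ascii")


def fetch(url):
    """Tasks 2-4: resolve the name, connect over TCP, send, read the response."""
    host, port, request = build_request(url)
    address = socket.gethostbyname(host)                       # task 2: DNS lookup
    with socket.create_connection((address, port)) as conn:    # task 3: TCP connection
        conn.sendall(request)                                  # task 4: send the request
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:                                       # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("http://www.example.org/") would return the raw response bytes (status line, headers, and body); a real browser would then parse and display the body.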

URLs

An http-scheme URL consists of a number of pieces. In order to show the main possibilities, let's consider the following example URL:

http://www.example.org:56789/a/b/c.txt?t=win&s=chess#para5

Authority. The portion of an http URL following the :// string and before the next slash (/) (or through the end of the URL, if there is no trailing slash) is known as the authority of the URL. It consists of either a fully qualified domain name or an IP address of an Internet web server, optionally followed by a colon (:) and a port number.


Path. The portion from the slash following the authority up to the question mark (?) is called the path of the URL. The leading slash is part of the path, but the question mark is not. So the path in the example URL just given is /a/b/c.txt.

Query Portion. Following the path there may be a question mark followed by information up to a number sign (#). The information between but not including the question mark and number sign is the query portion of the URL, and in general a string of the form shown is known as a query string. The query portion of the example URL is t=win&s=chess. Originally, the query portion of a URL was intended to pass search terms to a web server. So in this example, it might be that the user is seeking a resource with a title containing the string "win" that is related to the subject "chess."

Fragment. The final optional part of an http-scheme URL, the portion following but not including the number sign (#), is known as the fragment of the URL, and the string contained in the fragment is known as a fragment identifier. Fragment identifiers are used by browsers to scroll HTML documents to a particular location.
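As a quick check of these definitions, the sketch below uses Python's standard urllib.parse module to split the example URL into its pieces. The helper name url_pieces is hypothetical; urlsplit and parse_qs are standard-library functions.

```python
from urllib.parse import urlsplit, parse_qs


def url_pieces(url):
    """Split an http-scheme URL into the pieces described in the text."""
    parts = urlsplit(url)
    return {
        "authority": parts.netloc,        # host name or IP address, plus optional port
        "host": parts.hostname,
        "port": parts.port,               # None when no port is given
        "path": parts.path,               # includes the leading slash
        "query": parts.query,             # excludes the question mark
        "fragment": parts.fragment,       # excludes the number sign
        "query_fields": parse_qs(parts.query),
    }


pieces = url_pieces("http://www.example.org:56789/a/b/c.txt?t=win&s=chess#para5")
for name, value in pieces.items():
    print(f"{name}: {value}")
```

Note that parse_qs returns a list for each field name, because a query string may repeat a name.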

User Controllable Features

Save: Most documents can be saved by the user to the client machine's file system. If the document is an HTML page that contains other documents, such as images, then the browser will attempt to save all of these documents locally so that the entire page can be displayed from the local file system.
Find in page: Standard documents (text and HTML) can be searched with a function that is similar to that provided by most word processors.
Automatic form filling: The browser can "remember" information entered on certain forms, such as billing address, phone numbers, etc. When another form is visited at a later date, the browser can automatically fill in previously saved data.
Preferences: Users can customize browser functionality in a wide variety of ways. In Mozilla, a window presenting preference options is obtained by selecting Edit|Preferences. The Appearance, Navigator, and Advanced categories (left subwindow) and their subcategories are used to customize Mozilla. Some preference settings directly related to the HTTP topics covered earlier are:
Accept-Language: The non-* values sent by the browser for this HTTP request header field can be set under the Navigator|Languages category, Languages for Web Pages box.

Default character set/encoding: The character set/encoding to be assumed for documents that do not specify one is also set under Navigator|Languages in the Character Coding box.
Cache properties: The amount of local storage allocated to the cache and the conditions controlling when a cached file will be validated are set under Advanced|Cache in the Set Cache Options box.

HTTP settings: The version of HTTP used and whether or not the client will keep connections alive are set under Advanced|HTTP Networking in the Direct Connection Options box.

Style definition: The user can define certain aspects affecting how the browser renders HTML pages, such as font sizes, background and foreground colors, etc. In Mozilla, the font size can be modified using View|Text Zoom. If a page offers alternative styles, they can be selected using the View|Use Style menu.
Document meta-information: Interested users can view information about the displayed document, such as the document's MIME type, character encoding, size, and, if the document was written using HTML, the raw HTML source from which the rendering in the client area was produced. In Mozilla, View|Page Source is used to view raw HTML, and View|Page Info to view other so-called meta-information, that is, information about the document rather than information contained in the document itself.
Themes: The look of one or more of the browser bars, particularly the navigation bar, can be modified by applying a certain theme (sometimes called a "skin"). In Mozilla, the browser theme can be modified using View|Apply Theme. Additional themes can be obtained from View|Apply Theme|Get New Themes.
History: The browser will automatically maintain a list of all pages visited within the last several days. Users can use the history list to easily return to any recently visited page. In Mozilla, the history list can be reached by selecting Go|History.
Bookmarks ("favorites" in Internet Explorer): Users can explicitly bookmark a web page, that is, save the URL for that page for an indefinite length of time. At any later time, the browser's bookmark facility can be used to easily return to any bookmarked page.

Additional Functionality


Automatic URL completion: If the user has entered a URL in the Location bar and begins to type it again (within the next several days), the URL will be completed automatically by the browser.
Script execution: In addition to displaying documents, browsers can run programs (scripts). These programs can perform a variety of tasks, from validating data entered on a form before sending it to a web server to creating various dynamic effects on web pages, such as drop-down menus.
Event handling: When the user performs an action, such as clicking on a link or a button in a web page, the browser treats this as the occurrence of an event. Browsers recognize a number of different types of events, including mouse button clicks, mouse movement, and even events not directly under user control, such as the completion of the browser's rendering of a document.
Management of form GUI: If a web page contains a form with fill-in fields, the browser must allow the user to perform standard text-editing functions within these fields. It also needs to automatically provide certain graphical feedback, such as changing a button image when it is pressed or providing a text cursor in a text field that will receive keyboard input.
Secure communication: When the user sends sensitive information, such as a credit card number, to a web server, the browser can encrypt this information in a way that prevents any machines along the IP route from the client to the server from obtaining the information.
Plug-in execution: While the browser itself normally understands only a limited number of MIME types, most browsers support some form of plug-in protocol that allows the browser's capabilities to be supplemented by other software. If a browser has a plug-in for displaying, say, a document conforming to the application/pdf MIME type, then when the browser receives such a document it will pass it, via the plug-in protocol, to the appropriate plug-in for display.

WEB SERVERS

Server Features

The primary feature of every web server is to accept HTTP requests from web clients and return an appropriate resource (if available) in the HTTP response. Even this basic functionality involves a number of steps:
1. The server calls on TCP software and waits for connection requests to one or more ports.
2. When a connection request is received, the server dedicates a "subtask" to handling this connection.
3. The subtask establishes the TCP connection and receives an HTTP request.
4. The subtask examines the Host header field of the request to determine which "virtual host" should receive this request and invokes software for this host.
5. The virtual host software maps the Request-URI field of the HTTP request start line to a resource on the server.
6. If the resource is a file, the host software determines the MIME type of the file (usually by a mapping from the file-name extension portion of the Request-URI), and creates an HTTP response that contains the file in the body of the response message.
7. If the resource is a program, the host software runs the program, providing it with information from the request and returning the output from the program as the body of an HTTP response message.
8. The server normally logs information about the request and response, such as the IP address of the requester and the status code of the response, in a plain-text file.
9. If the TCP connection is kept alive, the server subtask continues to monitor the connection until a certain length of time has elapsed, the client sends another request, or the client initiates a connection close.
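Steps 5 and 6 can be illustrated with a short Python sketch. It is a simplification, not how any particular server is implemented: the function name build_response and the doc_root parameter are hypothetical, only static files are handled, and anything that cannot be served is answered with a 404 response.

```python
import mimetypes
from pathlib import Path


def build_response(request_uri, doc_root):
    """Map a Request-URI to a file under doc_root and wrap it in an HTTP response."""
    # Step 5: map the path portion of the Request-URI onto the file system.
    path = request_uri.split("?", 1)[0]
    root = Path(doc_root).resolve()
    target = (root / path.lstrip("/")).resolve()
    try:
        target.relative_to(root)          # refuse paths that escape doc_root
        found = target.is_file()
    except ValueError:
        found = False

    if found:
        # Step 6: choose the MIME type from the file-name extension.
        mime = mimetypes.guess_type(target.name)[0] or "application/octet-stream"
        status, body = "200 OK", target.read_bytes()
    else:
        mime, status, body = "text/plain", "404 Not Found", b"Not Found"

    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {mime}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

For example, build_response("/a/b/c.txt", "/var/www") would look for /var/www/a/b/c.txt (the directory is an assumption for illustration) and, if it exists, return it with a text/plain content type.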

All modern servers can concurrently process multiple requests. It is as if multiple copies of the server were running simultaneously, each devoted to handling the requests received over a single TCP connection. The specifics of how this concurrency is actually implemented on a system may depend on many factors, including the number of processors available in the system, the programming language used, and programmer choices. The term subtask is used to refer to the concept of a single "copy" of the server software handling a single client connection.

Virtual Host

Every HTTP/1.1 request must include a Host header field. The reason for this requirement is that multiple host names may all be mapped by the Internet DNS system to a single IP address. For example, a single server machine within a college may host web sites for multiple departments. Each web site would be assigned its own fully qualified domain name, such as www.cs.example.edu, www.physics.example.edu, and so on. But DNS would be configured to map all of these domain names to a single IP address. When an HTTP request is received by the web server at this address, it can determine which virtual host is being requested by examining the Host header. Separately configured software can then be used to handle the requests for each virtual host.
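A minimal sketch of this dispatch, assuming the request has already been read into a bytes object: the function name select_virtual_host, the dictionary of hosts, and the handler labels are all hypothetical, and IPv6 literals in the Host header are not handled.

```python
def select_virtual_host(raw_request, virtual_hosts, default=None):
    """Return the entry of virtual_hosts named by the request's Host header."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    for line in lines[1:]:                      # lines[0] is the request start line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Host names are case-insensitive; drop an optional ":port" suffix.
            host = value.strip().lower().partition(":")[0]
            return virtual_hosts.get(host, default)
    return default                              # no Host header field present


hosts = {
    "www.cs.example.edu": "computer science site",
    "www.physics.example.edu": "physics site",
}
request = b"GET /index.html HTTP/1.1\r\nHost: www.cs.example.edu\r\n\r\n"
print(select_virtual_host(request, hosts))
```

Both names would resolve to the same IP address in DNS; only the Host header tells the server which site the client meant.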

Server History

NCSA's httpd web server was a starting point for server development. httpd was used on a large fraction of the early web servers, but the NCSA discontinued development of the server in the mid-1990s. When this happened, several individuals who were running httpd at their sites joined forces and began developing their own updates to the open-source httpd software. Their updates were called "patches," and this led to calling their work "a patchy server," which soon became known as "the Apache server."

Microsoft began development of web servers well after others had begun, but quickly caught up. Microsoft's Internet Information Server (IIS) provides essentially all of the features found in Apache, although IIS does have the drawback of running only on Windows systems, while Apache runs on Windows, Linux, and Macintosh systems. IIS and Apache are, at the time of this writing, by far the most widely used servers on the market.

Both servers can be configured to run a variety of types of programs, although certain programming languages tend to be used more frequently on one system than the other. For example, many IIS servers run programs written in VBScript (a derivative of Visual Basic), while a typical Apache server might run programs written in either Perl or the PHP scripting language (PHP stands for "PHP: Hypertext Preprocessor"; yes, the definition is infinitely recursive). A number of IIS and Apache servers also run Java programs. When running a Java program, both Apache and IIS servers are usually configured to run the program by using separate software called a servlet container. The servlet container provides the Java Virtual Machine that runs the Java program (known as a servlet), and also provides communication between the servlet and the Apache or IIS web server.

Server Configuration and Tuning

Broadly speaking, server configuration can be broken into two areas: external communication and internal processing. In Tomcat, this corresponds to two separate Java packages: Coyote, which provides the HTTP/1.1 communication, and Catalina, which is the actual servlet container. Coyote parameters, affecting external communication, include the following:

IP addresses and TCP ports that may be used to connect to this server.
Number of subtasks (called threads in Java) that will be created when the server is initialized. This many TCP connections can be established simultaneously with minimal overhead.
Maximum number of threads that will be allowed to exist simultaneously. If this is larger than the previous value, then the number of threads maintained by the server may change, either up or down, over time.
Maximum number of TCP connection requests that will be queued if the server is already running its maximum number of threads. Connection requests received when the queue is full will be refused.
Length of time the server will wait after serving an HTTP request over a TCP connection before closing the connection if another request is not received.

The settings of these parameters can have a significant influence on the performance of a server; changing the values of these and similar parameters in order to optimize performance is often referred to as tuning the server.
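To see how the thread and queue limits interact, the following back-of-the-envelope sketch computes what happens to a burst of simultaneous connection requests. It is a simplified model, not Tomcat code: the function name admit is hypothetical, and it assumes every request in the burst arrives before any thread finishes.

```python
def admit(pending_connections, max_threads, accept_count):
    """Split a burst of connection requests into (served, queued, refused)."""
    served = min(pending_connections, max_threads)               # one thread per connection
    queued = min(pending_connections - served, accept_count)     # wait queue absorbs the overflow
    refused = pending_connections - served - queued              # queue full: connection refused
    return served, queued, refused


# A burst of 10 requests against 4 threads and a wait queue of length 3.
print(admit(10, max_threads=4, accept_count=3))
```

Raising the maximum number of threads serves more clients at once at the cost of memory and scheduling overhead, while lengthening the queue trades refusals for longer waits; tuning means finding a balance for the expected load.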

The internal Catalina portion of Tomcat also has a number of parameter settings that affect functionality. These settings can determine:
Which client machines may send HTTP requests to the server.
Which virtual hosts are listening for TCP connections on a given port.
What logging will be performed.
How the path portion of Request-URIs will be mapped to the server's file system or other resources.
Whether or not the server's resources will be password protected.
Whether or not resources will be cached in the server's memory.

Each Service in Tomcat is almost its own web server, except that a Service cannot be individually stopped and started. A Service has five components:
Connector
Host
Logger
Realm
Valve

Connector. A Connector is a Coyote component that handles HTTP communications directed to a particular port. Some of the key fields for the Connector component type are:
Accept Count. Length of the TCP connection wait queue.
Connection Timeout. Server will close the connection if it is idle for this many milliseconds.
IP Address. Blank indicates that this Connector will accept TCP connections directed to any IP address associated with this machine. Specifying an address restricts connections to requests for that address.
Port Number. Port number on which this Connector will listen for TCP connection requests.
Min Spare Threads. Initial number of threads that will be allocated to process TCP connections associated with this Connector.
Max Threads. Maximum number of threads that will be allocated to process TCP connections associated with this Connector.
Max Spare Threads. Maximum number of idle threads allowed to exist at any one time. The server will begin stopping threads if the number of idle threads exceeds this value.

Host. The Host component is used to define a virtual host. The virtual host name should normally be a fully qualified domain name that would be used by visitors to visit a website.

A web application is a collection of files and programs that work together to provide a particular function to web users. For example, a web site might run two web applications: one for use by administrators of the site that provides maintenance functionality, and another for use by external clients that provides customer functionality. In Tomcat, a web application is represented by a Context component. Clicking on a Host handle icon will reveal the list of Contexts provided with that virtual host. Some of the key fields of a Host are:
Name. Usually the fully qualified domain name (or localhost) that clients will use to access this virtual host.
Application Base. Directory containing web applications for this virtual host.
Deploy on Startup. Boolean value indicating whether or not web applications should be automatically initialized when the server starts.
Auto Deploy. Boolean value indicating whether or not web applications added to the Application Base while the server is running should be automatically initialized.

The directory associated with a Host is specified by the value of the Application Base field. The value can be either a relative path or an absolute path.

Logging

Web server logs record information about server activity. The primary web server log recording normal activity is an access log, a file that records information about every HTTP request processed by the server. A web server may also produce one or more message logs containing a variety of debugging and other information generated by web applications as well as possibly by the web server itself.

Access logging in Tomcat is performed by adding a Valve component to a Service. The primary fields for an AccessLog Valve are:
Directory. Directory (relative to the Tomcat installation directory or absolute) where the log file will be written.
Pattern. Information to be written to the log.
Prefix. String that will be used to begin the log file name.
Resolve Hosts. Whether IP addresses (False value) or host names (True value) should be written to the log file.
Rotatable. Whether or not the date should be added to the file name and the file should be automatically rotated each day.
Suffix. String that will be used to end the log file name.

The Pattern for the Service access log Valve is

    %h %l %u %t "%r" %s %b

This corresponds to what is often called the common access log format (in fact, the word Common can be specified as the value of the Pattern field to specify this log format). The following is an example access log line in common format (this example is split into two lines for readability):

    www.example.org - admin [20/Jul/2005:08:03:22 -0500]
    "GET /admin/frameset.jsp HTTP/1.1" 200 920

The following information is contained in this log entry:
Host name of the client machine making the request.
User name used to log in, if server password protection is enabled.
Date and time of the response, plus the time zone (offset from GMT) of the time.
Start line of the HTTP request (quoted).
HTTP status code of the response (200 in this example).
Number of bytes sent in the body of the response.
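A log analyzer starts by splitting each entry into these fields. The sketch below does so with a regular expression; the pattern and the function name parse_common_log are illustrative assumptions rather than part of any particular analyzer, and the timestamp is left as a string instead of being converted to a date.

```python
import re

# One named group per field of the common access log format.
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_common_log(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = COMMON_LOG.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A hyphen in the size field means that no body bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


line = ('www.example.org - admin [20/Jul/2005:08:03:22 -0500] '
        '"GET /admin/frameset.jsp HTTP/1.1" 200 920')
print(parse_common_log(line))
```

From a list of such dictionaries it is straightforward to count requests per client, per resource, or per status code.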

An advantage of using this log format is that a variety of log analyzers have been developed that can read logs in this (and some other) formats and produce reports on various aspects of a site's usage.

The Tomcat Logger component can be used to create a message log. A message log records informational, debugging, and error messages passed to logging methods by either servlets or Tomcat itself. Some of the key fields for File Loggers (the standard type of message log) are:
Directory. Directory (relative to the Tomcat installation directory or absolute) where the log file will be written.
Prefix. String that will be used to begin the log file name.
Suffix. String that will be used to end the log file name.
Timestamp. Whether or not the date and time should be added to the beginning of each message written to the log file.

Example: 2005.8.2 7:38:54 createObjectName with StandardEngine[Catalina]

With the Timestamp property set to true, each message log entry begins with a timestamp, that is, with the date and time at which the entry was written to the log. Timestamps can be useful, particularly when trying to debug an application.

Access Control

Tomcat can provide automatic password protection for resources that it serves. At its heart, this is a two-stage process. First, a database of user names is created. Each user name is assigned a password and a list of roles. Think of a role as a user's functional relationship to a web application: administrator, developer, end user, etc. Some users may be assigned to multiple roles. The second stage is to tell Tomcat that certain resources can only be accessed by users who belong to certain roles and who have authenticated themselves as belonging to one of these roles by logging in with an appropriate user name and password. For example, the Tomcat administration tool application (admin Context) can only be accessed by users who have logged in and who belong to the admin role.

A Realm component associates a user database with a Service. This particular type of Realm indicates that a Tomcat Resource (an object representing a file or other static resource on the server) will be used to store the user database. The Realm's Resource Name field contains the name of the Resource, which in this case is UserDatabase. If you click on the User Databases link in the Resources list in the left panel of your Tomcat administration tool window and then click on the UserDatabase link in the User Databases panel, you will see that this Resource is associated with a file located at conf/tomcat-users.xml (this is relative to the Tomcat installation directory).

A coarser-grained access control can be provided by using Valve objects of type RemoteHostValve and RemoteAddressValve. Both are used to specify client machines that should be accepted or rejected when they request a connection to the server. They differ only in whether client machine host names or IP addresses are specified. Each type of Valve has two possible lists of clients: an Allow list and a Deny list.
To allow access only from machines in the example.org and example.net domains, you would enter in the Allow list *.example.org,*.example.net.
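The effect of the two lists can be sketched as follows. This is an illustration under stated assumptions, not Tomcat's implementation: the function name is_allowed is hypothetical, patterns are treated as shell-style wildcards as in the example above, a Deny match is assumed to take precedence over an Allow match, and an empty Allow list is assumed to admit every client that is not denied.

```python
from fnmatch import fnmatchcase


def is_allowed(host, allow=(), deny=()):
    """Decide whether a client host name passes the Allow and Deny lists."""
    host = host.lower()                       # host names are case-insensitive
    if any(fnmatchcase(host, pattern.lower()) for pattern in deny):
        return False                          # assumption: Deny wins over Allow
    if allow:
        return any(fnmatchcase(host, pattern.lower()) for pattern in allow)
    return True                               # assumption: empty Allow list admits everyone


allow_list = ["*.example.org", "*.example.net"]
print(is_allowed("www.example.org", allow=allow_list))
print(is_allowed("www.example.com", allow=allow_list))
```

Note that with wildcard matching, the pattern *.example.org does not match the bare name example.org; it would need its own entry.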

Secure Servers

To prevent eavesdroppers from obtaining sensitive information, such as credit card numbers, all such sensitive information should be encrypted before being transmitted over any public communication network. The standard means of indicating to a browser that it should encrypt an HTTP request is to use the https scheme on the URL for the request, as in the URL https://www.example.org.

A client browser that wishes to communicate securely with a server begins by initiating (over TCP/IP) a TLS Handshake with the server. During the Handshake process, the server and client agree on various parameters that will be used to encrypt messages sent between them. The server also sends a certificate to the client. The certificate enables the client to be sure that the machine it is communicating with is the one the client intends (as specified by the host name in the URL the browser is requesting). Certificates are necessary to avoid so-called man-in-the-middle attacks, in which some machine intercepts a message intended for another machine (the target), prevents the message from being forwarded further, and returns an HTTP reply to the sender pretending to be from the target. Such an interception could occur at a rogue Internet bridge device on the route between client and server, or through unauthorized alteration of the DNS system, for example.

At the conclusion of the TLS Handshake, the client uses the cryptographic parameter information obtained to encrypt its HTTP request message before sending it to the server over TCP/IP. The server's TLS software decrypts this request before any other server processing is performed. The server similarly encrypts its response before sending it to the client, and the client immediately decrypts the received message. Therefore, other HTTP processing software running on the client and server is, for the most part, unaffected by the encryption process. Tomcat supports the TLS 1.0 and earlier protocols.
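From the client's side, the whole exchange can be sketched with Python's standard ssl module. The function names make_client_context and fetch_securely are hypothetical; ssl.create_default_context, wrap_socket, and socket.create_connection are standard-library calls. The default context verifies the server's certificate and checks that it matches the requested host name, which is the protection against man-in-the-middle attacks described above.

```python
import socket
import ssl


def make_client_context():
    """Create a TLS context that requires a valid certificate for the requested host."""
    return ssl.create_default_context()


def fetch_securely(host, path="/", port=443):
    """Perform the TLS Handshake, then send an HTTP request over the encrypted channel."""
    context = make_client_context()
    with socket.create_connection((host, port)) as raw:
        # wrap_socket runs the handshake; it raises an SSL error if the
        # certificate is invalid or does not match the host name.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))       # encrypted before transmission
            chunks = []
            while True:
                data = tls.recv(4096)                  # decrypted on receipt
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

The HTTP request text is the same as it would be without TLS; only the socket it is written to differs, which is why the rest of the HTTP software is largely unaffected by encryption.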
To enable the secure server features of Tomcat, you must do two things:
1. Obtain and install a certificate.
2. Configure the server to listen for TLS connections on some port.

Configuring the server to listen for TLS connections simply involves adding a second Connector to a Service (by selecting Create New Connector from the Service's Action dropdown menu). The Type field of the new Connector must be set to HTTPS. On the resulting Connector panel, make sure that the Secure field is set to True (since this is a secure connection), and fill in the port number (say 8443) to be used for this connection. Other fields can retain their default values if you run keytool with its defaults.