
Web Technology Notes

Module 1. HTTP and CGI, Web Server

INTRODUCTION TO THE WEB MODEL OF COMPUTING

In the early days, web applications were developed using a 2-tier client/server architecture. This architecture is not well suited to the internet because it does not scale and cannot accommodate n-tier applications. An n-tier application can provide a separate layer for each of the following services:

Presentation or the User Service tier creates a visual gateway for the consumer to interact with the application. This can range from basic HTML and DHTML to complex COM components and Java applets.

Business logic: This tier can range from Web scripting in ASP/PHP/JSP to server side programming such as TCL, CORBA and PERL, that allows the user to perform complex actions through a Web interface.

Data access or Data Service layer: Data services store, retrieve and update information at a high level. Databases, file systems, and writeable media are all examples of Data storage and retrieval devices. For Web applications, however, databases are most practical. Databases allow developers to store, retrieve, add to, and update categorical information in a systematic and organized fashion.

How this can be done using JEE is shown below.

JEE app and the MVC architecture

In a JEE application:
o The model: business layer functionality represented by JavaBeans or EJBs
o The view: the presentation layer functionality represented by JSFs in a web app
o The controller: a servlet mediating between model and view

The application must accommodate input from various clients, including:
o HTTP requests from web clients
o WML from wireless clients
o XML documents from suppliers
o Etc.

MVC Design Pattern

The Model-View-Controller (MVC) design pattern separates the core business model functionality from the presentation and control logic that uses this functionality.

The separation allows multiple views to share the same enterprise data model, which makes supporting multiple clients easier to implement, test, and maintain.
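The way several views can share one model may be easier to see in a small sketch. The one below is written in Python rather than JEE, and every name in it (OrderModel, html_view, text_view, OrderController) is hypothetical; it only illustrates the division of responsibilities, not any particular framework.

```python
# Model: owns the data and business rules; knows nothing about presentation.
class OrderModel:
    def __init__(self):
        self._orders = []

    def add_order(self, item, quantity):
        self._orders.append({"item": item, "quantity": quantity})

    def orders(self):
        return list(self._orders)


# Two views that render the same model data for different client types.
def html_view(orders):
    rows = "".join(
        "<li>%s x %d</li>" % (o["item"], o["quantity"]) for o in orders
    )
    return "<ul>%s</ul>" % rows


def text_view(orders):
    return "\n".join("%s x %d" % (o["item"], o["quantity"]) for o in orders)


# Controller: interprets the request, updates the model, picks a view.
class OrderController:
    def __init__(self, model):
        self.model = model

    def handle(self, action, params, client_type):
        if action == "add":
            self.model.add_order(params["item"], int(params["quantity"]))
        view = html_view if client_type == "browser" else text_view
        return view(self.model.orders())
```

Adding a third client type in this sketch would mean adding one more view function; the model would not change.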

View layer in a Web App
o Displays information according to client types
o Displays the result of business logic (Model)
o Not concerned with how the information was obtained, or from where (since that is the responsibility of the Model)

Model layer in a Web App
o Models the data and behavior behind the business process
o It is responsible for:
    o Performing DB queries
    o Calculating the business process
    o Processing orders
o Encapsulation of data and behavior which are independent of presentation

Controller in a Web App
o Serves as the logical connection between the user's interaction and the business services on the back.
o Responsible for making decisions among multiple presentations, e.g. the user's language, locale or access level dictates a different presentation.
o A request enters the application through the control layer, which will decide how the request should be handled and what information should be returned.

The benefits of using the MVC pattern are:

1. It allows separate specification and development of the business logic and the user interfaces. The domain model can be more cohesive, focusing on business processes.
2. Separation allows for division of labour: designers can work on templates with minimal input from developers. A designer can change the templates without requiring a programmer to update model code.
3. New views can be easily added without changing the domain model, and views are interchangeable.
4. It allows execution of the domain model without the user interface. Separation allows you to build mock objects that mimic the behavior of concrete objects during testing.

Web Applications

A web application is a dynamic extension of a web or application server. Types of web applications:
o Presentation-oriented: generates interactive web pages containing various types of mark-up language (HTML, XHTML, XML, and so on) and dynamic content in response to requests.
o Service-oriented: a service-oriented web application implements the endpoint of a web service.

In the Java EE platform, web components provide the dynamic extension capabilities for a web server. Web components are either Java servlets, web pages, web service endpoints, or JSP pages.

Packaging Java EE Web Apps

A web application module contains:
o servlets, JSPs, JSF pages, and web services, as well as
o HTML and XHTML pages, Cascading Style Sheets (CSS), JavaScripts, images, videos, and so on.

All these artefacts are packaged in a jar file with a .war extension, i.e., a war file, or Web Archive. Inside the archive:
o WEB-INF/web.xml is the optional web deployment descriptor
o WEB-INF/ejb-jar.xml is the optional EJB Lite beans deployment descriptor
o WEB-INF/classes contains all the Java .class files
o WEB-INF/lib contains any dependent jar files

java web app request handling

Java EE Architecture Java EE is a set of specifications implemented by different containers. Containers are Java EE runtime environments that provide certain services to the components they host such as lifecycle management, dependency injection, security, etc. Figure shows the logical relationships between containers. The arrows represent the protocols used.

DISTRIBUTION

A multi-tiered, distributed, intranet application will consist of three logical tiers: data, business object, and user interface.

Protocols Page 1: 5 of Photostat User interface ad HTML Page 2: 1 of Photostat Tool for designing web page. Two key concepts Hyper text Mark up language

HTTP Protocol

What is HTTP?
o It is an application layer protocol in the TCP/IP stack.
o It is a request-response protocol.
o It uses the client-server model, with the browser as client and the web server as the server.
o The browser is often referred to as the user-agent (UA).
o The original version was HTTP/1.0.
o The current version is HTTP/1.1.
o It is a stateless protocol, i.e. the server does not retain state information once a request has been served.
o A web application that needs to remember state information uses cookies to do so.

An HTTP session consists of a series of request-response transactions. The request message consists of the following:
o An initial line, for example GET /images/logo.png HTTP/1.1, ending with <CR><LF> [CR: carriage return, LF: line feed]
o Zero or more headers, such as Accept-Language: en, each ending with <CR><LF>
o An empty line: <CR><LF>
o An optional message body

The initial request line has three parts separated by spaces:

    Command resource HTTP/version

The initial response line also has three parts:

    HTTP/version status-code reason-phrase

Most common status codes and reasons:
o 200 OK
o 301 Moved Permanently
o 302 Found
o 404 Not Found
o 500 Internal Server Error

Some of the important headers used when there is a message body are Content-Type, Content-Disposition, Content-Transfer-Encoding and Content-Length. Content-Type identifies the MIME type of the data.
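The message layout described above can be sketched in a few lines of Python. This is only an illustration: the helper names build_request and parse_status_line and the host www.example.com are assumptions made for the example, not part of any library.

```python
CRLF = "\r\n"


def build_request(method, resource, headers, body=""):
    """Assemble a request: initial line, headers, blank line, optional body."""
    lines = ["%s %s HTTP/1.1" % (method, resource)]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    return CRLF.join(lines) + CRLF + CRLF + body


def parse_status_line(line):
    """Split 'HTTP/version status-code reason-phrase' into its three parts."""
    version, code, reason = line.rstrip(CRLF).split(" ", 2)
    return version, int(code), reason


request = build_request(
    "GET", "/images/logo.png",
    [("Host", "www.example.com"), ("Accept-Language", "en")],
)
status = parse_status_line("HTTP/1.1 404 Not Found\r\n")
```

Splitting the status line with a maximum of two splits keeps multi-word reason phrases such as "Not Found" in one piece.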

What is HTTPS?
o It stands for HTTP Secure.
o It is a combination of HTTP and SSL/TLS.
o It is used to transmit sensitive information using HTTP over unsecured networks.
o HTTPS uses port 443, not 80.
o A web browser connecting to an HTTPS site goes through the following process:
    1. The browser verifies the identity of the site by examining its certificate.
    2. Once the identity is established, the client negotiates with the server on what type of encryption to use.
    3. Once the encryption type is agreed upon, the client and the server exchange unique encryption keys, which are then used to encrypt the data transmitted using HTTP.
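On the client side, the certificate check in the first step is usually a matter of configuration rather than hand-written code. As a rough sketch, Python's standard ssl module can create a client context with these checks switched on; the describe helper below is hypothetical and only reports the settings.

```python
import ssl

HTTPS_PORT = 443  # default HTTPS port, as opposed to 80 for plain HTTP

# create_default_context() returns a client-side context that loads the
# system's trusted CA certificates and turns on certificate checking.
context = ssl.create_default_context()


def describe(ctx):
    """Report whether the context verifies certificates and host names."""
    return {
        "verifies_certificate": ctx.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": ctx.check_hostname,
    }


settings = describe(context)
```

To open a real connection, a TCP socket to the server would be passed to context.wrap_socket(sock, server_hostname=host); the cipher negotiation and key exchange then happen inside that call.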

HTTP headers

An HTTP message consists of the following:
o HTTP header
o body
o trailer

The HTTP header consists of:
o A request or response line
    o An HTTP request line contains a method, URL, and version
    o A response line contains a version, status code, and reason phrase
o A MIME header
    o A MIME header is comprised of zero or more MIME fields.
    o A MIME field is composed of a field name, a colon, and (zero or more) field values. The values in a field are separated by commas.

An HTTP header containing a request line is usually referred to as a request.

The following example shows a typical request header.

    GET http://www.tiggerwigger.com/ HTTP/1.0
    Proxy-Connection: Keep-Alive
    User-Agent: Mozilla/5.0 [en] (X11; I; Linux 2.2.3 i686)
    Host: www.tiggerwigger.com
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
    Accept-Encoding: gzip
    Accept-Language: en
    Accept-Charset: iso-8859-1, *, utf-8

The response header for the above request might look like the following:

    HTTP/1.0 200 OK
    Date: Fri, 13 Nov 2009 06:57:43 GMT
    Content-Location: http://locutus.tiggerwigger.com/index.html
    Etag: "07db14afa76be1:1074"
    Last-Modified: Thu, 05 Nov 2009 20:01:38 GMT
    Content-Length: 7931
    Content-Type: text/html
    Server: Microsoft-IIS/4.0
    Age: 922
    Proxy-Connection: close

The following figure illustrates an HTTP message with an expanded HTTP header. Figure 10.1. HTTP Request/Response and Header Structure

The figure below shows example HTTP request and response headers. Figure 10.2. Examples of HTTP Request and Response Headers

Web Server

A web server is a piece of software that enables a website to be viewed using HTTP. HTTP (HyperText Transfer Protocol) is the key protocol for the transfer of data on the web. You know when you're using HTTP because the website URL begins with "http://" (for example, "http://www.quackit.com"). [Web server can refer to either the hardware (the computer) or the software (the computer application) that helps to deliver Web content that can be accessed through the Internet.]

How do web servers work?

Whenever you view a web page on the internet, you are requesting that page from a web server. When you type a URL into your browser (for example, "http://www.quackit.com/html/tutorial/index.cfm"), your browser requests the page from the web server and the web server sends the page back:

The above diagram is a simplistic version of what occurs. Here's a more detailed version:

1. Your web browser first needs to know which IP address the website "www.quackit.com" resolves to. If it doesn't already have this information stored in its cache, it requests the information from one or more DNS servers (via the internet). The DNS server tells the browser which IP address the website is located at. Note that the IP address was assigned when the website was first created on the web server.
2. Now that the web browser knows which IP address the website is located at, it can request the full URL from the web server.
3. The web server responds by sending back the requested page. If the page doesn't exist (or another error occurs), it will send back the appropriate error message.
4. Your web browser receives the page and renders it as required.

When referring to web browsers and web servers in this manner, we usually refer to them as a client (web browser) and a server (web server).

Web Server Examples
o Apache HTTP Server (Apache)
o Microsoft Internet Information Services (IIS)
o Sun Java System Web Server
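Steps 1 and 2 of the detailed sequence depend on splitting the URL into the part that is resolved through DNS and the part that is sent to the server. A minimal sketch using Python's standard urllib.parse follows; plan_request is a hypothetical helper name, and no network access is made.

```python
from urllib.parse import urlsplit


def plan_request(url):
    """Return (host to resolve via DNS, TCP port, path to request)."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path


host, port, path = plan_request("http://www.quackit.com/html/tutorial/index.cfm")
# Step 1 would resolve `host`, for instance with socket.getaddrinfo(host, port);
# step 2 would send "GET <path> HTTP/1.1" to the resulting address.
```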

Implementation of a simple HTTP Server (Hello server)

The HTTPServer is composed of:
o The actual server (HTTPServer)
o A request dispatcher, instantiated on each request (HTTPRequestDispatcher)
o Request handlers instantiated by the dispatcher (deriving from HTTPRequestHandler)

Simple HTTP server in Python

The default Python distribution has built-in support for the HTTP protocol that you can use to make a simple stand-alone Web server. In Python 2, the module that provides this support is called BaseHTTPServer (Python 3 renamed it to http.server). The listing below is written for Python 2.

    #!/usr/bin/python
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

    PORT_NUMBER = 8080

    # This class handles any incoming request from the browser
    class myHandler(BaseHTTPRequestHandler):

        # Handler for the GET requests
        def do_GET(self):
            self.send_response(200)
            self.send_header('Content-type', 'text/html')
            self.end_headers()
            # Send the html message
            self.wfile.write("Hello World !")
            return

    try:
        # Create a web server and define the handler to manage the
        # incoming request
        server = HTTPServer(('', PORT_NUMBER), myHandler)
        print 'Started httpserver on port ', PORT_NUMBER

        # Wait forever for incoming http requests
        server.serve_forever()

    except KeyboardInterrupt:
        print '^C received, shutting down the web server'
        server.socket.close()
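For readers on Python 3, a roughly equivalent sketch is given here. It assumes the standard http.server module; the names HelloHandler and fetch_once are made up for this example. To keep it self-checking, it starts the server on a free local port, sends one GET request to it, and shuts it down again instead of serving forever.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello World !"  # Python 3 sockets carry bytes, not str
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for this demo


def fetch_once():
    """Start the server on a free port, fetch '/', then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks one
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


result = fetch_once()
```

For a long-running server, one would bind to a fixed port such as 8080 and call server.serve_forever() in the main thread, as the Python 2 listing does.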

CGI in Python

What is CGI?
o It is a standard that defines how a web server interacts with a standalone application to generate web pages dynamically.
o Such applications are known as CGI scripts. They can be written in any programming language, although scripting languages are more popular.
o A CGI script takes input from STDIN and sends output to STDOUT. It also takes input from certain environment variables.

Web Browsing

To understand the concept of CGI, let's see what happens when we click a hyperlink to browse a particular web page or URL.
o Your browser contacts the HTTP web server and asks for the URL, i.e. the filename.
o The web server parses the URL and looks for the file. If it finds the file, it sends it back to the browser; otherwise it sends an error message indicating that you have requested a wrong file.
o The web browser takes the response from the web server and displays either the received file or the error message.

However, it is possible to set up the HTTP server so that whenever a file in a certain directory is requested, that file is not sent back; instead it is executed as a program, and whatever that program outputs is sent back for your browser to display. This function is called the Common Gateway Interface or CGI, and the programs are called CGI scripts. These CGI programs can be a Python script, PERL script, shell script, C or C++ program, etc.

CGI Architecture Diagram

Web Server Support & Configuration

Before you proceed with CGI programming, make sure that your Web Server supports CGI and that it is configured to handle CGI programs. All the CGI programs to be executed by the HTTP server are kept in a pre-configured directory. This directory is called the CGI directory and by convention it is named /var/www/cgi-bin. By convention CGI files have the extension .cgi, but you can keep your files with the Python extension .py as well.

First CGI Program

    #!/usr/bin/python

    print "Content-type:text/html\r\n\r\n"
    print '<html>'
    print '<head>'
    print '<title>Hello World - First CGI Program</title>'
    print '</head>'
    print '<body>'
    print '<h2>Hello World! This is my first CGI program</h2>'
    print '</body>'
    print '</html>'

If you click hello.py then this produces the following output:

    Hello World! This is my first CGI program
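The first program ignores its input. As a hedged Python 3 sketch of how a CGI script might also read one of the environment variables mentioned earlier, the function below takes the environment as a dictionary and returns the text that would be written to STDOUT. QUERY_STRING is a standard CGI variable; the function name cgi_response and the "name" parameter are assumptions made for this example.

```python
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build a CGI reply: header block, blank line, then the HTML body."""
    # The server passes the part of the URL after "?" in QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["World"])[0]
    # Escape user input before placing it in the page.
    body = "<html><body><h2>Hello %s!</h2></body></html>" % escape(name)
    return "Content-Type: text/html\r\n\r\n" + body
```

A script placed in the CGI directory would end with sys.stdout.write(cgi_response(os.environ)), so that the web server's environment supplies the input and the browser receives the output.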

GET and POST methods

GET and POST basically allow information to be sent to the web server from a browser (or other HTTP client, for that matter). The method attribute of the FORM element specifies the HTTP method used to send the form to the processing agent. This attribute may take two values:

o get: With the HTTP "get" method, the form data set is appended to the URI specified by the action attribute (with a question mark ("?") as separator) and this new URI is sent to the processing agent.
o post: With the HTTP "post" method, the form data set is included in the body of the request and sent to the processing agent.

The "get" method should be used when the form is idempotent (i.e., causes no side-effects). Many database searches have no visible side-effects and make ideal applications for the "get" method. If the service associated with the processing of a form causes side effects (for example, if the form modifies a database or a subscription to a service), the "post" method should be used.

Note. The "get" method restricts form data set values to ASCII characters. Only the "post" method (with enctype="multipart/form-data") is specified to cover the entire [ISO10646] character set.
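The difference between the two methods comes down to where the encoded form data set travels. The sketch below uses Python's standard urllib.parse.urlencode to show both placements; the form fields, the /page.cgi path and the host www.example.com are hypothetical values chosen for illustration.

```python
from urllib.parse import urlencode

form_data = {"cat": "science", "topic": "physics"}  # hypothetical form fields

encoded = urlencode(form_data)  # application/x-www-form-urlencoded

# GET: the data set is appended to the action URI after a "?".
get_request_line = "GET /page.cgi?%s HTTP/1.1" % encoded

# POST: the same encoded data travels in the request body instead.
post_request = (
    "POST /page.cgi HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: %d\r\n"
    "\r\n"
    "%s" % (len(encoded), encoded)
)
```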

Comparison of GET and POST

History
    GET: Parameters remain in browser history because they are part of the URL.
    POST: Parameters are not saved in browser history.

Bookmarked
    GET: Can be bookmarked.
    POST: Cannot be bookmarked.

BACK button/resubmit behaviour
    GET: GET requests are re-executed.
    POST: The browser usually alerts the user that data will need to be re-submitted.

Encoding type (enctype attribute)
    GET: application/x-www-form-urlencoded
    POST: multipart/form-data or application/x-www-form-urlencoded. Use multipart encoding for binary data.

Parameters
    GET: Can send parameters, but the parameter data is limited to what we can stuff into the request line (URL). Safest to use less than 2K of parameters; some servers handle up to 64K.
    POST: Can send parameters, including uploading files, to the server.

Hacked
    GET: Easier to hack for script kiddies.
    POST: More difficult to hack.

Restrictions on form data type
    GET: Yes, only ASCII characters allowed.
    POST: No restrictions. Binary data is also allowed.

Security
    GET: GET is less secure compared to POST because data sent is part of the URL. So it's saved in browser history and server logs in plaintext.
    POST: POST is a little safer than GET because the parameters are not stored in browser history or in web server logs.

Restrictions on form data length
    GET: Yes, since form data is in the URL and URL length is restricted.
    POST: No restrictions.

Usability
    GET: GET method should not be used when sending passwords or other sensitive information.
    POST: POST method used when sending passwords or other sensitive information.

Visibility
    GET: GET method is visible to everyone (it will be displayed in the browser's address bar) and has limits on the amount of information to send.
    POST: POST method variables are not displayed in the URL.

Cached
    GET: Can be cached.
    POST: Not cached.

Large variable values
    GET: 7607 character maximum size.
    POST: 8 Mb max size for the POST method.

Apache Web Server (case study)

A web server is a piece of software that enables a website to be viewed using HTTP. HTTP (HyperText Transfer Protocol) is the key protocol for the transfer of data on the web. Apache Web Server, or the Apache HTTP Server, is web server software that delivers the pages of your website to viewers' browsers. It was originally designed for UNIX environments. Later it was ported to a wide range of operating systems like UNIX, GNU, FreeBSD, Linux, Solaris, Novell NetWare, Mac OS X, Microsoft Windows, OS/2, TPF, and eComStation.

Features of an Apache Web Server
o Web publishing as static and dynamic contents
o Secure web interaction (Apache also includes "SSL" and "TLS" support)
o Virtual hosting
o Forward proxy server
o Reverse proxy server
o Rich in Web server features that include CGI, SSL, and support for virtual domains
o Apache also supports plug-in modules for extensibility
o Apache is reliable, free, and relatively easy to configure

Advantages of an Apache Web Server

There are many advantages of using Apache for end users, developers and web administrators. Some of these are summarized below:
o Apache is feature-rich: Apache supports and implements the latest protocols and offers a variety of useful features.
o Apache is customizable: Apache has a modular architecture that allows you to build a server that is completely customized based on the requirement.
o Apache administration is simple: The configuration files of Apache are in ASCII and have a simple format. This means that they can easily be edited using any text editor. Since these are transferable, cloning a server is easy. It also allows you to control your Apache Web Server from the command line, making for convenient remote administration.
o Apache is extensible: Apache is constantly evolving. Its API and source code are open. This means that if you need a custom module, you could develop it yourself and share it with the Apache development community. Likewise, there is a good chance that the module you need has already been developed by other developers and is easily available online.
o Apache is efficient: Apache's C code is optimized for performance. What this translates into is that an Apache Web Server runs faster and consumes fewer system resources than many other servers.
o Apache is portable across operating systems: Apache runs on a wide variety of operating systems, including all variants of UNIX, Windows 9x/NT, MacOS (on PowerPC), and various others.
o Apache offers stability and reliability: Because the Apache source code is open to the public, bugs are easily communicated and quickly fixed. Updates follow the bug fix and over a period of time, this has made the Apache web server more stable and very reliable as a web server.
o Support for Apache: The Apache Group comprises a large number of dedicated users and developers. Many companies who market commercial versions of the Apache web server also offer support.

Web.py (case study)

web.py is a very clean and simple web framework / library written in Python to assist in the development of Python web applications. Installation is simply a matter of unzipping an archive file, followed by executing a setup script. The framework has no dependencies on external packages or libraries that must be separately downloaded. web.py includes its own Web server, but it is suitable only for development work. web.py applications can be deployed to any Web server that supports WSGI (Web Server Gateway Interface). The web.py website suggests lighttpd or Apache. It also has useful debug features that are automatically enabled.
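WSGI itself is a small calling convention, which is what makes this kind of deployment possible. The sketch below is not web.py code; it is a bare WSGI application written against the convention only, to show the interface that a WSGI-capable server expects. The name application is a common convention, not a requirement.

```python
def application(environ, start_response):
    """A WSGI app: receives the request as a dict, returns body chunks."""
    path = environ.get("PATH_INFO", "/")
    body = ("Hello from %s" % path).encode("utf-8")
    # The server supplies start_response; the app calls it with the status
    # line and a list of (header name, header value) pairs.
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local experiments, the standard library's wsgiref.simple_server.make_server("127.0.0.1", 8080, application).serve_forever() can host such a callable; like web.py's bundled server, it is meant for development only.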

PLT Web server

The PLT Scheme Web Server uses continuations to enable a natural, console-like program development style.

The fastest way to get a servlet running in the Web server is to use the "Insta" language in DrRacket. Enter the following into DrRacket:

    #lang web-server/insta
    (define (start req)
      (response/xexpr
       `(html (head (title "Hello world!"))
              (body (p "Hey out there!")))))

And press Run. A Web browser will open up showing your new servlet. This servlet will only be accessible from your local machine. DrRacket has used serve/servlet to start a new server that uses your start function as the servlet.

Stateful Servlets and Stateless Servlets describe two ways to write Web applications. Stateful Servlets use the entire Racket language, but their continuations are stored in the Web server's memory. Stateless Servlets use a slightly restricted Racket language, but their continuations can be stored by the Web client or on a Web server's disk. If you can, you want to use Stateless Servlets for the improved scalability.

Module 2. Web Programming in Python, Scheme and Java

Templating

A Web template system describes the software and methodologies used to produce web pages and for deployment on websites and delivery over the Internet. Such systems process web templates, using a template engine.

It is a web publishing tool present in content management systems, software frameworks, HTML editors, and many other contexts.

A web template system is composed of:
o A template engine: the primary processing element of the system;
o Content resource: any of various kinds of input data streams, such as from a relational database, XML files, LDAP directory (Lightweight Directory Access Protocol), and other kinds of local or networked data;
o Template resource: web templates specified according to a template language; the template and content resources are processed and combined by the template engine to mass-produce web documents.

Kinds of template systems
o Outside server systems
o Server-side systems
o Edge-side systems
o Client-side systems
o Distributed systems

Templating in Python
o Templating, and in particular Web templating, involves the presentation of information in a form which is often (but not always) intended to be readable, even attractive, to a human audience.
o Frequently, templating solutions involve a document (the template) which may look somewhat like the final output but perhaps in a simplified or stylized form, along with some data which must be presented using that template; combining these two things produces the final output, which in Web templating is usually (but not always) a Web page of some kind.

Templating Engines

There are many, many different HTML/XML templating packages and modules for Python that provide different feature sets and syntaxes. The simplest form of templating engine is that which merely substitutes values into a template in order to produce the final output.
o string.Template: in the standard library
o stringtemplate: employs recursion in order to provide support for complicated templating whilst avoiding side-effects

Java Templating Engines

The following templating engines are accessible or usable via Jython:
o FreeMarker (with Jython data binding)
o Java Server Pages, JSP
o Velocity
o WebMacro
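As an illustration of the simplest kind of engine mentioned above, here is a short sketch using string.Template from the Python standard library. The page text and the values in the content dictionary are made up for the example.

```python
from string import Template

# The template resource: looks like the final page, with $placeholders.
page = Template("<h1>$title</h1>\n<p>Welcome, $user!</p>")

# The content resource: a plain dict here; it could come from a database.
content = {"title": "Web Technology Notes", "user": "Ada"}

# substitute() raises KeyError if a placeholder has no value.
html = page.substitute(content)

# safe_substitute() leaves unknown placeholders in place instead of raising.
partial = page.safe_substitute(title="Draft")
```

string.Template does no HTML escaping, so values that come from users would need to be escaped (for instance with html.escape) before substitution.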

URL mapping A framework's URL mapping facility is the mechanism by which the framework interprets URLs. Some frameworks, such as Drupal and Django, match the provided URL against predetermined patterns using regular expressions, while some others use URL Rewriting to translate the provided URL into one that the underlying engine will recognize. Another technique is that of graph traversal such as used by Zope, where a URL is decomposed in steps that traverse an object graph (of models and views).

A URL mapping system that uses pattern matching or URL rewriting allows more "friendly URLs" to be used, increasing the simplicity of the site and allowing for better indexing by search engines. For example, a URL that ends with "/page.cgi?cat=science&topic=physics" could be changed to simply "/page/science/physics". This makes the URL easier to read and provides search engines with better information about the structural layout of the site. A graph traversal approach also tends to result in the creation of friendly URLs. A shorter URL such as "/page/science" tends to exist by default as that is simply a shorter form of the longer traversal to "/page/science/physics".
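The pattern-matching style of URL mapping can be sketched with regular expressions alone. The code below is not taken from Django or Drupal; the handler functions and the URL_PATTERNS table are hypothetical and only show how the friendly URLs from the example above could be routed.

```python
import re


# Hypothetical handlers; a real framework would return full responses.
def show_category(cat):
    return "category=%s" % cat


def show_topic(cat, topic):
    return "category=%s topic=%s" % (cat, topic)


# Ordered (pattern, handler) pairs; named groups become handler arguments.
URL_PATTERNS = [
    (re.compile(r"^/page/(?P<cat>[^/]+)/(?P<topic>[^/]+)/?$"), show_topic),
    (re.compile(r"^/page/(?P<cat>[^/]+)/?$"), show_category),
]


def dispatch(path):
    """Call the handler of the first pattern that matches the path."""
    for pattern, handler in URL_PATTERNS:
        match = pattern.match(path)
        if match:
            return handler(**match.groupdict())
    return "404 Not Found"
```

With this table, dispatch("/page/science/physics") reaches show_topic, while the shorter "/page/science" reaches show_category.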

CGI programming in Scheme

Web applications can be built in any Scheme by using CGI, and complete web servers can be built in any Scheme with networking capabilities.
o OpenScheme and RScheme come with built-in support for CGI, incoming and outgoing HTTP, and HTML generation.
o Java-based Schemes can operate within the Java Servlet framework. JScheme features "Scheme Server Pages" that work in a similar way to Java Server Pages (JSPs).
o Mod_lisp is an Apache module to easily write web applications in Lisp/Scheme by implementing a TCP/IP-based protocol for communication between the Apache web server and Lisp/Scheme processes.
o The Scheme FastCGI Proxy allows Scheme code to be run as a FastCGI application under the mod_fcgi module available for the Apache web server.
o LAML is a Scheme-based set of libraries for server-side web programming as well as programmatic authoring of complex WWW material.

Introduction to Continuations

Scheme supports the definition of arbitrary control structures with continuations. A continuation is a procedure that embodies the remainder of a program at a given point in the program. A continuation may be obtained at any time during the execution of a program. As with other procedures, a continuation is a first-class object and may be invoked at any time after its creation. Whenever it is invoked, the program immediately continues from the point where the continuation was obtained. Continuations allow the implementation of complex control mechanisms including explicit backtracking, multithreading, and co-routines.

A continuation is a value that encapsulates a piece of an expression context. The call-with-composable-continuation function captures the current continuation starting outside the current function call and running up to the nearest enclosing prompt.

The problem is that get-number needs to send an HTML response back for the current connection, and then it must obtain a response through a new connection. That is, somehow it needs to convert the page generated by build-request-page into a query result:

    (define (get-number label)
      (define query
        ... (build-request-page label ...) ...)
      (string->number (cdr (assq 'number query))))

Continuations let us implement a send/suspend operation that performs exactly that operation. The send/suspend procedure generates a URL that represents the current connection's computation, capturing it as a continuation. It passes the generated URL to a procedure that creates the query page; this query page is used as the result of the current connection, and the surrounding computation (i.e., the continuation) is aborted. Finally, send/suspend arranges for a request to the generated URL (in a new connection) to restore the aborted computation. Thus, get-number is implemented as follows:

    (define (get-number label)
      (define query
        ; Generate a URL for the current computation:
        (send/suspend
          ; Receive the computation-as-URL here:
          (lambda (k-url)
            ; Generate the query-page result for this connection.
            ; Send the query result to the saved-computation URL:
            (build-request-page label k-url ""))))
      ; We arrive here later, in a new connection
      (string->number (cdr (assq 'number query))))

We still have to implement send/suspend. For that task, we import a library of control operators:

    (require racket/control)

Specifically, we need prompt and abort from racket/control. We use prompt to mark the place where a servlet is started, so that we can abort a computation to that point. Change handle by wrapping a prompt around the call to dispatch:

    (define (handle in out)
      ....
        (let ([xexpr (prompt (dispatch (list-ref req 1)))])
          ....))

After that we can implement send/suspend. We use call/cc in the guise of let/cc, which captures the current computation up to an enclosing prompt and binds that computation to an identifier, k in this case. Next, we generate a new dispatch tag, and we record the mapping from the tag to k. Finally, we abort the current computation, supplying instead the page that is built by applying the given mk-page to a URL for the generated tag:

    (define (send/suspend mk-page)
      (let/cc k
        (define tag (format "k~a" (current-inexact-milliseconds)))
        (hash-set! dispatch-table tag k)
        (abort (mk-page (string-append "/" tag)))))
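Python has no call/cc, but a generator paused at a yield can stand in for a captured computation well enough to show the bookkeeping that send/suspend does. The sketch below is an analogy under that assumption, not a port of the Racket code: start, resume, _suspend and get_number_servlet are hypothetical names, and the dispatch table maps generated URLs to paused generators.

```python
import itertools

dispatch_table = {}          # URL -> suspended computation
_counter = itertools.count(1)


def get_number_servlet():
    """Ask for two numbers on two separate 'connections', then add them."""
    first = yield "First number:"    # suspend; resumed by a later request
    second = yield "Second number:"
    yield "Sum is %d" % (int(first) + int(second))


def _suspend(computation, page):
    """Record the paused computation under a fresh URL, like send/suspend."""
    url = "/k%d" % next(_counter)
    dispatch_table[url] = computation
    return url, page


def start(servlet):
    """First connection: run up to the first suspension point."""
    computation = servlet()
    return _suspend(computation, next(computation))


def resume(url, form_value):
    """Later connection: restore the saved computation named by the URL."""
    computation = dispatch_table.pop(url)
    return _suspend(computation, computation.send(form_value))
```

One difference matters in practice: a generator can be resumed from a given point only once, whereas a Racket continuation stored in the dispatch table can be invoked repeatedly, which is what lets the browser's Back button re-enter an earlier step.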

Continuation-based stateful web programming in Scheme

    (require web-server/servlet)

A stateful servlet should provide the following exports:

o interface-version : (one-of/c 'v2)
  This indicates that the servlet is a version two servlet.
o manager : manager?
  The manager for the continuations of this servlet.
o (start initial-request) -> can-be-response?
    initial-request : request?
  This function is called when an instance of this servlet is started. The argument is the HTTP request that initiated the instance.

An example version 2 module:

    #lang racket
    (require web-server/http
             web-server/managers/none)
    (provide interface-version manager start)

    (define interface-version 'v2)
    (define manager
      (create-none-manager
       (lambda (req)
         (response/xexpr
          `(html (head (title "No Continuations Here!"))
                 (body (h1 "No Continuations Here!")))))))
    (define (start req)
      (response/xexpr
       `(html (head (title "Hello World!"))
              (body (h1 "Hi Mom!")))))

These servlets have an extensive API available to them: net/url, web-server/http, web-server/http/bindings, web-server/servlet/servlet-structs, web-server/servlet/web, web-server/servlet/web-cells, and web-server/dispatch.

Web applications in Java (servlets),

Web applications in JSP JavaServer Pages (JSP) is a technology for developing web pages that support dynamic content which helps developers insert java code in HTML pages by making use of special JSP tags, most of which start with <% and end with %>. A JavaServer Pages component is a type of Java servlet that is designed to fulfill the role of a user interface for a Java web application. Web developers write JSPs as text files that combine HTML or XHTML code, XML elements, and embedded JSP actions and commands. Using JSP, you can collect input from users through web page forms, present records from a database or another source, and create web pages dynamically. JSP tags can be used for a variety of purposes, such as retrieving information from a database or registering user preferences, accessing JavaBeans components, passing control between pages and sharing information between requests, pages etc. Why Use JSP? JavaServer Pages often serve the same purpose as programs implemented using the Common Gateway Interface (CGI). But JSP offer several advantages in comparison with the CGI.

Performance is significantly better because JSP allows dynamic elements to be embedded in the HTML page itself instead of living in separate CGI files.

JSPs are always compiled before they are processed by the server, unlike CGI/Perl, which requires the server to load an interpreter and the target script each time the page is requested.

JavaServer Pages are built on top of the Java Servlets API, so like Servlets, JSP also has access to all the powerful Enterprise Java APIs, including JDBC, JNDI, EJB, JAXP etc.

JSP pages can be used in combination with servlets that handle the business logic, the model supported by Java servlet template engines.

Finally, JSP is an integral part of J2EE, a complete platform for enterprise-class applications. This means that JSP can play a part in everything from the simplest applications to the most complex and demanding ones.

The web server needs a JSP engine, i.e. a container, to process JSP pages. The JSP container is responsible for intercepting requests for JSP pages. This tutorial makes use of Apache Tomcat, which has a built-in JSP container to support JSP page development. A JSP container works with the web server to provide the runtime environment and other services a JSP needs. It knows how to understand the special elements that are part of JSPs. The following diagram shows the position of the JSP container and JSP files in a web application.

JSP Processing: The following steps explain how the web server creates the web page using JSP:

As with a normal page, your browser sends an HTTP request to the web server. The web server recognizes that the HTTP request is for a JSP page and forwards it to a JSP engine. It recognizes this from the URL, which ends with .jsp instead of .html.

The JSP engine loads the JSP page from disk and converts it into servlet source code. The conversion is straightforward: all template text is converted to println( ) statements, and all JSP elements are converted to Java code that implements the corresponding dynamic behavior of the page.

The JSP engine compiles the servlet into an executable class and forwards the original request to a servlet engine.

A part of the web server called the servlet engine loads the Servlet class and executes it. During execution, the servlet produces an output in HTML format, which the servlet engine passes to the web server inside an HTTP response.

The web server forwards the HTTP response to your browser in terms of static HTML content.

Finally web browser handles the dynamically generated HTML page inside the HTTP response exactly as if it were a static page.
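The translation step described above (template text becomes print statements, JSP elements become Java code) can be illustrated with a toy translator. This is a minimal sketch in Python, not how any real JSP engine is implemented: it handles only the <%= ... %> expression element, and the names translate, java_literal and the generated out.print(...) calls are assumptions chosen for illustration.

```python
import re

# Matches a JSP expression element such as <%= date %>.
JSP_EXPRESSION = re.compile(r"<%=\s*(.*?)\s*%>", re.DOTALL)


def java_literal(text):
    """Quote text as a Java string literal (only the common escapes)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def translate(jsp_source):
    """Return servlet-style statements for a tiny subset of JSP."""
    statements = []
    position = 0
    for match in JSP_EXPRESSION.finditer(jsp_source):
        template_text = jsp_source[position:match.start()]
        if template_text:
            # Template text becomes a print call on a string literal.
            statements.append("out.print(%s);" % java_literal(template_text))
        # A JSP expression becomes a print call on the Java expression itself.
        statements.append("out.print(%s);" % match.group(1))
        position = match.end()
    tail = jsp_source[position:]
    if tail:
        statements.append("out.print(%s);" % java_literal(tail))
    return statements


page = "<p>The date/time is <%= date %></p>"
for statement in translate(page):
    print(statement)
```

For the one-line page above, the sketch emits one statement for the leading template text, one for the expression, and one for the trailing template text.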

All the above mentioned steps can be shown below in the following diagram:

Typically, the JSP engine checks whether a servlet for a JSP file already exists and whether the modification date of the JSP is older than that of the servlet. If the JSP is older than its generated servlet, the JSP container assumes that the JSP hasn't changed and that the generated servlet still matches the JSP's contents. This saves the translation and compilation work on later requests, which makes the process more efficient than scripting approaches that re-interpret the page on every request. So in a way, a JSP page is really just another way to write a servlet without having to be a Java programming wizard. Except for the translation phase, a JSP page is handled exactly like a regular servlet.

A JSP life cycle is the entire process from creation to destruction. It is similar to the servlet life cycle, with an additional step that compiles the JSP into a servlet. A JSP follows these paths:

Compilation
Initialization
Execution
Cleanup

The four major phases of the JSP life cycle are very similar to the servlet life cycle, and they are as follows:

(1) JSP Compilation: When a browser asks for a JSP, the JSP engine first checks whether it needs to compile the page. If the page has never been compiled, or if the JSP has been modified since it was last compiled, the JSP engine compiles the page. The compilation process involves three steps:
1. Parsing the JSP.
2. Turning the JSP into a servlet.
3. Compiling the servlet.

(2) JSP Initialization: When a container loads a JSP, it invokes the jspInit() method before servicing any requests. If you need to perform JSP-specific initialization, override the jspInit() method:

public void jspInit() {
    // Initialization code...
}

Typically, initialization is performed only once. As with the servlet init method, you generally initialize database connections, open files, and create lookup tables in the jspInit method.

(3) JSP Execution: This phase of the JSP life cycle represents all interactions with requests until the JSP is destroyed. Whenever a browser requests a JSP and the page has been loaded and initialized, the JSP engine invokes the _jspService() method in the JSP. The _jspService() method takes an HttpServletRequest and an HttpServletResponse as its parameters:

void _jspService(HttpServletRequest request, HttpServletResponse response) {
    // Service handling code...
}

The _jspService() method of a JSP is invoked once per request and is responsible for generating the response for that request. It handles requests for all of the HTTP methods, e.g. GET, POST, DELETE, etc.

(4) JSP Cleanup: The destruction phase of the JSP life cycle covers the removal of a JSP from use by a container. The jspDestroy() method is the JSP equivalent of the destroy method for servlets. Override jspDestroy when you need to perform any cleanup, such as releasing database connections or closing open files. The jspDestroy() method has the following form:

public void jspDestroy() {
    // Your cleanup code goes here.
}
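The order of the life cycle calls can be made concrete with a small simulation. The following is only a sketch in Python under stated assumptions: HelloPage stands in for the servlet generated from a JSP, Container is a hypothetical stand-in for the JSP container, and the method names mirror jspInit(), _jspService() and jspDestroy() without being any real API.

```python
class HelloPage:
    """Stands in for the servlet generated from a JSP (hypothetical)."""

    def __init__(self):
        self.events = []

    def jsp_init(self):
        # Called once, before the first request is serviced.
        self.events.append("init")

    def jsp_service(self, request):
        # Called once per request; produces the response body.
        self.events.append("service:" + request)
        return "<p>Hello, %s</p>" % request

    def jsp_destroy(self):
        # Called once, when the container removes the page from use.
        self.events.append("destroy")


class Container:
    """Hypothetical container that loads a page on its first request."""

    def __init__(self, page_class):
        self.page_class = page_class
        self.page = None

    def handle(self, request):
        if self.page is None:
            self.page = self.page_class()
            self.page.jsp_init()
        return self.page.jsp_service(request)

    def shutdown(self):
        if self.page is not None:
            self.page.jsp_destroy()
            self.page = None


container = Container(HelloPage)
container.handle("Zara")
container.handle("Ali")
page = container.page      # keep a reference so the event log can be inspected
container.shutdown()
print(page.events)
```

The event log shows initialization happening once, the service method running for each request, and cleanup happening once at the end.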

Using Java Beans with JSP

A JavaBean is a specially constructed Java class coded according to the JavaBeans API specifications. The following characteristics distinguish a JavaBean from other Java classes:

It provides a default, no-argument constructor.
It should be serializable and implement the Serializable interface.
It may have a number of properties which can be read or written.
It may have a number of "getter" and "setter" methods for the properties.

JavaBeans Properties: A JavaBean property is a named attribute that can be accessed by the user of the object. The attribute can be of any Java data type, including classes that you define. A JavaBean property may be read/write, read-only, or write-only. JavaBean properties are accessed through two methods in the JavaBean's implementation class:

Method: getPropertyName()
Description: For example, if the property name is firstName, your method name would be getFirstName() to read that property. This method is called an accessor.

Method: setPropertyName()
Description: For example, if the property name is firstName, your method name would be setFirstName() to write that property. This method is called a mutator.
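The naming rule in the table above is mechanical, so it can be written down as a small function. This is a sketch in Python for illustration only; accessor_names is a hypothetical helper, and it covers just the get/set pattern described here.

```python
def accessor_names(property_name):
    """Return the (getter, setter) method names for a bean property."""
    # Upper-case the first letter only, so firstName becomes FirstName.
    capitalized = property_name[0].upper() + property_name[1:]
    return "get" + capitalized, "set" + capitalized


print(accessor_names("firstName"))
print(accessor_names("age"))
```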

A read-only attribute will have only a getPropertyName() method, and a write-only attribute will have only a setPropertyName() method.

JavaBeans Example: Consider a student class with a few properties:

package com.tutorialspoint;

public class StudentsBean implements java.io.Serializable {
    private String firstName = null;
    private String lastName = null;
    private int age = 0;

    public StudentsBean() {
    }

    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public int getAge() {
        return age;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    // The setter takes the same type the getter returns.
    public void setAge(int age) {
        this.age = age;
    }
}

Accessing JavaBeans: The useBean action declares a JavaBean for use in a JSP. Once declared, the bean becomes a scripting variable that can be accessed by both scripting elements and other custom tags used in the JSP. The full syntax for the useBean tag is as follows:

<jsp:useBean id="bean's name" scope="bean's scope" typeSpec/>

Here the value of the scope attribute can be page, request, session or application, depending on your requirement. The value of the id attribute may be any value as long as it is a unique name among the other useBean declarations in the same JSP. The following example shows its simple usage:

<html>
<head>
<title>useBean Example</title>
</head>
<body>

<jsp:useBean id="date" class="java.util.Date" />
<p>The date/time is <%= date %></p>

</body>
</html>

This would produce the following result:

The date/time is Thu Sep 30 11:18:11 GST 2010

Accessing JavaBeans Properties: Along with <jsp:useBean...>, you can use the <jsp:getProperty/> action to access get methods and the <jsp:setProperty/> action to access set methods. Here is the full syntax:

<jsp:useBean id="id" class="bean's class" scope="bean's scope">
   <jsp:setProperty name="bean's id" property="property name" value="value"/>
   <jsp:getProperty name="bean's id" property="property name"/>
   ...........
</jsp:useBean>

The name attribute references the id of a JavaBean previously introduced to the JSP by the useBean action. The property attribute is the name of the property whose get or set method should be invoked. The following is a simple example that accesses the data using the above syntax:

<html>
<head>
<title>get and set properties Example</title>
</head>
<body>

<jsp:useBean id="students" class="com.tutorialspoint.StudentsBean">
   <jsp:setProperty name="students" property="firstName" value="Zara"/>
   <jsp:setProperty name="students" property="lastName" value="Ali"/>
   <jsp:setProperty name="students" property="age" value="10"/>
</jsp:useBean>

<p>Student First Name: <jsp:getProperty name="students" property="firstName"/></p>
<p>Student Last Name: <jsp:getProperty name="students" property="lastName"/></p>
<p>Student Age: <jsp:getProperty name="students" property="age"/></p>

</body>
</html>

Make StudentsBean.class available in the CLASSPATH and access the above JSP. This would produce the following result:

Student First Name: Zara

Student Last Name: Ali

Student Age: 10

Module 3: Database connectivity and Data Abstractions Database connectivity

Python

The Open Database Connectivity (ODBC) API standard allows transparent connections with any database that supports the interface. This includes most popular databases, such as PostgreSQL or Microsoft Access. The Python standard for database interfaces is the Python DB-API, and most Python database interfaces adhere to it. Python ships with a module for SQLite (Python 2 also included one for Berkeley DB). Modules for MySQL, PostgreSQL, FirebirdSQL and others are available as third-party modules.

An example with MySQL would look like this:

import MySQLdb

db = MySQLdb.connect("host machine", "dbuser", "password", "dbname")
cursor = db.cursor()
query = """SELECT * FROM sampletable"""
lines = cursor.execute(query)   # number of rows returned by the query
data = cursor.fetchall()
db.close()

To make the initialization of the connection easier, a configuration file can be used:

import MySQLdb

db = MySQLdb.connect(read_default_file="~/.my.cnf")

An example with sqlite is very similar to the one above, and the cursor provides much of the same functionality:

import sqlite3

db = sqlite3.connect("/path/to/file")
cursor = db.cursor()
query = """SELECT * FROM sampletable"""
cursor.execute(query)           # sqlite3 returns the cursor itself, not a row count
data = cursor.fetchall()
db.close()

There are several wrappers that provide improved or simplified interfaces to SQL databases:

SQLObject: ORM
SQLAlchemy: SQL Toolkit and ORM
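The DB-API pattern above (connect, get a cursor, execute, fetch, close) can be tried without any server by using an in-memory SQLite database. The sketch below assumes a made-up two-column layout for sampletable and uses ? placeholders so that values are passed separately from the SQL text.

```python
import sqlite3

# ":memory:" creates a throwaway database that lives only in this process.
db = sqlite3.connect(":memory:")
try:
    cursor = db.cursor()
    cursor.execute("CREATE TABLE sampletable (id INTEGER PRIMARY KEY, name TEXT)")
    cursor.executemany(
        "INSERT INTO sampletable (name) VALUES (?)",
        [("Zara",), ("Ali",)],
    )
    db.commit()

    # Parameters are bound by the driver instead of being pasted into the SQL.
    cursor.execute("SELECT id, name FROM sampletable WHERE name = ?", ("Ali",))
    data = cursor.fetchall()
finally:
    # Close the connection even if one of the statements fails.
    db.close()

print(data)
```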

>>> Java, Scheme

JDBC

The JDBC API is a Java API that can access any kind of tabular data, especially data stored in a relational database.

What is JDBC? JDBC stands for Java Database Connectivity, which is a standard Java API for database-independent connectivity between the Java programming language and a wide range of databases. The JDBC library includes APIs for each of the tasks commonly associated with database usage:

Making a connection to a database
Creating SQL or MySQL statements
Executing those SQL or MySQL queries in the database
Viewing and modifying the resulting records

Fundamentally, JDBC is a specification that provides a complete set of interfaces that allows for portable access to an underlying database. Java can be used to write different types of executables, such as:

Java Applications
Java Applets
Java Servlets
Java ServerPages (JSPs)
Enterprise JavaBeans (EJBs)

All of these different executables are able to use a JDBC driver to access a database and take advantage of the stored data. JDBC provides the same capabilities as ODBC, allowing Java programs to contain database-independent code.

JDBC Architecture: The JDBC API supports both two-tier and three-tier processing models for database access, but in general the JDBC architecture consists of two layers:

1. JDBC API: This provides the application-to-JDBC Manager connection.
2. JDBC Driver API: This supports the JDBC Manager-to-Driver connection.

The JDBC API uses a driver manager and database-specific drivers to provide transparent connectivity to heterogeneous databases. The JDBC driver manager ensures that the correct driver is used to access each data source. The driver manager is capable of supporting multiple concurrent drivers connected to multiple heterogeneous databases. The following architectural diagram shows the location of the driver manager with respect to the JDBC drivers and the Java application:

Common JDBC Components: The JDBC API provides the following interfaces and classes:

DriverManager: This class manages a list of database drivers. It matches connection requests from the Java application with the proper database driver using the communication subprotocol. The first driver that recognizes a certain subprotocol under JDBC will be used to establish a database connection.

Driver: This interface handles the communications with the database server. You will very rarely interact directly with Driver objects. Instead, you use the DriverManager class, which manages objects of this type and abstracts the details associated with working with them.

Connection: This interface provides all the methods for contacting a database. The Connection object represents the communication context, i.e., all communication with the database goes through the Connection object.

Statement: You use objects created from this interface to submit the SQL statements to the database. Some derived interfaces accept parameters in addition to executing stored procedures.

ResultSet: These objects hold data retrieved from a database after you execute an SQL query using Statement objects. It acts as an iterator to allow you to move through its data.

SQLException: This class handles any errors that occur in a database application.
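The rule that the first registered driver recognizing the subprotocol wins can be sketched outside Java. The following Python model is an illustration only: FakeDriver and DriverRegistry are hypothetical classes, not part of JDBC, and a connection is represented by a plain string.

```python
class FakeDriver:
    """Hypothetical driver that recognizes one JDBC-style subprotocol."""

    def __init__(self, subprotocol):
        self.subprotocol = subprotocol

    def accepts_url(self, url):
        return url.startswith("jdbc:%s:" % self.subprotocol)

    def connect(self, url):
        return "connection via %s driver to %s" % (self.subprotocol, url)


class DriverRegistry:
    """Plays the role of the driver manager: picks a driver per URL."""

    def __init__(self):
        self.drivers = []

    def register(self, driver):
        self.drivers.append(driver)

    def get_connection(self, url):
        # Drivers are asked in registration order; the first match is used.
        for driver in self.drivers:
            if driver.accepts_url(url):
                return driver.connect(url)
        raise LookupError("No suitable driver for " + url)


registry = DriverRegistry()
registry.register(FakeDriver("mysql"))
registry.register(FakeDriver("postgresql"))
print(registry.get_connection("jdbc:postgresql://localhost/test"))
```

The application code only names a URL; which driver serves it is decided by the registry, which is the sense in which the driver manager keeps application code database-independent.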

Creating JDBC Application: There are six steps involved in building a JDBC application:

1. Import the packages. Include the packages containing the JDBC classes needed for database programming. Most often, import java.sql.* will suffice:

//STEP 1. Import required packages
import java.sql.*;

2. Register the JDBC driver. Initialize a driver so you can open a communications channel with the database:

//STEP 2: Register JDBC driver
Class.forName("com.mysql.jdbc.Driver");

3. Open a connection. Use the DriverManager.getConnection() method to create a Connection object, which represents a physical connection with the database. DB_URL is assumed to hold the JDBC URL of the target database.

//STEP 3: Open a connection
// Database credentials
static final String USER = "username";
static final String PASS = "password";

System.out.println("Connecting to database...");
Connection conn = DriverManager.getConnection(DB_URL, USER, PASS);

4. Execute a query. Use an object of type Statement for building and submitting an SQL statement to the database.

//STEP 4: Execute a query
System.out.println("Creating statement...");
Statement stmt = conn.createStatement();
String sql = "SELECT id, first, last, age FROM Employees";
ResultSet rs = stmt.executeQuery(sql);

5. Extract data from the result set. Use the appropriate ResultSet.getXXX() method to retrieve the data from the result set.

//STEP 5: Extract data from result set
while (rs.next()) {
    //Retrieve by column name
    int id = rs.getInt("id");
    int age = rs.getInt("age");
    String first = rs.getString("first");
    String last = rs.getString("last");

    //Display values
    System.out.print("ID: " + id);
    System.out.print(", Age: " + age);
    System.out.print(", First: " + first);
    System.out.println(", Last: " + last);
}

6. Clean up the environment. Explicitly close all database resources rather than relying on the JVM's garbage collection.

//STEP 6: Clean-up environment
rs.close();
stmt.close();
conn.close();

SQL Alchemy in Python

The SQLAlchemy SQL Toolkit and Object Relational Mapper is a comprehensive set of tools for working with databases and Python. It has several distinct areas of functionality which can be used individually or combined together. Its major components are illustrated below, with component dependencies organized into layers:

Above, the two most significant front-facing portions of SQLAlchemy are the Object Relational Mapper and the SQL Expression Language. SQL Expressions can be used independently of the ORM. When using the ORM, the SQL Expression Language remains part of the public-facing API, as it is used within object-relational configurations and queries.

Key Features of SQLAlchemy

SQLAlchemy consists of two distinct components, known as the Core and the ORM. The Object Relational Mapper is an optional package which builds upon the Core.
Mature, high-performing architecture.
The Unit Of Work system, a central part of SQLAlchemy's Object Relational Mapper (ORM), organizes pending insert/update/delete operations into queues and flushes them all in one batch.
Function-based query construction
Modular and extensible
Separate mapping and class design
Eager-loading and caching of related objects and collections
Composite (multiple-column) primary keys
Pre- and post-processing of data
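The Unit Of Work idea mentioned in the feature list (queue pending operations, then flush them in one batch) can be illustrated with the standard sqlite3 module. This is a deliberately small sketch and not SQLAlchemy's implementation: the UnitOfWork class and its add and flush methods are hypothetical names, and the queue holds raw SQL rather than changes detected on mapped objects.

```python
import sqlite3


class UnitOfWork:
    """Hypothetical queue of pending statements flushed as one batch."""

    def __init__(self, connection):
        self.connection = connection
        self.pending = []   # queued (sql, parameters) pairs

    def add(self, sql, parameters=()):
        # Nothing touches the database yet; the operation is only recorded.
        self.pending.append((sql, parameters))

    def flush(self):
        """Send every queued statement inside a single transaction."""
        with self.connection:   # commits on success, rolls back on error
            for sql, parameters in self.pending:
                self.connection.execute(sql, parameters)
        flushed = len(self.pending)
        self.pending = []
        return flushed


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

work = UnitOfWork(connection)
work.add("INSERT INTO users (name, age) VALUES (?, ?)", ("John", 42))
work.add("INSERT INTO users (name, age) VALUES (?, ?)", ("Susan", 57))
work.add("UPDATE users SET age = ? WHERE name = ?", (43, "John"))

rows_before = connection.execute("SELECT COUNT(*) FROM users").fetchone()[0]
flushed = work.flush()
rows_after = connection.execute(
    "SELECT name, age FROM users ORDER BY user_id"
).fetchall()
connection.close()

print(rows_before, flushed, rows_after)
```

Before the flush the table is still empty; afterwards all three queued operations have been applied together.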

Example:

# The first step is to import the sqlalchemy module.
from sqlalchemy import *

# The next step is to open a connection to the database you'll be using. This is
# done by creating an Engine object, which knows how to talk to one particular
# type of database (SQLite, PostgreSQL, Firebird, MySQL, Oracle...).
# create_engine() takes a URI of the form
# "engine://user:password@host:port/database".
db = create_engine('sqlite:///tutorial.db')

# MetaData object that will manage Table definitions.
# Binding metadata to an engine is the legacy style; it was removed in SQLAlchemy 2.0.
metadata = MetaData(bind=db)

# Create a users table.
users = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('name', String(40)),
    Column('age', Integer),
    Column('password', String),
)
metadata.create_all()

# In SQLAlchemy you create an "SQL statement object", build the SQL query you
# want, and call its execute() method.
connection = db.connect()
connection.execute(users.insert(), [
    {'name': 'John', 'age': 42},
    {'name': 'Susan', 'age': 57},
    {'name': 'Carl', 'age': 33},
])

s = users.select()
rs = s.execute()   # implicit execution, also legacy style

row = rs.fetchone()
print('Id:', row[0])
print('Name:', row['name'])
print('Age:', row.age)
print('Password:', row[users.c.password])

for row in rs:
    print(row.name, 'is', row.age, 'years old')

=============================

ORM

JDBC stands for Java Database Connectivity and provides a set of Java APIs for accessing relational databases from a Java program. These Java APIs enable Java programs to execute SQL statements and interact with any SQL-compliant database.

JDBC provides a flexible architecture to write a database independent application that can run on different platforms and interact with different DBMS without any modification.

Pros and Cons of JDBC


Pros of JDBC:
Clean and simple SQL processing
Good performance with large data
Very good for small applications
Simple syntax, so easy to learn

Cons of JDBC:
Complex if it is used in large projects
Large programming overhead
No encapsulation
Hard to implement the MVC concept
Query is DBMS specific

Why Object Relational Mapping (ORM)?


When we work with an object-oriented system, there is a mismatch between the object model and the relational database. RDBMSs represent data in a tabular format, whereas object-oriented languages such as Java or C# represent it as an interconnected graph of objects. Consider the following Java class with proper constructors and associated public functions:

public class Employee {
    private int id;
    private String first_name;
    private String last_name;
    private int salary;

    public Employee() {}

    public Employee(String fname, String lname, int salary) {
        this.first_name = fname;
        this.last_name = lname;
        this.salary = salary;
    }

    public int getId() {
        return id;
    }

    public String getFirstName() {
        return first_name;
    }

    public String getLastName() {
        return last_name;
    }

    public int getSalary() {
        return salary;
    }
}

Suppose the above objects need to be stored in and retrieved from the following RDBMS table:

create table EMPLOYEE (
    id INT NOT NULL auto_increment,
    first_name VARCHAR(20) default NULL,
    last_name VARCHAR(20) default NULL,
    salary INT default NULL,
    PRIMARY KEY (id)
);

The first problem: what if we need to modify the design of our database after having developed a few pages of our application? Second, loading and storing objects in a relational database exposes us to the following five mismatch problems.

Mismatch: Granularity
Description: Sometimes you will have an object model which has more classes than the number of corresponding tables in the database.

Mismatch: Inheritance
Description: RDBMSs do not define anything similar to inheritance, which is a natural paradigm in object-oriented programming languages.

Mismatch: Identity
Description: An RDBMS defines exactly one notion of 'sameness': the primary key. Java, however, defines both object identity (a==b) and object equality (a.equals(b)).

Mismatch: Associations
Description: Object-oriented languages represent associations using object references, whereas an RDBMS represents an association as a foreign key column.

Mismatch: Navigation
Description: The ways you access objects in Java and in an RDBMS are fundamentally different.

The Object-Relational Mapping (ORM) is the solution to handle all the above impedance mismatches.

What is ORM? ORM stands for Object-Relational Mapping. It is a programming technique for converting data between relational databases and object-oriented programming languages such as Java, C# etc. An ORM system has the following advantages over plain JDBC:

S.N.  Advantages
1     Lets business code access objects rather than DB tables.
2     Hides details of SQL queries from OO logic.
3     Based on JDBC 'under the hood'.
4     No need to deal with the database implementation.
5     Entities based on business concepts rather than database structure.
6     Transaction management and automatic key generation.
7     Fast development of application.

An ORM solution consists of the following four entities:


S.N.  Solutions
1     An API to perform basic CRUD operations on objects of persistent classes.
2     A language or API to specify queries that refer to classes and properties of classes.
3     A configurable facility for specifying mapping metadata.
4     A technique to interact with transactional objects to perform dirty checking, lazy association fetching, and other optimization functions.
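To make the first and third entities concrete (a CRUD API on persistent objects, driven by mapping metadata), here is a hand-rolled sketch in Python using the standard sqlite3 module. It is not how Hibernate or any real ORM works internally: EMPLOYEE_MAPPING, save and find_by_id are hypothetical names, and the table definition is adapted to SQLite syntax.

```python
import sqlite3


class Employee:
    """Plain business object; it knows nothing about SQL."""

    def __init__(self, first_name, last_name, salary, id=None):
        self.id = id
        self.first_name = first_name
        self.last_name = last_name
        self.salary = salary


# Mapping metadata: which table and columns an Employee corresponds to.
EMPLOYEE_MAPPING = {
    "table": "EMPLOYEE",
    "columns": ["first_name", "last_name", "salary"],
}


def save(connection, employee, mapping=EMPLOYEE_MAPPING):
    """Insert the object and copy the generated key back onto it."""
    columns = mapping["columns"]
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        mapping["table"],
        ", ".join(columns),
        ", ".join("?" for _ in columns),
    )
    cursor = connection.execute(sql, [getattr(employee, c) for c in columns])
    employee.id = cursor.lastrowid   # automatic key generation
    return employee


def find_by_id(connection, employee_id, mapping=EMPLOYEE_MAPPING):
    """Load one row and rebuild an Employee from it, or return None."""
    columns = mapping["columns"]
    sql = "SELECT %s FROM %s WHERE id = ?" % (", ".join(columns), mapping["table"])
    row = connection.execute(sql, (employee_id,)).fetchone()
    if row is None:
        return None
    return Employee(*row, id=employee_id)


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE EMPLOYEE (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "first_name VARCHAR(20), last_name VARCHAR(20), salary INT)"
)
saved = save(connection, Employee("Zara", "Ali", 1000))
loaded = find_by_id(connection, saved.id)
missing = find_by_id(connection, 999)
connection.close()

print(loaded.id, loaded.first_name, loaded.last_name, loaded.salary)
```

The calling code works with Employee objects only; the SQL is derived from the mapping, which is the separation the advantages table describes.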

Java ORM Frameworks: There are several persistence frameworks and ORM options in Java. A persistence framework is an ORM service that stores objects in and retrieves them from a relational database.

Enterprise JavaBeans Entity Beans
Java Data Objects
Castor
TopLink
Spring DAO
Hibernate

=======================================

Hibernate in Java

Hibernate is an Object-Relational Mapping (ORM) solution for Java. It was started as an open source persistence framework by Gavin King in 2001. It is a powerful, high-performance object-relational persistence and query service for any Java application. Hibernate maps Java classes to database tables and Java data types to SQL data types, and relieves the developer of 95% of common data-persistence-related programming tasks. Hibernate sits between traditional Java objects and the database server and handles all the work of persisting those objects based on the appropriate O/R mechanisms and patterns.

Hibernate Advantages:

Hibernate takes care of mapping Java classes to database tables using XML files, without writing any line of code.
Provides simple APIs for storing and retrieving Java objects directly to and from the database.
If there is a change in the database or in any table, only the XML file properties need to change.
Abstracts away the unfamiliar SQL types and lets us work with familiar Java objects.
Does not require an application server to operate.
Manipulates complex associations between the objects of your database.
Minimizes database access with smart fetching strategies.
Provides simple querying of data.

Supported Databases: Hibernate supports almost all the major RDBMSs, including:

HSQL Database Engine
DB2/NT
MySQL
PostgreSQL
FrontBase
Oracle
Microsoft SQL Server Database
Sybase SQL Server
Informix Dynamic Server

Supported Technologies: Hibernate supports a variety of other technologies, including the following:

XDoclet Spring
J2EE
Eclipse plug-ins
Maven

Hibernate Architecture Fig. Hibernate Application Architecture

Hibernate uses various existing Java APIs, such as JDBC, the Java Transaction API (JTA), and the Java Naming and Directory Interface (JNDI). The following section gives a brief description of each of the class objects involved in the Hibernate application architecture.

o Configuration Object: The Configuration object is the first Hibernate object you create in any Hibernate application, and it is usually created only once during application initialization. It represents a configuration or properties file required by Hibernate. The Configuration object provides two key components:
1. Database Connection: This is handled through one or more configuration files supported by Hibernate. These files are hibernate.properties and hibernate.cfg.xml.
2. Class Mapping Setup: This component creates the connection between the Java classes and database tables.

o SessionFactory Object: The Configuration object is used to create a SessionFactory object, which in turn configures Hibernate for the application using the supplied configuration file and allows a Session object to be instantiated. The SessionFactory is a thread-safe object and is used by all the threads of an application. The SessionFactory is a heavyweight object, so it is usually created during application start-up and kept for later use. You need one SessionFactory object per database, each using a separate configuration file. So if you are using multiple databases, you have to create multiple SessionFactory objects.

o Session Object: A Session is used to get a physical connection with a database. The Session object is lightweight and designed to be instantiated each time an interaction with the database is needed. Persistent objects are saved and retrieved through a Session object. Session objects should not be kept open for a long time because they are not usually thread safe; they should be created and destroyed as needed.

o Transaction Object: A Transaction represents a unit of work with the database, and most RDBMSs support transaction functionality. Transactions in Hibernate are handled by an underlying transaction manager and transaction (from JDBC or JTA). This is an optional object; Hibernate applications may choose not to use this interface and instead manage transactions in their own application code.

o Query Object: Query objects use an SQL or Hibernate Query Language (HQL) string to retrieve data from the database and create objects. A Query instance is used to bind query parameters, limit the number of results returned by the query, and finally to execute the query.

o Criteria Object: Criteria objects are used to create and execute object-oriented criteria queries to retrieve objects.
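The division of labour between a long-lived factory, short-lived sessions, and transactions can be sketched in a few lines. The following Python model is only an analogy built on the standard sqlite3 module, not Hibernate's API: SessionFactory, Session, open_session and save are hypothetical names, and a session here simply wraps one database connection.

```python
import os
import sqlite3
import tempfile


class Session:
    """Short-lived wrapper around one physical connection."""

    def __init__(self, connection):
        self.connection = connection

    def save(self, name):
        self.connection.execute("INSERT INTO person (name) VALUES (?)", (name,))

    def commit(self):
        self.connection.commit()

    def rollback(self):
        self.connection.rollback()

    def close(self):
        self.connection.close()


class SessionFactory:
    """Created once per database; hands out a new session on demand."""

    def __init__(self, database_path):
        self.database_path = database_path

    def open_session(self):
        return Session(sqlite3.connect(self.database_path))


with tempfile.TemporaryDirectory() as workdir:
    factory = SessionFactory(os.path.join(workdir, "demo.db"))

    setup = factory.open_session()
    setup.connection.execute("CREATE TABLE person (name TEXT)")
    setup.commit()
    setup.close()

    # One unit of work that succeeds and is committed.
    session = factory.open_session()
    try:
        session.save("Zara")
        session.commit()
    except sqlite3.Error:
        session.rollback()
        raise
    finally:
        session.close()

    # One unit of work that is abandoned; its insert is rolled back.
    session = factory.open_session()
    session.save("Ali")
    session.rollback()
    session.close()

    check = factory.open_session()
    names = [row[0] for row in check.connection.execute("SELECT name FROM person")]
    check.close()

print(names)
```

Only the committed unit of work is visible to the final session, and each session is opened and closed around a single interaction, matching the guidance above.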

Database abstraction layer in Scheme

A database abstraction layer is an application programming interface which unifies the communication between a computer application and databases such as SQL Server, DB2, MySQL, PostgreSQL, Oracle or SQLite. Traditionally, all database vendors provide their own interface tailored to their products, which leaves it to the application programmer to implement code for every database interface he or she would like to support. Database abstraction layers reduce the amount of work by providing a consistent API to the developer and hiding the database specifics behind this interface as much as possible.

A Scheme-database interface via scripting of a SQL user front-end. A major procedure:

DB1:fold-left PROC INITIAL-SEED QUERY-OBJECT

A QUERY-OBJECT (which in this implementation is a list of fragments that make a SQL statement, in reverse order, without the terminating semi-colon) is submitted to the database, using the default database connection.

PROC is a procedure: SEED COL COL ...
The procedure PROC takes 1+n arguments, where n is the number of columns in the table returned by the query. The procedure PROC must return two values: CONTINUE? NEW-SEED

The query is executed, and PROC is applied to each returned row in order. The first invocation of PROC receives INITIAL-SEED as its first argument. Each following invocation of PROC receives as its first argument the NEW-SEED result of the previous invocation of PROC. The CONTINUE? result of PROC is an early termination flag. If that flag is returned as #f, any further applications of PROC are skipped and DB1:fold-left finishes. The function DB1:fold-left returns the NEW-SEED produced by the last invocation of PROC. If the query yielded no rows, DB1:fold-left returns the INITIAL-SEED. Thus DB1:fold-left is identical to the left fold over a sequence, modulo the early termination.

The interface defines an S-expression-based "variant" of SQL to construct a QUERY-OBJECT. All of SQL92 is supported. There are a few minor variants of the above procedure, optimized for common particular cases: DB1:for-singleton, DB1:assoc-val, DB:imperative-stmt. The code can support pooling of database connections. The source code completely explains the interfaces.
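The contract of DB1:fold-left (seed threading plus an early-termination flag) translates directly to other languages. The sketch below is a Python analogue over the standard sqlite3 module, written for illustration: fold_left, its argument order, and the employees table are assumptions, and the query is a plain SQL string rather than the S-expression QUERY-OBJECT of the Scheme interface.

```python
import sqlite3


def fold_left(connection, proc, initial_seed, query, parameters=()):
    """Left fold over the rows of a query, with early termination.

    proc(seed, col, col, ...) must return (continue_flag, new_seed).
    """
    seed = initial_seed
    for row in connection.execute(query, parameters):
        keep_going, seed = proc(seed, *row)
        if not keep_going:
            break   # skip the remaining rows
    return seed


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
connection.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Zara", 100), ("Ali", 200), ("Carl", 300)],
)


def add_salary(total, name, salary):
    # Never terminates early: visits every row.
    return True, total + salary


def first_two_names(names, name, salary):
    # Stops as soon as two names have been collected.
    names = names + [name]
    return len(names) < 2, names


ordered = "SELECT name, salary FROM employees ORDER BY salary"
total = fold_left(connection, add_salary, 0, ordered)
names = fold_left(connection, first_two_names, [], ordered)
empty = fold_left(
    connection, add_salary, 0,
    "SELECT name, salary FROM employees WHERE salary > ?", (1000,),
)
connection.close()

print(total, names, empty)
```

The three calls exercise the three cases in the description: a full fold, a fold cut short by the flag, and a query with no rows, which returns the initial seed unchanged.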

Module 4: Communicating Web applications & RIA Screen Scraping, API for communication:

REST REST stands for Representational State Transfer. (It is sometimes spelled "ReST".) It relies on a stateless, client-server, cacheable communications protocol -- and in virtually all cases, the HTTP protocol is used. REST is an architecture style for designing networked applications. The idea is that, rather than using complex mechanisms such as CORBA, RPC or SOAP to connect between machines, simple HTTP is used to make calls between machines. In many ways, the World Wide Web itself, based on HTTP, can be viewed as a REST-based architecture. RESTful applications use HTTP requests to post data (create and/or update), read data (e.g., make queries), and delete data. Thus, REST uses HTTP for all four CRUD (Create/Read/Update/Delete) operations. REST is a lightweight alternative to mechanisms like RPC (Remote Procedure Calls) and Web Services (SOAP, WSDL, et al.). A REST service is:
o Platform-independent (you don't care if the server is Unix, the client is a Mac, or anything else),
o Language-independent (C# can talk to Java, etc.),
o Standards-based (runs on top of HTTP), and
o Can easily be used in the presence of firewalls.
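The mapping of CRUD operations onto HTTP requests can be sketched with Python's standard library. This is an illustration under stated assumptions: the URL is a placeholder, build_request and CRUD_TO_HTTP are hypothetical names, using PUT for updates is one common convention rather than something REST mandates, and the bearer token is just one way for a request to carry its own credentials. The request is only constructed, never sent.

```python
import json
import urllib.request

# One common convention for mapping CRUD operations onto HTTP methods.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, url, payload=None, token=None):
    """Build (but do not send) a self-contained HTTP request."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        # The credentials travel with every request, so the server
        # does not need to remember anything about this client.
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(
        url, data=data, headers=headers, method=CRUD_TO_HTTP[operation]
    )


request = build_request(
    "update", "https://api.example.com/users/42", {"name": "Zara"}, token="secret"
)
print(request.get_method(), request.full_url)
print(request.data)
```

Passing the request to urllib.request.urlopen would send it; that step is left out because the example URL does not point at a real service.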

Like Web Services, REST offers no built-in security features, encryption, session management, QoS guarantees, etc. But, as with Web Services, these can be added by building on top of HTTP:
o For security, username/password tokens are often used.
o For encryption, REST can be used on top of HTTPS (secure sockets).
... etc.

One thing that is not part of a good REST design is cookies: the "ST" in "REST" stands for "State Transfer", and indeed, in a good REST design operations are self-contained, and each request carries with it (transfers) all the information (state) that the server needs in order to complete it.

Despite being simple, REST is fully-featured; there's basically nothing you can do in Web Services that can't be done with a RESTful architecture. REST is not a "standard".

REST Architecture Components

Key components of a REST architecture:
- Resources, which are identified by logical URLs. Both state and functionality are represented using resources.
- A web of resources, meaning that a single resource should not be overwhelmingly large and contain too fine-grained details. Whenever relevant, a resource should contain links to additional information -- just as in web pages.
- The system has a client-server architecture, but of course one component's server can be another component's client.
- There is no connection state; interaction is stateless (although the servers and resources can of course be stateful). Each new request should carry all the information required to complete it, and must not rely on previous interactions with the same client.
- Resources should be cacheable whenever possible (with an expiration date/time). The protocol must allow the server to explicitly specify which resources may be cached, and for how long.
- Since HTTP is universally used as the REST protocol, the HTTP cache-control headers are used for this purpose.
- Clients must respect the server's cache specification for each resource.
- Proxy servers can be used as part of the architecture, to improve performance and scalability. Any standard HTTP proxy can be used.
- Note that your application can use REST services (as a client) without being a REST architecture by itself; e.g., a single-machine, non-REST program can access 3rd-party REST services.

Key goals of REST include:
o Scalability of component interactions
o Generality of interfaces
o Independent deployment of components
o Intermediary components to reduce latency, enforce security and encapsulate legacy systems

Constraints

The REST architectural style describes the following six constraints applied to the architecture, while leaving the implementation of the individual components free to design:

Client-server
A uniform interface separates clients from servers. This separation of concerns means that, for example, clients are not concerned with data storage, which remains internal to each server, so that the portability of client code is improved. Servers are not concerned with the user interface or user state, so that servers can be simpler and more scalable. Servers and clients may also be replaced and developed independently, as long as the interface between them is not altered.

Stateless
The client-server communication is further constrained by no client context being stored on the server between requests. Each request from any client contains all of the information necessary to service the request, and any session state is held in the client.

Cacheable
As on the World Wide Web, clients can cache responses. Responses must therefore, implicitly or explicitly, define themselves as cacheable, or not, to prevent clients reusing stale or inappropriate data in response to further requests. Well-managed caching partially or completely eliminates some client-server interactions, further improving scalability and performance.

Layered system
A client cannot ordinarily tell whether it is connected directly to the end server, or to an intermediary along the way. Intermediary servers may improve system scalability by enabling load-balancing and by providing shared caches. They may also enforce security policies.

Code on demand (optional)
Servers are able temporarily to extend or customize the functionality of a client by the transfer of executable code. Examples of this may include compiled components such as Java applets and client-side scripts such as JavaScript.

Uniform interface
The uniform interface between clients and servers, discussed below, simplifies and decouples the architecture, which enables each part to evolve independently. The four guiding principles of this interface are detailed below.

The only optional constraint of REST architecture is code on demand. If a service violates any other constraint, it cannot strictly be considered RESTful. Complying with these constraints, and thus conforming to the REST architectural style enables any kind of distributed hypermedia system to have desirable emergent properties, such as performance, scalability, simplicity, modifiability, visibility, portability, and reliability.

http://rest.elkstein.org/

>>>web services, SOAP

SOAP is a simple XML-based protocol that lets applications exchange information over HTTP; in other words, SOAP is a protocol for accessing a Web Service.

o SOAP stands for Simple Object Access Protocol
o SOAP is a communication protocol
o SOAP is for communication between applications
o SOAP is a format for sending messages
o SOAP communicates via Internet
o SOAP is platform independent
o SOAP is language independent
o SOAP is based on XML
o SOAP is simple and extensible
o SOAP allows you to get around firewalls
o SOAP is a W3C recommendation

Why SOAP?

It is important for application development to allow Internet communication between programs. Today's applications communicate using Remote Procedure Calls (RPC) between objects like DCOM and CORBA, but HTTP was not designed for this. RPC represents a compatibility and security problem; firewalls and proxy servers will normally block this kind of traffic. A better way to communicate between applications is over HTTP, because HTTP is supported by all Internet browsers and servers. SOAP was created to accomplish this. SOAP provides a way to communicate between applications running on different operating systems, with different technologies and programming languages.

SOAP Building Blocks
A SOAP message is an ordinary XML document containing the following elements:

o An Envelope element that identifies the XML document as a SOAP message
o A Header element that contains header information
o A Body element that contains call and response information
o A Fault element containing errors and status information

Syntax Rules
Here are some important syntax rules:

o A SOAP message MUST be encoded using XML
o A SOAP message MUST use the SOAP Envelope namespace
o A SOAP message MUST use the SOAP Encoding namespace
o A SOAP message must NOT contain a DTD reference
o A SOAP message must NOT contain XML Processing Instructions

Skeleton SOAP Message

<?xml version="1.0"?>
<soap:Envelope
    xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
    soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
  <soap:Header>
    ...
  </soap:Header>
  <soap:Body>
    ...
    <soap:Fault>
      ...
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
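As a rough illustration of the building blocks listed above, the following Python sketch parses a small SOAP message with the standard library and reports which of the Envelope, Header, Body and Fault elements are present. The namespace URI is the one from the skeleton; the GetPrice and Item elements are hypothetical payload names invented for this example, not part of SOAP itself.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in the skeleton message.
SOAP_NS = "http://www.w3.org/2001/12/soap-envelope"

# Hypothetical message: GetPrice and Item are illustrative names only.
MESSAGE = f"""<?xml version="1.0"?>
<soap:Envelope xmlns:soap="{SOAP_NS}">
  <soap:Header/>
  <soap:Body>
    <GetPrice><Item>Apples</Item></GetPrice>
  </soap:Body>
</soap:Envelope>"""


def describe(message):
    """Report which SOAP building blocks a message contains."""
    root = ET.fromstring(message)
    header = root.find(f"{{{SOAP_NS}}}Header")
    body = root.find(f"{{{SOAP_NS}}}Body")
    fault = body.find(f"{{{SOAP_NS}}}Fault") if body is not None else None
    # Children of Body carry the application-specific call or response.
    payload = [child.tag for child in body] if body is not None else []
    return {
        "is_envelope": root.tag == f"{{{SOAP_NS}}}Envelope",
        "has_header": header is not None,
        "has_fault": fault is not None,
        "payload": payload,
    }


info = describe(MESSAGE)
print(info)
```

A real client would normally rely on a SOAP toolkit rather than hand-parsing, so this is only meant to make the element structure concrete.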

>>>>DOM and XML parsing:

XML stands for eXtensible Markup Language. XML is designed to transport and store data.

What is XML?

o XML stands for EXtensible Markup Language
o XML is a markup language much like HTML
o XML was designed to carry data, not to display data
o XML tags are not predefined. You must define your own tags
o XML is designed to be self-descriptive
o XML is a W3C Recommendation

XML Document Example

<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
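To connect the note document to DOM parsing, here is a minimal sketch that loads it with Python's standard xml.dom.minidom module and walks the resulting tree. The helper name note_fields is made up for this example; the DOM calls themselves (documentElement, childNodes, nodeType, firstChild) are the standard DOM interface.

```python
from xml.dom.minidom import parseString

NOTE = """<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def note_fields(xml_text):
    """Return the root tag name and a dict of child element name -> text."""
    dom = parseString(xml_text)
    root = dom.documentElement
    fields = {}
    for node in root.childNodes:
        # Skip the whitespace text nodes that sit between the elements.
        if node.nodeType == node.ELEMENT_NODE:
            fields[node.tagName] = node.firstChild.data
    return root.tagName, fields


root_name, fields = note_fields(NOTE)
print(root_name, fields)
```

Note that the whitespace between elements also appears in the DOM as text nodes, which is why the loop filters on the node type.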

Tidy
HTML Tidy is a computer program and a library whose purpose is to fix invalid HTML and to improve the layout and indent style of the resulting mark-up. It was developed by Dave Raggett of the World Wide Web Consortium (W3C).

Its source code is written in ANSI C for maximum portability, and precompiled binaries are available for a variety of platforms. New versions are available only as source through CVS (Concurrent Versions System, a version control system), not as binaries.

Examples of bad HTML it is able to fix:
o Missing or mismatched end tags, mixed up tags
o Adding missing items (some tags, quotes, ...)
o Reporting proprietary HTML extensions
o Change layout of markup to predefined style
o Transform characters from some encodings into HTML entities

XQuery
XQuery was designed to query XML data.

o XQuery is built on XPath expressions
o XQuery is supported by all major databases
o XQuery is a W3C Recommendation

XQuery can be used to:
o Extract information to use in a Web Service
o Generate summary reports
o Transform XML data to XHTML
o Search Web documents for relevant information

Query String
A query string is the part of a Uniform Resource Locator (URL) that contains data to be passed to web applications such as CGI programs. It permits data to be passed from the HTTP client (often a web browser) to the program which generates the web page. A typical URL containing a query string is as follows:

http://server/path/program?query_string

HTML defines three ways a web browser can generate the query string:
o a web form via the <form>...</form> element
o a server-side image map via the ismap attribute on the <img> element with a <a><img ismap></a> construction
o an indexed search via the now deprecated <isindex> element

The main use of query strings is to contain the content of an HTML form, also known as a web form. In particular, when a form containing the fields field1, field2, field3 is submitted, the content of the fields is encoded as a query string as follows:

field1=value1&field2=value2&field3=value3...

The query string is composed of a series of field-value pairs. Within each pair, the field name and value are separated by an equals sign. The equals sign may be omitted if the value is an empty string. The series of pairs is separated by the ampersand, '&' (or semicolon, ';' for URLs embedded in HTML and not generated by a <form>...</form>).

While there is no definitive standard, most web frameworks allow multiple values to be associated with a single field:

field1=value1&field1=value2&field1=value3...

For each field of the form, the query string contains a pair field=value. Web forms may include fields that are not visible to the user; these fields are included in the query string when the form is submitted.
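The field=value encoding described above can be tried out with Python's standard urllib.parse module. This is only a sketch: the field names and values are placeholders in the style of the examples above, and the server/path/program URL is the illustrative one from these notes, not a real endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder form content; field1 deliberately appears twice.
fields = [("field1", "value1"), ("field2", "value 2"), ("field1", "value3")]

# Encode the pairs: '=' inside each pair, '&' between pairs.
query = urlencode(fields)
url = "http://server/path/program?" + query

# Decode on the receiving side; repeated fields become lists of values.
decoded = parse_qs(urlsplit(url).query)

print(query)
print(decoded)
```

The space in "value 2" is encoded as "+" on the way out and restored when parsing, and the repeated field1 comes back as a list with both values.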

RIA
A Rich Internet Application (RIA) is a Web application that has many of the characteristics of desktop application software, typically delivered by way of a site-specific browser, a browser plug-in, an independent sandbox, extensive use of JavaScript, or a virtual machine. Adobe Flash, JavaFX, and Microsoft Silverlight are currently the three most common platforms, with desktop browser penetration rates around 96%, 76%, and 66% respectively. Users generally need to install a software framework using the computer's operating system before launching the application, which typically downloads, updates, verifies and executes the RIA. This is the main differentiator from HTML5/JavaScript-based alternatives like Ajax that use built-in browser functionality to implement comparable interfaces. RIAs dominate in online gaming as well as applications that require access to video capture (with the notable exception of Gmail, which uses its own task-specific browser plug-in).

CSS

o CSS stands for Cascading Style Sheets
o Styles define how to display HTML elements
o Styles were added to HTML 4.0 to solve a problem
o External Style Sheets can save a lot of work
o External Style Sheets are stored in CSS files

CSS defines HOW HTML elements are to be displayed. Styles are normally saved in external .css files. External style sheets enable you to change the appearance and layout of all the pages in a Web site, just by editing one single file!

HTML was never intended to contain tags for formatting a document. HTML was intended to define the content of a document, like:

<h1>This is a heading</h1>
<p>This is a paragraph.</p>

When tags like <font>, and color attributes were added to the HTML 3.2 specification, it started a nightmare for web developers. Development of large web sites, where fonts and color information were added to every single page, became a long and expensive process. To solve this problem, the World Wide Web Consortium (W3C) created CSS.

CSS Syntax A CSS rule has two main parts: a selector, and one or more declarations:

The selector is normally the HTML element you want to style. Each declaration consists of a property and a value. The property is the style attribute you want to change. Each property has a value. A CSS declaration always ends with a semicolon, and declaration groups are surrounded by curly brackets.

Example:

p {
  color:red;
  text-align:center;
}

In addition to setting a style for an HTML element, CSS allows you to specify your own selectors called "id" and "class". The id selector is used to specify a style for a single, unique element. The id selector uses the id attribute of the HTML element, and is defined with a "#". The style rule below will be applied to the element with id="para1":

#para1 {
  text-align:center;
  color:red;
}

The class selector is used to specify a style for a group of elements. Unlike the id selector, the class selector is most often used on several elements. This allows you to set a particular style for many HTML elements with the same class. The class selector uses the HTML class attribute, and is defined with a "." In the example below, all HTML elements with class="center" will be center-aligned:

.center {text-align:center;}

You can also specify that only specific HTML elements should be affected by a class. In the example below, all p elements with class="center" will be center-aligned:

p.center {text-align:center;}

There are three ways of inserting a style sheet:

External style sheet
An external style sheet is ideal when the style is applied to many pages. With an external style sheet, you can change the look of an entire Web site by changing one file. Each page must link to the style sheet using the <link> tag. The <link> tag goes inside the head section. An external style sheet can be written in any text editor. The file should not contain any html tags. Your style sheet should be saved with a .css extension.

<head>
<link rel="stylesheet" type="text/css" href="mystyle.css">
</head>

Internal style sheet

An internal style sheet should be used when a single document has a unique style. You define internal styles in the head section of an HTML page, by using the <style> tag, like this:

<head>
<style>
hr {color:sienna;}
p {margin-left:20px;}
body {background-image:url("images/back40.gif");}
</style>
</head>

Inline style
Here you use the style attribute in the relevant tag. The style attribute can contain any CSS property.

<p style="color:sienna;margin-left:20px">This is a paragraph.</p>

JavaScript
JavaScript is:

o A lightweight, interpreted programming language
o Designed for creating network-centric applications
o Complementary to and integrated with Java
o Complementary to and integrated with HTML
o Open and cross-platform

Client-side JavaScript is the most common form of the language. The script should be included in or referenced by an HTML document for the code to be interpreted by the browser. It means that a web page need no longer be static HTML, but can include programs that interact with the user, control the browser, and dynamically create HTML content.

The JavaScript client-side mechanism features many advantages over traditional CGI server-side scripts. For example, you might use JavaScript to check if the user has entered a valid e-mail address in a form field. The JavaScript code is executed when the user submits the form, and only if all the entries are valid are they submitted to the Web Server. JavaScript can be used to trap user-initiated events such as button clicks, link navigation, and other actions that the user explicitly or implicitly initiates.

Advantages of JavaScript:
The merits of using JavaScript are:

Less server interaction: You can validate user input before sending the page off to the server. This saves server traffic, which means less load on your server.

Immediate feedback to the visitors: They don't have to wait for a page reload to see if they have forgotten to enter something.

Increased interactivity: You can create interfaces that react when the user hovers over them with a mouse or activates them via the keyboard.

Richer interfaces: You can use JavaScript to include such items as drag-and-drop components and sliders to give a Rich Interface to your site visitors.

Limitations with JavaScript: We cannot treat JavaScript as a full-fledged programming language. It lacks the following important features:

Client-side JavaScript does not allow the reading or writing of files. This restriction exists for security reasons.

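Client-side JavaScript cannot be used for general networking applications (for example, opening arbitrary sockets) because browsers provide no such support; HTTP requests made through XMLHttpRequest, described under AJAX, are the exception.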

JavaScript doesn't have any multithreading or multiprocessing capabilities.

JavaScript syntax
A JavaScript consists of JavaScript statements that are placed within the <script>... </script> HTML tags in a web page. You can place the <script> tag containing your JavaScript anywhere within your web page, but the preferred way is to keep it within the <head> tags. The <script> tag alerts the browser program to begin interpreting all the text between these tags as a script. So the simple syntax of your JavaScript will be as follows:

<script ...>
  JavaScript code
</script>

The script tag takes two important attributes:

language: This attribute specifies what scripting language you are using. Typically, its value will be javascript, although recent versions of HTML (and XHTML, its successor) have phased out the use of this attribute.

type: This attribute is what is now recommended to indicate the scripting language in use and its value should be set to "text/javascript".

So your JavaScript segment will look like:

<script language="javascript" type="text/javascript">
  JavaScript code
</script>

Your First JavaScript Script:
Let us write our classic example to print out "Hello World".

<html>
<body>
<script language="javascript" type="text/javascript">
<!--
   document.write("Hello World!")
//-->
</script>
</body>
</html>

We added an optional HTML comment that surrounds our JavaScript code. This is to save our code from a browser that does not support JavaScript. The comment ends with a "//-->". Here "//" signifies a comment in JavaScript, so we add that to prevent a browser from reading the end of the HTML comment in as a piece of JavaScript code.

JavaScript code can be included anywhere in an HTML document, but the following are the most preferred ways to include JavaScript in your HTML file:

o Script in <head>...</head> section.
o Script in <body>...</body> section.
o Script in <body>...</body> and <head>...</head> sections.
o Script in an external file and then included in <head>...</head> section.

>>>see tuto

AJAX
AJAX is Asynchronous JavaScript and XML. It is a web development technique for creating interactive web applications. AJAX is meant to increase the web page's interactivity, speed, and usability.

Technologies Used in AJAX

JavaScript

o Loosely typed scripting language
o A JavaScript function is called when an event in a page occurs
o Glue for the whole AJAX operation

DOM

o API for accessing and manipulating structured documents
o Represents the structure of XML and HTML documents

CSS

Allows for a clear separation of the presentation style from the content and may be changed programmatically by JavaScript

XMLHttpRequest

JavaScript object that performs asynchronous interaction with the server.

How AJAX Works

Example

<!DOCTYPE html>
<html>
<head>
<script>
function loadXMLDoc()
{
  var xmlhttp;
  if (window.XMLHttpRequest)
  { // code for IE7+, Firefox, Chrome, Opera, Safari
    xmlhttp=new XMLHttpRequest();
  }
  else
  { // code for IE6, IE5
    xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }
  // Called on every state change; readyState 4 with status 200 means the response is complete and OK
  xmlhttp.onreadystatechange=function()
  {
    if (xmlhttp.readyState==4 && xmlhttp.status==200)
    {
      document.getElementById("myDiv").innerHTML=xmlhttp.responseText;
    }
  }
  // true = asynchronous request
  xmlhttp.open("GET","ajax_info.txt",true);
  xmlhttp.send();
}
</script>
</head>
<body>
<div id="myDiv"><h2>Let AJAX change this text</h2></div>
<button type="button" onclick="loadXMLDoc()">Change Content</button>
</body>
</html>

Mashups

A mashup, in web development, is a web page, or web application, that uses and combines data, presentation or functionality from two or more sources to create new services. The term implies easy, fast integration, frequently using open application programming interfaces (API) and data sources to produce enriched results that were not necessarily the original reason for producing the raw source data.

The main characteristics of a mashup are combination, visualization, and aggregation. It is important to make existing data more useful, for personal and professional use. To be able to permanently access the data of other services, mashups are generally client applications or hosted online.

Mashups can be considered to have an active role in the evolution of social software and Web 2.0. A mashup application is architecturally composed of three different participants that are logically and physically disjoint (they are likely separated by both network and organizational boundaries): API/content providers, the mashup site, and the client's Web browser.

Module 5: Performance, Scalability and Security

Load testing:

Profiling
In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Profiling information is commonly used to optimize programs. Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). The methodology of the profiler itself classifies the profiler as event-based, as statistical, as instrumentation, or as simulation. Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation, instruction set simulation, operating system hooks, and performance counters.

The output of a profiler may be:
o A statistical summary of the events observed (a profile).
o A stream of recorded events (a trace).
o An ongoing interaction with the hypervisor (continuous or periodic monitoring via on-screen display, for instance).

Profiler types based on output
o Flat profilers compute the average call times, from the calls, and do not break down the call times based on the callee or the context.
o Call graph profilers show the call times, and frequencies of the functions, and also the call-chains involved based on the callee.

Based on their data granularity, on how profilers collect information, they are classified into
o Event-based profilers
  Examples of programming languages having event-based profilers: Java, .NET, Python, Ruby.
o Statistical profilers
  Some profilers operate by sampling. A sampling profiler probes the target program's program counter at regular intervals using operating system interrupts. Sampling profiles are typically less numerically accurate and specific, but allow the target program to run at near full speed. The resulting data are not exact, but a statistical approximation. Some of the most commonly used statistical profilers are AMD CodeAnalyst, Apple Inc. Shark (OSX), oprofile (Linux), Intel VTune and Parallel Amplifier.
o Instrumenting profilers
  Some profilers instrument the target program with additional instructions to collect the required information. E.g. gprof.
o Hypervisor/Simulator
  Hypervisor: Data are collected by running the (usually) unmodified program under a hypervisor. Example: SIMMON.
  Simulator and Hypervisor: Data collected interactively and selectively by running the unmodified program under an Instruction Set Simulator. Examples: SIMON and OLIVER.

[Notes:
1. In computing, a hypervisor, also called virtual machine manager (VMM), is one of many hardware virtualization techniques allowing multiple operating systems, termed guests, to run concurrently on a host computer.
2. In the context of computer programming, instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and to write trace information. Programmers implement instrumentation in the form of code instructions that monitor specific components in a system (for example, instructions may output logging information to appear on screen). When an application contains instrumentation code, it can be managed using a management tool. Instrumentation is necessary to review the performance of the application. Instrumentation approaches can be of two types, source instrumentation and binary instrumentation. In programming, instrumentation means the ability of an application to incorporate:

Code tracing - receiving informative messages about the execution of an application at run time.

Debugging and (structured) exception handling - tracking down and fixing programming errors in an application under development.

Profiling (computer programming) - a means by which dynamic program behaviors can be measured during a training run with a representative input. This is useful for properties of a program which cannot be analyzed statically with sufficient precision, such as alias analysis.

Performance counters - components that allow the tracking of the performance of the application.

Computer data logging - components that allow the logging and tracking of major events in the execution of the application.]
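To make the idea of a profile (a statistical summary of observed events) concrete, here is a small sketch using cProfile and pstats from the Python standard library, which record function call events. The functions busy_sum and workload are invented for this example and simply burn some CPU time; they are not taken from any tool mentioned in these notes.

```python
import cProfile
import io
import pstats


def busy_sum(n):
    """Deliberately slow helper so that it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [busy_sum(20_000) for _ in range(5)]


def profile_report():
    """Run workload() under the profiler and return the summary as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and keep only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return out.getvalue()


report = profile_report()
print(report)
```

The printed table lists, per function, the number of calls and the time spent, which is the kind of output described above for flat profilers.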

Tools:
siege
Siege is an HTTP regression testing and benchmarking utility. It was designed to let web developers measure the performance of their code under duress (compulsory force or threat), to see how it will stand up to load on the internet. Siege supports basic authentication, cookies, HTTP and HTTPS protocols. It lets the user hit a web server with a configurable number of concurrent simulated users. Those users place the webserver under siege. The duration of the siege is measured in transactions: the number of simulated users multiplied by the number of times each simulated user repeats the process of hitting the server.
o Thus 20 concurrent users 50 times is 1000 transactions, the length of the test.

Performance measures include elapsed time of the test, the amount of data transferred ( including headers ), the response time of the server, its transaction rate, its throughput, its concurrency and the number of times it returned OK. These measures are quantified and reported at the end of each run.
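The relationship between concurrent users, repetitions and transactions can be sketched in a few lines of Python. This is not how siege itself is implemented; it is a toy load generator in which run_load and fake_hit are hypothetical names, and fake_hit merely sleeps for a millisecond in place of a real HTTP request.

```python
import threading
import time


def run_load(hit, users, repetitions):
    """Run `users` concurrent simulated users; each calls hit() `repetitions` times."""
    ok_count = 0
    lock = threading.Lock()

    def simulated_user():
        nonlocal ok_count
        for _ in range(repetitions):
            if hit():
                with lock:  # protect the shared counter
                    ok_count += 1

    threads = [threading.Thread(target=simulated_user) for _ in range(users)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - start

    transactions = users * repetitions
    return {
        "transactions": transactions,
        "ok": ok_count,
        "elapsed_seconds": elapsed,
        "transaction_rate": transactions / elapsed,
    }


def fake_hit():
    """Stand-in for one request; always reports success (assumption)."""
    time.sleep(0.001)
    return True


result = run_load(fake_hit, users=20, repetitions=50)
print(result["transactions"], result["ok"])
```

Replacing fake_hit with a function that issues a real request and returns whether the server answered OK would turn the sketch into a very rough brute-force test against a single URL.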

Siege has essentially three modes of operation, regression, internet simulation and brute force. It can read a large number of URLs from a configuration file and run through them incrementally ( regression ) or randomly ( internet simulation ). Or the user may simply pound a single URL with a runtime configuration at the command line ( brute force ).

The format for invoking siege is: siege [options]

siege supports the following command line options:

-V, --version
Print version information to the screen.

-h, --help
Print the help section. This presents a summary of the options discussed in this section of the manual.

-C, --config
Print the current configuration. This option reads your .siegerc file and prints the settings.

The siege configuration file is called .siegerc and it is located in the home directory of the user who installed siege. Siege understands the following URL format: [protocol://] [servername.domain.xxx] [:portnumber] [/directory/file] Currently, siege only supports http and https protocols.

http://www.joedog.org/siege-manual/

>>>>web stress testing tool,

httperf
httperf is a tool for measuring web server performance. It provides a flexible facility for generating various HTTP workloads and for measuring server performance. The focus of httperf is not on implementing one particular benchmark but on providing a robust, high-performance tool that facilitates the construction of both micro- and macro-level benchmarks. The three distinguishing characteristics of httperf are its
o robustness, which includes the ability to generate and sustain server overload,
o support for the HTTP/1.1 and SSL protocols, and
o extensibility to new workload generators and performance measurements.

The operation of httperf can be controlled through a number of options:

-h, --help
Prints a summary of available options and their parameters.

--hog
This option requests to use up as many TCP ports as necessary. Without this option, httperf is typically limited to using ephemeral ports (in the range from 1024 to 5000). This limited port range can quickly become a bottleneck so it is generally a good idea to specify this option for serious testing. Also, this option must be specified when measuring NT servers since it avoids a TCP incompatibility between NT and UNIX machines.

--http-version=S
Specifies the version string that should be included in the requests sent to the server. By default, version string 1.1 is used. This option can be set to 1.0 to force the generation of HTTP/1.0 requests. Setting this option to any value other than 1.0 or 1.1 may result in undefined behavior.

EXAMPLES

httperf --hog --server www
This command causes httperf to create a connection to host www, send a request for the root document (http://www/), receive the reply, close the connection, and then print some performance statistics.

httperf --hog --server www --num-conn 100 --ra 10 --timeout 5
Like above, except that a total of 100 connections are created and that connections are created at a fixed rate of 10 per second. Note that option --rate has been abbreviated to --ra.

httperf --hog --server=www --wsess=10,5,2 --rate 1 --timeout 5
Causes httperf to generate a total of 10 sessions at a rate of 1 session per second. Each session consists of 5 calls that are spaced out by 2 seconds.

httperf --hog --server=www --wsess=10,5,2 --rate=1 --timeout=5 --ssl
Like above, except that httperf contacts server www via SSL at port 443 (the default port for SSL connections).

>>>>Performance tuning and Scalability,

Content Caching
A content cache (web cache) is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met. Google's cache link in its search results provides a way of retrieving information from websites that have recently gone down and a way of retrieving data more quickly than by clicking the direct link.

Web caches can be used in various systems.


A search engine may cache a website.

A forward cache is a cache outside the webserver's network, e.g. on the client software's ISP or company network.

A network-aware forward cache is just like a forward cache but only caches heavily accessed items.

A reverse cache sits in front of one or more Web servers and web applications, accelerating requests from the Internet.

A client, such as a web browser, can store web content for reuse. For example, if the back button is pressed, the local cached version of a page may be displayed instead of a new request being sent to the web server.

A web proxy sitting between the client and the server can evaluate HTTP headers and choose to store web content.

A content delivery network can retain copies of web content at various points throughout a network.

Cache control HTTP defines three basic mechanisms for controlling caches: freshness, validation, and invalidation.

Freshness allows a response to be used without re-checking it on the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives a date when the document becomes stale, and the Cache-Control: max-age directive tells the cache how many seconds the response is fresh for.

Validation can be used to check whether a cached response is still good after it becomes stale. For example, if the response has a Last-Modified header, a cache can make a conditional request using the If-Modified-Since header to see if it has changed. The ETag (entity tag) mechanism also allows for both strong and weak validation.

Invalidation is usually a side effect of another request that passes through the cache. For example, if a URL associated with a cached response subsequently gets a POST, PUT or DELETE request, the cached response will be invalidated.
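The three mechanisms can be sketched as three small Python functions. This is a simplified model, not a complete HTTP cache: the CachedResponse class and the function names are invented for illustration, only max-age is considered for freshness, and the header names If-None-Match and If-Modified-Since are the standard HTTP conditional request headers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResponse:
    body: str
    stored_at: float                      # time the response was stored, in seconds
    max_age: int                          # from "Cache-Control: max-age=N"
    etag: Optional[str] = None            # from the ETag response header
    last_modified: Optional[str] = None   # from the Last-Modified response header


def is_fresh(entry, now):
    """Freshness: the entry may be reused without contacting the origin server."""
    return (now - entry.stored_at) < entry.max_age


def validation_headers(entry):
    """Validation: headers for a conditional request once the entry is stale."""
    headers = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def invalidates(method):
    """Invalidation: unsafe methods on the same URL discard the cached entry."""
    return method.upper() in {"POST", "PUT", "DELETE"}


entry = CachedResponse(
    body="<html>...</html>",
    stored_at=1000.0,
    max_age=60,
    etag='"abc123"',
    last_modified="Tue, 01 Jan 2013 00:00:00 GMT",
)
print(is_fresh(entry, now=1030.0), is_fresh(entry, now=1061.0))
print(validation_headers(entry))
print(invalidates("post"), invalidates("GET"))
```

If the origin server answers the conditional request with 304 Not Modified, the cache can keep serving the stored body; otherwise it stores the new response in its place.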

Browser cache
Web browsers cache content on the client machine, in memory and on disk.

>>>>Client page-load performance tuning

Replication
Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault tolerance, or accessibility. Replication can be
o data replication, if the same data is stored on multiple storage devices
o computation replication, if the same computing task is executed many times. A computational task is typically replicated in space, i.e. executed on separate devices, or it could be replicated in time, if it is executed repeatedly on a single device

The access to a replicated entity is typically uniform with access to a single, non-replicated entity. The replication itself should be transparent to an external user. Also, in a failure scenario, a failover of replicas is hidden as much as possible.

Active and passive replication in systems that replicate data or services:
o active replication is performed by processing the same request at every replica.
o passive replication involves processing each single request on a single replica and then transferring its resultant state to the other replicas.
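The difference between active and passive replication can be shown with a toy in-memory model. The Replica class, its method names and the (key, value) request format are assumptions made for this sketch; real systems also have to deal with ordering, failures and network transfer, which are ignored here.

```python
class Replica:
    """Toy replica holding a key/value state."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.processed = 0  # how many requests this replica executed itself

    def process(self, request):
        key, value = request
        self.state[key] = value
        self.processed += 1

    def install_state(self, state):
        # Receive a copy of another replica's resulting state.
        self.state = dict(state)


def active_replicate(replicas, request):
    """Active: every replica processes the same request."""
    for replica in replicas:
        replica.process(request)


def passive_replicate(primary, backups, request):
    """Passive: one replica processes, then its state is transferred."""
    primary.process(request)
    for backup in backups:
        backup.install_state(primary.state)


active = [Replica("a1"), Replica("a2"), Replica("a3")]
active_replicate(active, ("x", 1))

primary, backups = Replica("p"), [Replica("b1"), Replica("b2")]
passive_replicate(primary, backups, ("x", 1))

print([r.processed for r in active], [r.processed for r in [primary] + backups])
```

In both cases all replicas end up with the same state; the difference is where the work of processing the request is done.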

Backup differs from replication in that it saves a copy of data unchanged for a long period of time. Replicas, on the other hand, undergo frequent updates and quickly lose any historical state.

Replication models in distributed systems
o Transactional replication
o State machine replication
o Virtual synchrony

Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. The slave outputs a message stating that it has received the update successfully, thus allowing the sending (and potentially re-sending until successfully applied) of subsequent updates.

Multi-master replication, where updates can be submitted to any database node, and then ripple through to other servers, is often desired, but introduces substantially increased costs and complexity which may make it impractical in some situations. The most common challenge that exists in multi-master replication is transactional conflict prevention or resolution. Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. The resolution of such a conflict may be based on a timestamp of the transaction, on the hierarchy of the origin nodes or on much more complex logic, which decides consistently on all nodes.

Storage Replication: Active (real-time) storage replication is usually implemented by distributing updates of a block device to several physical hard disks. This way, any file system supported by the operating system can be replicated without modification, as the file system code works on a level above the block device driver layer. It is implemented either in hardware (in a disk array controller) or in software (in a device driver). The most basic method is disk mirroring, typical for locally-connected disks.
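Returning to the lazy (asynchronous) database case described above, conflict resolution by timestamp with a node hierarchy as tie-breaker might look like the following sketch. The `Version` record and its `node_rank` field are assumptions made for the example; real products use their own metadata and often more complex rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # commit time of the transaction
    node_rank: int    # hypothetical hierarchy: lower rank has priority


def resolve(a: Version, b: Version) -> Version:
    """Choose the winning version of a record changed on two nodes.
    The later timestamp wins; on equal timestamps the node with the
    lower rank wins. The rule does not depend on argument order, so
    every node reaches the same decision."""
    return max(a, b, key=lambda v: (v.timestamp, -v.node_rank))
```

Because the rule depends only on data carried with each version, every node can apply it independently during resynchronization and still converge on the same record.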

Load balancing
Load balancing is a computer networking methodology to distribute workload across multiple computers or a computer cluster, network links, central processing units, disk drives, or other resources, to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The load balancing service is usually provided by dedicated software or hardware, such as a multilayer switch or a Domain Name System server.

One of the most common applications of load balancing is to provide a single Internet service from multiple servers, sometimes known as a server farm. Commonly, load-balanced systems include popular web sites, large Internet Relay Chat networks, high-bandwidth File Transfer Protocol sites, Network News Transfer Protocol (NNTP) servers, Domain Name System (DNS) servers, and databases (database load balancers).

For Internet services, the load balancer is usually a software program that is listening on the port where external clients connect to access services. The load balancer forwards requests to one of the "backend" servers, which usually replies to the load balancer. This allows the load balancer to reply to the client without the client ever knowing about the internal separation of functions. It also prevents clients from contacting backend servers directly, which may have security benefits by hiding the structure of the internal network and preventing attacks on the kernel's network stack or unrelated services running on other ports.

An alternate method of load balancing, which does not necessarily require a dedicated software or hardware node, is called round robin DNS. In this technique, multiple IP addresses are associated with a single domain name; clients are expected to choose which server to connect to. Unlike the use of a dedicated load balancer, this technique exposes to clients the existence of multiple backend servers. The technique has other advantages and disadvantages, depending on the degree of control over the DNS server and the granularity of load balancing desired.

Load balancer features
o Asymmetric load
o Priority activation
o Priority queuing
o HTTP compression
o HTTP caching
o HTTP security
o TCP offload
o TCP buffering
o SSL Offload and Acceleration
o Content filtering
o Firewall
o Intrusion prevention system
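As an illustration of how a dedicated balancer might choose a backend for each request, the sketch below uses simple round-robin rotation, the same idea that round robin DNS applies to IP addresses. The class name and the backend addresses are hypothetical; real balancers usually add health checks and weighting.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in a fixed rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(list(backends))

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._rotation)


# Example with made-up private addresses
balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first_four = [balancer.pick() for _ in range(4)]
```

After three picks the rotation wraps around, so the fourth request goes to the first backend again.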

>>>>Protocols:

Password Hashing
Hash algorithms are one-way functions. They turn any amount of data into a fixed-length "fingerprint" that cannot be reversed. They also have the property that if the input changes by even a tiny bit, the resulting hash is completely different. This is great for protecting passwords, because we want to store passwords in a form that cannot be turned back into the original password, but at the same time, we need to be able to verify that a user's password is correct.

The general workflow for account registration and authentication in a hash-based account system is as follows:
1. The user creates an account.
2. Their password is hashed and stored in the database. At no point is the plain-text (unencrypted) password ever written to the hard drive.
3. When the user attempts to log in, the hash of the password they entered is checked against the hash of their real password (retrieved from the database).
4. If the hashes match, the user is granted access. If not, the user is told they entered invalid login credentials.
Steps 3 and 4 repeat every time someone tries to log in to their account.
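The workflow above can be sketched with Python's standard library. Two details here go beyond the notes and are assumptions of this example: a random per-user salt is stored next to the hash, and a deliberately slow key-derivation function (PBKDF2) is used instead of a single hash call. The function names and the iteration count are illustrative only.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Registration (steps 1-2): return (salt, digest) for storage.
    The plain-text password itself is never stored."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Login (steps 3-4): hash the attempt and compare it with the stored hash."""
    _, candidate = hash_password(password, salt)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(candidate, stored_digest)
```

At registration the application would store the salt and digest returned by `hash_password`; at login it would load them and call `verify_password` with the submitted password.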

Symmetric and asymmetric keys (PKI)


Cryptography is an art, as well as a science, that involves the process of transforming plaintext into scrambled text (cipher text) and vice versa. The purpose of cryptography is to conceal confidential information from unauthorized eyes and ensure immediate detection of any alteration made to the concealed information.

Public Key Infrastructure (PKI) is a framework that enables integration of various services that are related to cryptography. The aim of PKI is to provide confidentiality, integrity, access control, authentication, and most importantly, non-repudiation. [Non-repudiation is a concept, or a way, to ensure that the sender or receiver of a message cannot deny either sending or receiving such a message in future. One of the important audit checks for non-repudiation is a time stamp. The time stamp is an audit trail that provides information of the time the message is sent by the sender and the time the message is received by the receiver.]

Encryption and decryption, digital signature, and key exchange are the three primary functions of a PKI. RSA and elliptic curve algorithms provide all three primary functions: encryption and decryption, digital signatures, and key exchanges. The Diffie-Hellman algorithm supports key exchanges, while the Digital Signature Standard (DSS) is used for digital signatures.

In PKI, every user will have two keys known as a "pair of keys". One key is known as a private key and the other is known as a public key. The private key is never revealed and is kept with the owner, and the public key is accessible by everyone and is stored in a key repository. A key can be used to encrypt as well as to decrypt a message. Most importantly, a message that is encrypted with a private key can only be decrypted with the corresponding public key. Similarly, a message that is encrypted with a public key can only be decrypted with the corresponding private key.
Secure messaging
To ensure that the document is protected from eavesdropping and not altered during the transmission, Bob will first encrypt the document using Alice's public key. This ensures two things: one, that the document is encrypted, and two, that only Alice can open it, as the document requires the private key of Alice to open it. To summarize, encryption is accomplished using the public key of the receiver, and the receiver decrypts with his or her private key. In this method, Bob could ensure that the document is encrypted and only the intended receiver (Alice) can open it. However, Bob cannot ensure that the contents are not altered (integrity) during transmission by document encryption alone.

o Message digest: In order to ensure that the document is not altered during transmission, Bob performs a hash function on the document. The hash value is a computational value based on the contents of the document. This hash value is known as the message digest. By performing the same hash function on the decrypted document, Alice can obtain the message digest and compare it with the one sent by Bob to ensure that the contents are not altered. This process will ensure the integrity requirement.
o Digital signature: In order to prove that the document is sent by Bob to Alice, Bob needs to use a digital signature. Using a digital signature means applying the sender's private key to the message, or document, or to the message digest. This process is known as signing. Only by using the sender's public key can the message be decrypted. Bob will encrypt the message digest with his private key to create a digital signature. In the scenario illustrated in the image above, Bob will encrypt the document using Alice's public key and sign it using his digital signature. This ensures that Alice can verify that the document is sent by Bob, by verifying the digital signature (created with Bob's private key) using Bob's public key. Remember that a private key and the corresponding public key are mathematically linked. Alice can also verify that the document is not altered by validating the message digest, and can open the encrypted document using her private key.
o Message authentication is an authenticity verification procedure that verifies the integrity of the message as well as the authenticity of the source from which the message is received.

Digital certificate
By digitally signing the document, Bob has assured that the document is sent by him to Alice. However, he has not yet proved that he is Bob. To prove this, Bob needs to use a digital certificate. A digital certificate is an electronic identity issued to a person, system, or an organization by a competent authority after verifying the credentials of the entity. A digital certificate binds a public key to the identity of the entity that holds it, and is unique for each entity. A certification authority issues digital certificates. In PKI, digital certificates are used for authenticity verification of an entity. An entity can be an individual, system, or an organization.

An organization that is involved in issuing, distributing, and revoking digital certificates is known as a Certification Authority (CA). A CA acts as a notary by verifying an entity's identity. One of the important PKI standards pertaining to digital certificates is X.509. It is a standard published by the International Telecommunication Union (ITU) that specifies the standard format for digital certificates.

PKI also provides key exchange functionality that facilitates the secure exchange of public keys such that the authenticity of the parties can be verified.
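The message digest and digital signature steps from the secure messaging discussion can be illustrated with a toy example. It uses textbook RSA with no padding and a very small key built from two known Mersenne primes, so it is only a sketch of the idea and must not be used for real security; all names (`sign`, `verify`, the key constants) are made up for this example, and a real application would rely on a vetted cryptographic library.

```python
import hashlib

# Toy key pair for Bob (insecure: tiny modulus, no padding).
P = 2**61 - 1                      # known Mersenne prime
Q = 2**89 - 1                      # known Mersenne prime
N = P * Q                          # modulus, part of both keys
E = 65537                          # public exponent (Bob's public key)
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def message_digest(document: bytes) -> int:
    """Hash the document and reduce the digest to a number below N."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Bob applies his private key to the message digest."""
    return pow(message_digest(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Alice applies Bob's public key to the signature and compares the
    result with a digest she computes herself."""
    return pow(signature, E, N) == message_digest(document)
```

If the document is altered in transit, the digest Alice computes no longer matches the value recovered from the signature, so `verify` returns False.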

>>>>: (SQL injection,

Security threats

Invalid inputs

Client-side validation is not really validation. It isn't very difficult for an attacker to disable script execution on her workstation, enter malicious invalid form input, and then submit the form. If there's no validation on the server side of the transaction, server crashes and the execution of rogue commands are just two of the possible outcomes.
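A server-side check might look like the following sketch, which accepts only input that matches an explicit allow-list pattern. The form fields (`username`, `age`) and their rules are invented for illustration; the point is that the check runs on the server regardless of what the browser did.

```python
import re

USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")
AGE_RE = re.compile(r"[0-9]{1,3}")


def validate_form(form: dict) -> list:
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []

    username = form.get("username", "")
    if not USERNAME_RE.fullmatch(username):
        errors.append("username must be 3-20 letters, digits or underscores")

    age = form.get("age", "")
    if not (AGE_RE.fullmatch(age) and 0 < int(age) < 150):
        errors.append("age must be a whole number between 1 and 149")

    return errors
```

The server would reject the request whenever the returned list is non-empty, before the values reach a database query or a system command.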

Buffer overflows
A buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory. This is a special case of violation of memory safety.

Buffer overflows can be triggered by inputs that are designed to execute code, or alter the way the program operates. This may result in erratic program behaviour, including memory access errors, incorrect results, a crash, or a breach of system security. Thus, they are the basis of many software vulnerabilities and can be maliciously exploited. Bounds checking can prevent buffer overflows. The most reliable way to avoid or prevent buffer overflows is to use automatic protection at the language level.

Protective countermeasures
o Choice of programming language
o Use of safe libraries
o Buffer overflow protection
o Pointer protection
o Executable space protection
o Address space layout randomization
o Deep packet inspection
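Bounds checking, mentioned above as the basic defence, simply means verifying that a write fits before performing it. The sketch below shows the check on a fixed-size buffer; the `FixedBuffer` class is hypothetical. Python already protects its own memory, so this only illustrates the logic that a language or safe library applies on the programmer's behalf.

```python
class FixedBuffer:
    """A fixed-size byte buffer that refuses out-of-range writes."""

    def __init__(self, size: int):
        self._data = bytearray(size)

    def write(self, offset: int, payload: bytes) -> None:
        end = offset + len(payload)
        # Bounds check: reject the write instead of touching anything
        # outside the buffer.
        if offset < 0 or end > len(self._data):
            raise ValueError("write would overrun the buffer")
        self._data[offset:end] = payload

    def read(self) -> bytes:
        return bytes(self._data)
```

A write that does not fit raises an error and leaves the buffer unchanged, rather than silently overwriting whatever lies next to it.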

>>>>cross-site scripting, thread safety, hidden fields), How to build secure applications

Types of Website Content - Static and Dynamic

Static Web Site
A static web page (sometimes called a flat page) is a web page that is delivered to the user exactly as stored, in contrast to dynamic web pages which are generated by a web application.

Consequently a static web page displays the same information for all users, from all contexts, subject to modern capabilities of a web server to negotiate content-type or language of the document where such versions are available and the server is configured to do so. Static web pages are often HTML documents stored as files in the file system and made available by the web server over HTTP. However, loose interpretations of the term could include web pages stored in a database, and could even include pages formatted using a template and served through an application server, as long as the page served is unchanging and presented essentially as stored.

Advantages and disadvantages
Advantages
o No programming skills are required to create a static page.
o Inherently publicly cacheable (i.e. a cached copy can be shown to anyone).
o No particular hosting requirements are necessary.
o Can be viewed directly by a web browser without needing a web server or application server, for example directly from a CD-ROM or USB drive.
Disadvantages
o Any personalization or interactivity has to run client-side (i.e. in the browser), which is restricting.
o Maintaining large numbers of static pages as files can be impractical without automated tools.

Application areas of Static Website: The need for static web pages arises in the following cases.
o Changes to web content are infrequent
o List of products / services offered is limited
o Simple e-mail based ordering system should suffice
o No advanced online ordering facility is required
o Features like order tracking, verifying availability of stock, online credit card transactions, are not needed
o Web site not required to be connected to back-end system.

Static Web pages are very simple in layout and informative in context. Creation of static website content requires a great level of technical expertise, and a site owner who intends to create static web pages must be very clear about their ideas for such pages, since they will need to hire a web designer.

Dynamic Web Sites
A dynamic web page is a kind of web page that has been prepared with fresh information (content and/or layout) for each individual viewing. It is not static because it changes with the time (e.g. news content), the user (e.g. preferences in a login session), the user interaction (e.g. a web page game), the context (parametric customization), or any combination of the foregoing.

Two types of dynamic web sites

Client-side scripting and content creation
Client-side scripting is used to change interface behaviors within a specific web page, in response to mouse or keyboard actions or at specified timing events. In this case the dynamic behavior occurs within the presentation. Such web pages use a presentation technology called rich interfaced pages. Client-side scripting languages like JavaScript or ActionScript, used for Dynamic HTML (DHTML) and Flash technologies respectively, are frequently used to orchestrate media types (sound, animations, changing text, etc.) of the presentation. The scripting also allows use of remote scripting, a technique by which the DHTML page requests additional information from a server, using a hidden frame, XMLHttpRequest, or a Web service.

The client-side content is generated on the user's computer. The web browser retrieves a page from the server, then processes the code embedded in the page (often written in JavaScript) and displays the retrieved page's content to the user. The innerHTML property (or write command) can illustrate client-side dynamic page generation: two distinct pages, A and B, can be regenerated as document.innerHTML = A and document.innerHTML = B; or "on load dynamic" by document.write(A) and document.write(B).

Server-side scripting and content creation
Server-side scripting is used to change the supplied page source between pages, adjusting the sequence or reload of the web pages or web content supplied to the browser. Server responses may be determined by such conditions as data in a posted HTML form, parameters in the URL, the type of browser being used, the passage of time, or a database or server state. Such web pages are often created with the help of server-side languages such as PHP, Perl, ASP, ASP.NET, JSP, ColdFusion and other languages. These server-side languages typically use the Common Gateway Interface (CGI) to produce dynamic web pages.
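A minimal sketch of server-side page generation is shown here: the page is built from a parameter in the URL's query string, so different requests receive different HTML. The function name and the `name` parameter are assumptions made for the example; the value is escaped before being placed in the page so that user input cannot inject markup.

```python
from html import escape
from urllib.parse import parse_qs


def render_page(query_string: str) -> str:
    """Build an HTML page from the request's query string."""
    params = parse_qs(query_string)
    # parse_qs maps each parameter name to a list of values
    name = params.get("name", ["guest"])[0]
    return "<html><body><h1>Hello, " + escape(name) + "!</h1></body></html>"
```

A CGI-style script would read the query string supplied by the web server, call a function like this, and write the result to the response.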
Server-side pages can also use, on the client side, the first kind of scripting (DHTML, etc.). Server-side dynamic content is more complicated: (1) The client sends the server the request. (2) The server receives the request and processes the server-side script, such as PHP, based on the query string, HTTP POST data, cookies, etc. (3) The server returns the generated output to the client. Dynamic page generation was made possible by the Common Gateway Interface, stable in 1993. Server Side Includes then offered a more direct way to deal with server-side scripts at the web server.

Combining client and server side
Ajax is a web development technique for dynamically interchanging content with the server side, without reloading the web page. Google Maps is an example of a web application that uses Ajax techniques and a database.

Application areas of Dynamic Website
A dynamic web page is required when the following necessities arise:
o Need to change main pages more frequently to encourage clients to return to site.
o Long list of products / services offered that are also subject to upgrade
o Introducing sales promotion schemes from time to time
o Need for more sophisticated ordering system with a wide variety of functions
o Tracking and offering personalized services to clients.
o Facility to connect Web site to the existing back-end system

The fundamental difference between a static website and a dynamic website is that a static website is no more than an information sheet spelling out the products and services, while a dynamic website has wider functions, such as engaging the client and gradually leading them to online ordering. Both static and dynamic websites can be designed for search engine optimization. If the purpose is only to furnish information, then a static website should suffice. A dynamic website is absolutely necessary for e-commerce and online ordering.
