Darrel Ince
Open University
Internet Technologies
Lecture 1
Introduction (i)
Aims
The context
Only infrequently will I look at network functions
Application layer
Network functions
You can view a distributed system as a two-layer onion (in truth they often have
many more layers). The innermost layer contains system software which
implements functions which are concerned with the management and continuing
functioning of the system, for example software which carries out the transfer of
data from one computer to another.
Within this course I shall mainly ignore the internal layer; the only exception
being when I discuss the design of distributed applications where a knowledge of
this layer is necessary to carry out competent design.
The course concentrates on the outside layer: the design, programming and
technologies required to implement a distributed system; for example the
technologies used to program Web servers.
Network functions
Resource management
Naming
Security (examined in more detail later)
Transmission
Failure processing
The first part of the course acts as an introduction to the Internet and its
technologies. In particular it looks at how these technologies are used to support
applications.
The second part of the course looks at the various architectures that are employed
in distributed systems. These range from architectures which lie close to the
network (message passing) to those which abstract away from network details
(Tuple architectures).
The course then concentrates on one particular distributed paradigm, distributed
objects, and uses the Java-based technology RMI, together with CORBA, as exemplars.
The course also contains material on client/server technology and, for example,
discusses the range of servers that are available and provides a motivation for
using client/server technology.
After looking at clients and servers the course homes in on Web servers. These
are probably the most important type of server in the Internet. I shall be looking
at how they function, the HTTP protocol which lies under the bonnet of a Web
server and how such servers are programmed. I shall also deal with the emerging
topic of Web services: the functionality provided by Web servers in terms of its
component-based implementation. As part of the web services part of the course I
will look at the development of the web into Web 2.0.
The course also looks at XML. This is a technology which is used to define
markup languages and attempts to overcome a major problem in the Internet: that
of differing formats and standards for data.
The next two topics are related. I shall look at how systems can be regarded as
consisting of a series of concurrent agents interacting with each other. In a
distributed environment this interaction occurs using communication technology.
This discussion is then used to motivate a lecture on distributed application
design, focusing on performance.
A component of the course is security; here I shall look at the wide variety of
threats which a network can come under and the technologies that can be used to
minimise these threats.
Internet history
ARPANET
NSFNET -> TCP/IP
Merging of ARPANET and NSFNET
ANS take over the joint network
The commercialisation of the net and the rise of
the ISP
The birth of the World Wide Web
Internet link: history of the Internet
In the 1960s the American government realised that their command and control
systems were vulnerable to attack by nuclear weapons. From this came a project
which looked at connecting a number of defence computers together using packet
switching technology.
The network that was formed from this was known as ARPANET. It used
primitive protocols and, as a consequence of problems with these protocols, a
more sophisticated protocol known as TCP/IP was developed. By 1983 there
were hundreds of computers connected to the ARPANET. In the eighties a
university network known as NSFNET was developed which was then merged
with the ARPANET to form the early Internet.
Commercial involvement in the Internet came when a company known as ANS
took over the combined network and sold access to companies known as Internet
service providers (ISPs).
Parallel to this development was that of the World Wide Web. Initially this was
intended as an internal document-dispensing system at CERN in Switzerland;
however, it was designed in such a way that it was easily ported to the Internet.
The rest is history!!
A major principle
The application developer should
be shielded as much as possible
from network details
For example, the database
programmer should not
be worried where system
databases are held
A major principle that runs through this course is that when developing a
distributed system the developers should not worry too much about the physical
details of the network that it is based on. For example, the programming of a
database should be the same, irrespective of where it resides on a network. The
ideal technology is one which hides all the details!
Internet facilities
Bulletin boards
FTP
Email
The World Wide Web
Newsgroups
Mailing lists
Slide show: Introduction to the Internet
In this course I will be concerned with the facilities of the Internet and how they
are used to develop distributed applications. Some of these facilities are shown
above.
FTP is used to dispense files, for example software which a customer has
bought.
Email is used as a marketing technology, for example to inform customers of
new offers.
The Web is used as a storefront or as a market for services and products.
Newsgroups and Bulletin boards are used to keep customers in touch with each
other.
Mailing lists are used to dispense information.
The slide above shows a typical supply chain. The particular chain is used to
produce mouthwash. At every stage in the chain, network technology can be
used, ranging from the use of a portable computer by a farmer which informs a
wholesaler that maize is available for collection, to the use of a Web client by a
chemist which orders mouthwash from a chemical products wholesaler.
This chain is totally enabled by network technology.
Model
Technology development
Firm infrastructure
Inbound logistics
Operations
Outbound logistics
Service
Procurement
Categories
Inbound logistics, all activities involved with receiving,
storing and disseminating inputs to the product or service
Operations, all activities involved in transforming the
inputs
Outbound logistics, all activities involving the distribution
of products and services
Marketing and sales, activities which provide
opportunities for the customer to buy a service or a product
Servicing, all activities associated with servicing a product
or service.
Business models
A business model abstracts away from details
E-commerce
The course is about the use of Internet technology to support business. The most
visible manifestation of this has been e-commerce: the use of Internet technology
(primarily Web technology) to support conventional retailing, often known as
etailing.
In the late nineties it saw a huge de-emphasis. However, the conditions that
led to the decline of the dot.com boom have not affected the area known as
e-business. That area is still booming and indeed e-commerce has made a strong
recovery.
This course concentrates on both e-commerce and e-business applications such as
the supply chain example presented on the previous page.
Stock management
Customer payment management
Ware display
Supplier payment management
Delivery
Market analysis
One of the major features of many Internet applications is that if you write down a
system's functions (at least in outline) you would have major difficulty in
discovering whether the system is, in fact, a networked one. There are a number
of ramifications of this:
For the most part conventional software engineering methods can be used.
The only addition would be design for reliability and testing at the client/server
level.
You can reuse functions from conventional requirements documents.
The slide above details some system functions; notice that the word Internet
does not appear!
An observation
Question
If I had presented the previous slide without
telling you this was an Internet course would
you have guessed the title of the course?
The answer is no. The important point is that in many ways a networked
application does not differ from a conventional DP or MIS application. The only
differences arise from the fact that you have computers connected by
communication technologies. This gives rise to issues about system performance,
system reliability and system level testing at the client/server level. The test of a
good technology is: does it hide the underlying network?
E-Business models
The next slides look at the variety of models which have been used to describe
e-applications. One of the distinguishing features of the e-business/e-commerce
area is that most people assume that the only model that is available is the
conventional etailing model as exemplified by Amazon. While such a model is
probably the most advanced and functionally static, it is not the only one. The
aim of the next stage of this lecture is to look at a number of other models.
The best work describing electronic commerce business models is Electronic
Commerce, Paul Timmers, John Wiley. This is an excellent description of the
state of the electronic marketplace in 2000.
E-shops
Conventional
model
E-procurement
Electronic tendering and procurement of goods
Benefits include access to a greater number of
tenderers, reduced cost of procurement
Benefits to suppliers include more tendering
opportunities and cost reductions in submitting a
tender
Some links on e-procurement
This model involves a company transferring its procurement process to the Web.
A typical Web site that implements this business model would provide: a
complete list of products and services to be tendered, downloads of tender forms,
dates for tenders, interaction facilities which enable suppliers to submit and track
their tender and some form of private interaction which enables queries about a
particular tender to be resolved.
The main advantage to both the company that is seeking tenders and the tenderers
is reduced cost within the tendering process. It also has the advantage that it
brings in more potential suppliers, thus driving down the cost of individual
contracts.
Electronic tendering has been a major growth area within the e-business area over
the last five years.
E-malls
Collection of companies selling services and
products
Can be e-commerce or e-business in concept
They often have the same interface
Many e-malls are thematic
Growth in industry marketplace e-malls
Some failures
An e-mall is a collection of retailers usually hosted on the same Web site and
administered by a third-party company. E-malls can be based around a particular
market segment such as fishing or could be based around a particular product
such as a highly popular word processing package.
E-malls can be retail in concept or can be based around a closed business
community.
The benefits to the e-mall operator arise from renting or membership revenues,
sale of Web-based services or sale of systems technology.
The benefits to the individual businesses which make up an e-mall include ease
of use through a common HCI, association with larger branded names and easier
access due to the conglomeration effect.
The benefits to the customer include being able to access common sectors
without carrying out a large amount of surfing and reduced prices because of a
close competition effect.
E-auctions
An area where
profits have been
very healthy
E-learning
Follows the bust and boom of e-commerce
Transformed by Web 2.0
Supported by a number of virtual learning
environments.
Major investment at all levels of education
Not just putting your lecture notes on the
web
After some disasters such as the United Kingdom E-learning University this area
has started to expand again. The availability of blogs, wikis, conferencing and
allied web 2.0 technologies has meant a resurgence in this area, backed by
software such as virtual learning environments. A typical environment is
the open-source system Moodle.
Virtual communities
The creation of communities of buyers/users
Often an adjunct to another business model such
as e-shop
Uses discussion forums, FAQs, bulletin boards,
closed user groups
An introduction to virtual communities
Controversial area
This is a model employed by companies who wish to leave marketing and other
functions to a third party. This model is similar to the electronic mall concept.
Information brokerage
Access to information
Increasingly subscription based
Can be based on a per-transaction charge
Major sub-area is financial information
Trust brokerage
The dynamic pricing model is one which has a number of different instantiations.
Basically, such models treat the price of a product or service (primarily a
product) as variable and open to negotiation.
The name-your-price instantiation of this model is where the customer of a site
offers the price that he or she thinks is reasonable for a product or service. The
administrator of the Web site will then pass on this bid to the provider of the
product or service, who will decide whether to accept or reject it.
The comparison pricing sub-model encompasses Web sites which provide an
interface to e-shops that sell some specific product. They provide the facility for
the customer to interrogate a database of product catalogues to look for the
cheapest price for a particular product such as a book or a CD.
The demand sensitive pricing sub-model is based on the fact that suppliers of a
product will lower the price of a product if a number of units of that product are
included in a single sale. Web sites which employ this model provide facilities
whereby consumers can notify each other of their interest in buying a particular
product such as a freezer. The site keeps a database of current products that have
attracted a number of buyers with a predicted price and allows users to join the
database of buyers who are committed to a sale.
The bartering sub-model allows consumers to barter services or products for
other services or products. A site devoted to this form of economic activity will
keep a structured database of items for sale and allow a buyer to barter with a
seller.
B2B exchanges
Collection of Web sites
Enable business to business transactions
such as procurement to be carried out
efficiently
Enable businesses to buy products from
each other, form temporary alliances etc.
An introduction to B2B exchanges
A B2B exchange is a Web site or collection of Web sites which makes the process
of carrying out business-to-business transactions much easier. Under this banner
come sites which enable multiple companies to procure services and
products from each other, help businesses form temporary alliances to carry
out activities such as joint marketing or project bidding, and enable a marketplace
in raw materials to function.
SalesForce.com
Customer relationship management ASP
Functionality includes marketing analytics
and partner relationship management
Currently over 35000 customers
Also includes a platform API
This company is, without a doubt, the best known ASP (application service
provider). It provides a service to
companies that want to access customer relationship functions without doing any
development. It is currently one of the fastest growing CRM companies in the
United States.
It might seem paradoxical to include sites which provide free products or services
under the category of business models. Typical sites which come under this
category include gaming sites where users can play computer games using their
browser, sites which run free raffles and sites which offer free software.
Such sites do not earn any revenues from the products or services they offer;
revenue is earned indirectly, for example by means of banner adverts or by
receiving revenue from sites which you have to visit before experiencing a
service or buying a product.
One of the largest free product areas is that of free software. Organisations in this
area include those who raise revenues and those who do not. An example of a
company in the former category is Red Hat. This is a company that provides free
versions of the Linux operating system. You can download Linux from the Red
Hat Web site and install it on your computer without paying a cent to the
company. Red Hat raise their revenues through support, packaging distributions
onto CDs and providing services to companies who employ Linux for application
development. Companies such as Red Hat are the analogue of those companies
who sell a razor for little or no cost but make their profit from selling the razor
blades.
There are a number of sites on the Internet which do not make any money from
issuing software. These are sites associated with open source development. They
are purely altruistic.
Availability
Ubiquity
Global reach
Digitization
Variety of multimedia
Interactivity
Network effects
Integration
The Internet
A network of networks
Dangers and
advantages
Still growing quickly
Open system
Relies on a number of standard protocols
Large part of the net is the World Wide
Web
The Internet is not a network of computers but can be more accurately described
as a network which consists of sub-networks. It is still growing at a large rate
requiring the development of new versions of old protocols to cope with the
increased number of hosts.
A major feature of the Internet is that it is an open system: all the specifications
of the protocols are publicly available. The positive side to this is that anyone can
write Internet software; the negative side is that it enables malicious acts to be
carried out more easily.
The Internet has a major component known as the World Wide Web which is the
main platform for commerce.
Internet protocols (i)
TCP
IP
UDP
ICMP
Most developers do not need to know about the details of these
Internet protocols link
Telnet
FTP
SMTP
Kerberos
Domain name system
SNMP
NFS
TFTP
HTTP
All public!
An example
open.ac.uk
The Internet has a form of hierarchic naming which becomes more specific as
you get to the left-hand side of a name. Each time you move to the right in a
name it references more and more collections of computers until you get to
somewhere near the top of the naming tree, for example com or uk.
A name
www.open.ac.uk
www: name of the computer (a Web server)
open: collection of computers at the Open University
ac: collection of computers at academic institutions
uk: collection of computers in the UK
uk does not mean that the site is in the United Kingdom
The leftmost name in the slide represents a host name, the next name represents
all those computers that are found at the Open University, the designation ac
applies to academic institutions and the designation uk refers to United Kingdom
registered hosts.
Note that uk does not mean that the host is physically situated in the United
Kingdom, although normally this is the case.
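The right-to-left reading of a name described above can be sketched in Java. This is only an illustration: the class name DomainLabels and its method are hypothetical, and the code simply splits the text of a name; it performs no DNS lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists the parts of a host name starting at the
// top of the naming tree (rightmost part) and ending at the host (leftmost).
class DomainLabels {
    static List<String> fromTopOfTree(String name) {
        List<String> labels = new ArrayList<>();
        for (String label : name.split("\\.")) {
            labels.add(label);
        }
        Collections.reverse(labels);
        return labels;
    }

    public static void main(String[] args) {
        // Each step down the list narrows the collection of computers.
        for (String label : fromTopOfTree("www.open.ac.uk")) {
            System.out.println(label);
        }
    }
}
```

Reading the result from first to last moves from the broadest collection of computers to the single host.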
Case Study
Media rich
Employs a huge variety of technologies including XML, RSS, HTML,
JavaScript, Cocoon and a variety of web and database servers
A large amount of dynamic content, some real-time
Sound and video downloads
Forums and conferencing
Email updates
SMS updates
Dynamic page generation technologies
The BBC site is one of the most famous sites on the Internet and has won a large
number of awards. It employs a huge variety of technologies. Some are:
XML is used for the storage of base text
RSS is used for generating news feeds
HTML is used for the development of web pages
JavaScript is used for client processing, for example forms processing
Cocoon is used for document processing
A variety of web and database servers are used for dispensing web pages and
storing data.
Legacy technology
Security and privacy
Programming and abstraction
Speed of development
Structure and data
Problems with transactions
Interrelated
Major problem
There are a number of problems with the Internet which this course will look at;
in particular it will show how these problems have been overcome by the use of
technology. The problems are
Legacy technology. The fact that the base technology used in the Internet is
inadequate for the uses to which it is being put.
Security. The fact that since the Internet is an open system its specifications are
readily available and intruders can make use of this to carry out malicious acts.
Programming and abstraction. The programming models used in the past for
network development are inadequate given the speed of development of the net.
Structure and Data. The Internet contains large amounts of data ranging from
Web pages to structured databases; often such data is in widely differing formats.
Problems with transactions. A transaction can span over a considerable time and
over a number of hosts situated in separate countries. This gives rise to
considerable synchronisation problems.
Security problems
The Internet is an open system
Users are allowed access from anywhere to,
say, a Web server
The higher availability of access together
with the open system aspects leaves the
Internet open to abuse.
Security is dealt with later in detail
Because the specifications of Internet protocols are publicly available this means
that intruders can read them, find weaknesses and then exploit them. This leads to
a greater security problem. The obverse of this is that since there are many more
users, solutions to these problems are often more forthcoming than if the network
was based on proprietary protocols. This is one of the arguments put forward by
the open source movement.
Also, since anyone can access a host to some depth, for example by browsing a
Web page, an intruder will have already gone some way into a system without
deploying very much effort.
Networked applications have, in the past, been designed and programmed using
the idea that communicating hosts synchronise and communicate using messages.
While this is quite a good model for small applications where speed is of the
utmost importance, it is severely limited.
Because of this a number of new models have been developed. One of the most
popular ones is the use of distributed objects: objects which lie on physically
separate computers but which can be accessed as if they lived on the same
computer. Two popular distributed object schemes are DCOM and CORBA.
Another idea is that of a web service; again, this will be dealt with later.
Another paradigm is that of a tuple system which regards a network of computers
as just a very large store of data.
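The tuple idea can be sketched in a few lines of Java. This is a single-process, in-memory illustration in the style of the Linda tuple-space model, not a networked implementation; the class name TupleStore and its methods write and take are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory tuple store: programs communicate only by
// writing tuples into the store and taking matching tuples out of it.
class TupleStore {
    private final List<Object[]> tuples = new ArrayList<>();

    // Adds a tuple (an ordered group of values) to the store.
    synchronized void write(Object... fields) {
        tuples.add(fields.clone());
    }

    // Removes and returns the first tuple matching the template, or null
    // if there is none. A null field in the template matches any value.
    synchronized Object[] take(Object... template) {
        for (int i = 0; i < tuples.size(); i++) {
            if (matches(tuples.get(i), template)) {
                return tuples.remove(i);
            }
        }
        return null;
    }

    private static boolean matches(Object[] tuple, Object[] template) {
        if (tuple.length != template.length) {
            return false;
        }
        for (int i = 0; i < tuple.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) {
                return false;
            }
        }
        return true;
    }
}
```

A producer might call write("price", "widget", 12) and a consumer take("price", "widget", null); neither needs to know which computer the other runs on, which is the attraction of the model.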
The Internet contains data of widely differing structure and format. A good
example of this is Web pages which are written in a wide variety of versions of
the Web language HTML. As well as being in different formats the pages do not
have any semantic information which gives a clue to their content, for example in
a book site a number might be the price of a book or its discount.
Because of this a meta-language known as XML has been developed. This
meta-language allows data to be tagged with semantic markers. This data can then be
processed by programs which are XML knowledgeable in order to extract user
information.
XML is an important technology which has the same flavour as HTML and
SGML (the source of inspiration of HTML)
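As a sketch of what semantic tagging buys you, the Java below parses a small XML record in which the price and the discount of a book are tagged separately, so a program no longer has to guess which number is which. The element names (book, title, price, discount), the sample values and the class name BookRecord are invented for illustration; the parsing uses the standard javax.xml.parsers API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class BookRecord {
    // Invented sample data: the tags say what each number means.
    static final String SAMPLE =
        "<book><title>Example Title</title>"
        + "<price>20.00</price><discount>2.50</discount></book>";

    // Returns the text inside the first element with the given tag name.
    static String textOf(String xml, String tag) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("price = " + textOf(SAMPLE, "price"));
        System.out.println("discount = " + textOf(SAMPLE, "discount"));
    }
}
```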
A major problem that has emerged as the Internet has become bigger is that of
searching a large body of unstructured text.
Speed of development
Internet applications often require high
speed development
The use of design patterns
High level APIs
Fast development methods such as agile
methods
Because of the speed at which Internet use is growing there is often a requirement on firms
to develop software quickly. This has given rise to a number of advances:
Design patterns are micro architectures which can be used time and time again.
High level APIs such as the Java collection API which reduce the amount of
detailed coding.
Fast development methods such as rapid prototyping and evolutionary
prototyping.
The lecture concludes with a brief description of open source software and its
relevance to the Internet. Open source software is software that has been
developed by designers and programmers and offered as is, for free. It includes
some very popular products such as the Apache web server. The relevance to the
Internet is that for the most popular products their existence provides a de facto
standard; this is important for systems integration purposes as you will see later
in this module. Another important element related to the Internet is that many of
the products are oriented towards Internet systems development. Finally another
important aspect is that the distribution medium is the Internet; without this it
would be highly unlikely that open source software would have advanced to the
position it occupies today.
mod_perl
Lucene
Tomcat
Tapestry
Cocoon
XML-graphics
Here are some examples of some open source software maintained by the Apache
Software Foundation:
mod_perl is a technology that allows Perl programs to be interfaced with a web
server.
Lucene is an open source search engine library.
Tomcat is a web server and servlet container that supports Java Servlets and Java Server Pages.
Tapestry is an open source framework for developing web applications.
Cocoon is a publishing system.
XML-graphics is a project which has a number of strands centred around the use
of XML for graphic formats such as the SVG format for vector graphics.
Lecture 2
Introduction (ii)
Aims
To briefly describe the main components of the
Internet
To look at the concept of an open architecture
To describe the low-level mechanism of message
passing
To describe a low-level programming model of
distributed communication
Open systems
A bus network
Ring network
Hub network
A hub network uses a main cable like a bus network; the cable is known as a
backplane. From this backplane connections lead to ports into which devices can
be plugged. Hub networks have proved to be very popular: they are easy to set up
and are cheap.
Layered networks
A form of architectural description
Relies on layers
Each layer calls on services provided by the
next layer down.
Innermost layers are closest to the base
facilities of a computer or network
The outermost level in most layered architectures is that which allows the
programmer to call on application functions. This is the level that, ideally,
Internet application developers should be working at.
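The idea that each layer calls on the layer below it can be sketched in Java. This is a toy model only: the layer names used (application, transport, network, link) and the bracketed wrapping are assumptions made for illustration and do not represent real protocol headers.

```java
// Toy model of a layered architecture: each layer wraps the data it is
// given and then calls on the service of the next layer down.
class LayerDemo {
    interface Layer {
        String send(String data);
    }

    // The bottom layer has nothing beneath it; it just wraps the data.
    static Layer bottom(String name) {
        return data -> "[" + name + " " + data + "]";
    }

    // Any other layer wraps the data and hands it to the layer below.
    static Layer over(String name, Layer below) {
        return data -> below.send("[" + name + " " + data + "]");
    }

    // Builds a four-layer stack; the caller only sees the outermost layer.
    static Layer stack() {
        Layer link = bottom("link");
        Layer network = over("network", link);
        Layer transport = over("transport", network);
        return over("application", transport);
    }

    public static void main(String[] args) {
        System.out.println(stack().send("Hello"));
    }
}
```

The application developer only ever calls send on the outermost layer; everything beneath it is hidden, which is the point made above.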
Link to OSI
The OSI reference model is a seven-layer model. It ranges from the topmost
layer, which is the level applications communicate with, to the bottom-most layer,
which lies very close to the hardware used to implement the model.
In the Internet model a number of these levels have been coalesced into single
layers. This is shown on the next slide.
The diagram shows the relationship between the OSI reference model and the
Internet layered model, with the various protocols and layers mapped across. The
block on the right shows the various services and protocols used in the Internet;
not all are shown.
Telnet
File Transfer Protocol (FTP)
Simple Mail Transfer Protocol (SMTP)
Kerberos
Domain name system (DNS)
IP has the basic function of moving data created by TCP or UDP across a
network.
ICMP has the function of checking the status of computers attached to a network.
HTTP, which builds on many of the other protocols, is the protocol used when a
Web client communicates with a Web server. It is used to characterise a request
from a client and to return status information from a Web server. More on this
protocol later.
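To give a flavour of the request and status information just mentioned, the Java sketch below builds the text of a minimal HTTP/1.1 GET request and extracts the numeric status code from the first line of a response. The class name HttpText and its methods are hypothetical; nothing is sent over a network.

```java
// Hypothetical helper showing the text form of HTTP messages.
class HttpText {
    // Builds a minimal GET request; each line ends with CR LF and a
    // blank line marks the end of the request headers.
    static String getRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // A response starts with a status line such as "HTTP/1.1 200 OK";
    // the second field is the numeric status code.
    static int statusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    public static void main(String[] args) {
        System.out.print(getRequest("www.open.ac.uk", "/"));
        System.out.println(statusCode("HTTP/1.1 404 Not Found"));
    }
}
```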
Distributed system
A system which consists of a number of
computers (hosts) which are connected to
each other by some transmission media
Rationale: reliability, efficiency
Consists of computers acting as clients and
as servers
Difficult to design
Now that computers have dropped drastically in price it has been found
convenient to connect a number together in a network. Such a collection of
computers is known as a distributed system. There are a number of reasons for
doing this: first you can get increased reliability by designing duplication into the
system, for example via replicated databases; second, you can increase efficiency
by ensuring that processing power and data lie close to the user.
The price to pay for this is complexity: for example, designing a distributed
system for performance is quite a difficult task.
A distributed system will consist of a number of computers which offer some
service (servers) and a number of computers (clients) which call on this service.
Why client/server?
Openness
Scalability
Specialisation
Reliability
Design flexibility
Protocols
A form of standardised rules for
conversation
Popular type of protocol is the
request/response protocol (HTTP)
Protocols can be system protocols or
application protocols
Protocols are the lifeblood of a distributed system. Already you have seen many
examples of system protocols. Many others exist on the Internet. They enable
clients and servers to interact with each other and are used to call on resources,
return resources or return status information.
An example of an application
protocol
FindCurrentBid
UpBid bid amount
DropOut
SendCurrentPrice price
Sold
The set of lines above represent an example of an application protocol which can
be used in an online auction system. Some of the protocol is used by the client,
for example to establish a bid; other elements of the protocol are used by the
auction server to send the result of a bid or the auction state.
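An application protocol like this is usually turned into program data by a small parser. The Java sketch below reads one line of the auction protocol and reports which message it is and what amount, if any, it carries. The one-message-per-line text format, the whole-number amounts and the class name AuctionMessage are assumptions made for this sketch; the slide gives only the message names.

```java
// Hypothetical parser for the auction protocol messages listed above.
class AuctionMessage {
    enum Type { FIND_CURRENT_BID, UP_BID, DROP_OUT, SEND_CURRENT_PRICE, SOLD }

    final Type type;
    final int amount; // -1 when the message carries no amount

    private AuctionMessage(Type type, int amount) {
        this.type = type;
        this.amount = amount;
    }

    // Parses one protocol line, rejecting unknown or malformed messages.
    static AuctionMessage parse(String line) {
        String[] parts = line.trim().split("\\s+");
        switch (parts[0]) {
            case "FindCurrentBid":   return plain(Type.FIND_CURRENT_BID, parts);
            case "UpBid":            return withAmount(Type.UP_BID, parts);
            case "DropOut":          return plain(Type.DROP_OUT, parts);
            case "SendCurrentPrice": return withAmount(Type.SEND_CURRENT_PRICE, parts);
            case "Sold":             return plain(Type.SOLD, parts);
            default:
                throw new IllegalArgumentException("Unknown message: " + line);
        }
    }

    private static AuctionMessage plain(Type type, String[] parts) {
        if (parts.length != 1) {
            throw new IllegalArgumentException("No argument expected");
        }
        return new AuctionMessage(type, -1);
    }

    private static AuctionMessage withAmount(Type type, String[] parts) {
        if (parts.length != 2) {
            throw new IllegalArgumentException("Exactly one amount expected");
        }
        return new AuctionMessage(type, Integer.parseInt(parts[1]));
    }
}
```

For example, AuctionMessage.parse("UpBid 120") yields a message of type UP_BID with amount 120, which the auction server could then act on.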
USER
PASS
STAT
DELE
RETR
Details of POP3
POP3 is a simple but heavily used protocol for retrieving email. A number
of elements of the protocol are shown above:
USER identifies the user (the mailbox) whose mail is to be retrieved
PASS communicates the user password
STAT retrieves statistics on how many email messages are waiting for the user and their total size
DELE marks an email message for deletion
RETR retrieves a single email message, identified by its number
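The commands above are sent as plain lines of text. The Java sketch below shows how a client might format them and recognise a successful reply; in POP3 each command line ends with CR LF and a positive server reply begins with +OK. The class name Pop3Commands and the sample user name and password are hypothetical, and no connection to a mail server is made.

```java
// Hypothetical helper that formats POP3 command lines as text.
class Pop3Commands {
    private static final String CRLF = "\r\n";

    static String user(String name)     { return "USER " + name + CRLF; }
    static String pass(String password) { return "PASS " + password + CRLF; }
    static String stat()                { return "STAT" + CRLF; }
    static String retr(int message)     { return "RETR " + message + CRLF; }
    static String dele(int message)     { return "DELE " + message + CRLF; }

    // Positive replies start with "+OK"; errors start with "-ERR".
    static boolean isOk(String reply) {
        return reply.startsWith("+OK");
    }

    public static void main(String[] args) {
        // The order in which a simple session might issue its commands.
        System.out.print(user("alice"));
        System.out.print(pass("secret"));
        System.out.print(stat());
        System.out.print(retr(1));
        System.out.print(dele(1));
    }
}
```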
Dedicated ports
7 ECHO
13 DAYTIME
21 FTP
23 TELNET
80 HTTP
25 SMTP
110 POP3
150 SQL-NET
443 HTTPS (SHTTP)
Ports between 0 and 1023 are, by convention, reserved for dedicated services. A
list of some of these dedicated ports is shown above. For example port 80 is used
for communication between a Web client running a browser and a Web server
storing a series of HTML pages.
The normal programmer will be unaware of these port numbers and will not need
to know them, for example programming Web servers requires just a knowledge
of HTTP. However, if you are writing low-level applications it is important to
avoid dedicated ports.
SQL-Net is a network protocol used for communication between relational
databases and SHTTP is a secure version of the HTTP protocol.
79
80
80
status = computer1.getStatus();
All you need to know about Java for this course is that you can send messages to
objects and that often the result of this message is some data. The example above
shows the message getStatus being sent to the destination object computer1
with the status of that computer (running, stopped, malfunctioning) being
communicated back and assigned to the variable status.
This is a common pattern in object-oriented programming.
An example
Shows two sets of code: code for a client
and code for a server
Communication via the port 2500
Server located at penny.open.ac.uk
Client is anywhere
The code on the following pages establishes a connection between a client and a
server. The client sends a message Hello to the server and receives the reply
Connection established. The server is the computer penny.open.ac.uk; the
client can be any computer on the network. All communication is via port
2500.
Client
import java.io.*;
import java.net.*;
// Set up the socket to the remote computer penny
Socket pSock = new Socket("penny.open.ac.uk", 2500);
// Obtain the streams
InputStream is = pSock.getInputStream();
OutputStream os = pSock.getOutputStream();
// Set up the BufferedReader which is
// associated with the socket
BufferedReader bf =
    new BufferedReader(new InputStreamReader(is));
Next a print writer object is set up. In Java a print writer is used to write character
data. In the case of the client this data is written to the server.
Next the client sends the message Hello to the server and then reads the reply
that has been sent back. If the message was Connection established then the
server and the client are ready to talk to each other.
The first part of the server code for penny.open.ac.uk is shown above. It first
establishes an object known as a server socket on the server. This is bound to port
2500. It then blocks, waiting for a connection from a client. When a client connects
two streams are set up. One is an input stream, the other is an output stream. A
buffered reader is connected to the input stream so that the server can read data
from the client.
A print writer object is then set up and a line read from the client. If the client
sent a Hello message then a connection has been established and the server
informs the client of this.
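The Hello exchange described in these notes can be sketched as a single program that runs both ends in one JVM. This is an illustration rather than the lecture's own code: the class and method names are hypothetical, and it uses localhost and a port chosen by the operating system instead of penny.open.ac.uk and port 2500 so that it can run anywhere.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

class HelloSockets {
    // Server side: wait for one client, read a line, reply if it was "Hello".
    static void serveOnce(ServerSocket ss) throws IOException {
        try (Socket s = ss.accept();
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            if ("Hello".equals(in.readLine())) {
                out.println("Connection established");
            }
        }
    }

    // Client side: connect, send "Hello" through a print writer, read the reply.
    static String sayHello(String host, int port) throws IOException {
        try (Socket pSock = new Socket(host, port);
             BufferedReader bf = new BufferedReader(new InputStreamReader(pSock.getInputStream()));
             PrintWriter pw = new PrintWriter(pSock.getOutputStream(), true)) {
            pw.println("Hello");
            return bf.readLine();
        }
    }

    // Runs the server in a background thread and the client in the calling thread.
    static String demo() {
        try (ServerSocket ss = new ServerSocket(0)) {   // 0 = any free port
            Thread server = new Thread(() -> {
                try {
                    serveOnce(ss);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            server.start();
            String reply = sayHello("localhost", ss.getLocalPort());
            server.join();
            return reply;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The second argument to the PrintWriter constructor switches on automatic flushing, so each println is sent to the other side straight away.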
Enterprise frameworks
Microsoft
An introduction to J2EE
Facilities of an enterprise
framework(i)
Multi-language working
Support for legacy code
Support for high volume transactions
Support for messaging
Facilities of an enterprise
framework(ii)
Web server programming
XML facilities
Interface to standard protocols
Facilities of an enterprise
framework(iii)
Database connectivity
Naming services
Security services
Lecture 3
Distributed paradigms: introduction,
message passing and event-based
paradigms
Aims
To detail the various architectural schemes
available for developing distributed
architectures
To describe the main three tier concept used
in the course
To describe the idea that a technology
should hide an underlying network
architecture
Openness
Scalability
Specialisation
Reliability
Design flexibility
Disadvantages
Design complexity
Programming complexity
Performance problems
While the client/server paradigm is now the prevalent one it is worth detailing
some disadvantages. The first is that, to achieve high performance, design and
programming can be an immensely tough task: reconciling a number of servers
with different performance characteristics which are remotely connected via slow
transmission lines requires design skills of the highest order, particularly when
high reliability is required.
Architecture
An arrangement of basic components and
their interaction with each other.
Four architectures studied here.
They are message passing, distributed
object, event-based architectures and tuple
space
Distinguished by their distance from the
network
This shortish lecture describes architectures. When one talks about a system
architecture or a system design what is being referred to is the arrangement of
building blocks and the connections between the blocks. This does not differ
from the conventional use of the term architecture in building.
There are four architectures I discuss in this part of the course. I have chosen
them since they exemplify two properties: the first is their popularity, the second
is their distance from the network. The term distance characterises the level of
abstraction of the architecture: how much it hides the underlying hardware and
low-level software.
In the next three lectures I will look at two examples of a particular architecture
known as the distributed object architecture.
A statement
One of the aims of software developers is to hide the details of the network
from the programmer. This is best exemplified by the statement above
which forms part of the network strategy of Sun Microsystems, the original developers
of the Java programming language.
The ideal that is stated there is that there should be no difference between
programming a single computer and a number of computers connected together
implementing the same functionality.
Network
Computers
System configurer
and maintainer
The slide above shows the best that we can achieve in terms of moving towards
the ideal detailed in the previous slide. The programmer is unaware that he/she is
programming a number of computers. There is, however, another role: that of
system configurer who is responsible for ensuring that performance and
reliability goals are met. For large applications where traffic varies as
functionality changes this will be a continuous job.
Sockets and ports are examples of abstracting away from the network. A socket is
a logical entity; it does not, for example, correspond to a piece of hardware. When
sockets are established communication is via streams which are normally used
for input and output. At one level socket programming is very similar to
programming sequential file access.
Sockets hide the physical address of a computer (usually a symbolic name is
used). They also hide the transport details: for example, there is no need to worry
about what to do if there is a hardware error, since the TCP/IP software handles this
problem. However, the programmer does have to be aware that problems can
occur and cope with them.
Sockets in Java
// Run on the server venus.open.ac.uk: listen on port 1100
ServerSocket ss = new ServerSocket(1100);
Reality intruding
So far I have been talking about ideals.
Reality in terms of performance and
application level error-handling often
intrudes
An example of this is concurrency and
socket processing
The model presented in the previous slides has been unrealistic: when you have a
number of clients accessing a server you will need to develop server code as
concurrent code.
Clients
concurrently
accessing
server
Service involves
lengthy wait
In
queue
Here a server carries out some lengthy process such as accessing a relational
database. A client connects to the server, asks for a service and waits for the
service to be carried out. The service may take many milliseconds. In this time
another collection of clients may ask for the same service. In a simple-minded,
high-level program what would happen is that the system software would queue up
these clients until the client currently accessing a service finishes. This is very
inefficient: in the wait time, while the server processor is idle, more clients could
be processed. In reality concurrency is used.
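One common way of adding concurrency to a socket server is to hand each accepted connection to a pool of threads, so that the accept loop is free to take the next client. The sketch below is an assumption about how this might look, not the lecture's own code; the class name, the pool size and the "done" reply are made up. In the demo a first client connects and stays silent while a second client is served, which a purely sequential server could not do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConcurrentServerSketch {
    // Serves one client: read a request line, do the work, send the reply.
    static void handle(Socket s) {
        try (Socket client = s;
             BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            String request = in.readLine();   // may block for a long time
            out.println("done " + request);
        } catch (IOException e) {
            // The client went away; nothing more to do for this connection.
        }
    }

    // Accept loop: each connection is passed to a pool thread immediately.
    static void acceptLoop(ServerSocket ss, ExecutorService pool) {
        try {
            while (true) {
                Socket s = ss.accept();
                pool.execute(() -> handle(s));
            }
        } catch (IOException closed) {
            // The server socket was closed: stop accepting.
        }
    }

    // A silent client connects first; a second client is still answered.
    static String demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try (ServerSocket ss = new ServerSocket(0)) {
            pool.execute(() -> acceptLoop(ss, pool));
            try (Socket slow = new Socket("localhost", ss.getLocalPort());
                 Socket fast = new Socket("localhost", ss.getLocalPort());
                 PrintWriter out = new PrintWriter(fast.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(fast.getInputStream()))) {
                out.println("query");
                return in.readLine();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```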
The models
Message passing
Distributed objects
Event-based architectures
Tuple or space-based models
Increasing
abstraction
In this lecture I shall be dealing with four basic architectural models. Message
passing is at the lowest level and is usually implemented using sockets and server
sockets. Distributed objects are an attempt to program a distributed system in such
a way that it can be viewed as a collection of interacting objects. Event-based
architectures are based on a broadcaster/listener viewpoint, where listeners in a
distributed system are only activated when an event of interest to them occurs.
Finally, tuple spaces are the highest level architecture in that they view a distributed
system as a shared pool of data.
All these models can be implemented in Java.
An overriding architecture
Storage
layer
Business
object
layer
Client layer
It separates concerns
It isolates application code
It protects a system from changes, for
example a change in the underlying storage
technology
Example of information hiding
There are a number of very good reasons for having a three-layer architecture
similar to that shown in the previous slide.
The first is that it does not intermingle code: database access code is not mixed
with application code which is not mixed with client code.
The second is that it minimises the effect of change in that, for example, all the
storage code is isolated in one layer so that changes to underlying database
technology do not unduly affect the maintenance process.
The whole idea of a three-layer architecture is based on information hiding where
details of such things as client events, application objects and database access are
hidden beneath a layer of an API.
The code headers above represent the interface to stored data for an e-commerce
application. When the programmer uses this code he or she has no idea what the
underlying technology used for storage is: it could be a relational database
system, it could be an object-oriented database system or it could be a set of
distributed objects. Moreover, the programmer has no idea where the data is: it
could be on a local computer or one thousands of miles away. This means that
changes can be made to the underlying data without affecting much of the
application code that has been written.
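A minimal sketch of this kind of storage interface is given below. The names ProductStore, InMemoryProductStore and StorageLayerSketch, and the two methods, are hypothetical and are not the headers from the slide; the point is only that the calling code depends on the interface and never on the storage technology behind it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical storage-layer API: callers see only these method headers.
interface ProductStore {
    void save(String productId, int priceInPence);
    Optional<Integer> findPrice(String productId);
}

// One possible implementation. A relational or remote version could replace it
// without any change to the code that uses ProductStore.
class InMemoryProductStore implements ProductStore {
    private final Map<String, Integer> prices = new HashMap<>();

    public void save(String productId, int priceInPence) {
        prices.put(productId, priceInPence);
    }

    public Optional<Integer> findPrice(String productId) {
        return Optional.ofNullable(prices.get(productId));
    }
}

class StorageLayerSketch {
    // Business-object code written against the interface only.
    static String describe(ProductStore store, String productId) {
        return store.findPrice(productId)
                .map(p -> productId + " costs " + p + "p")
                .orElse(productId + " is not stocked");
    }

    public static void main(String[] args) {
        ProductStore store = new InMemoryProductStore();
        store.save("cd-1", 999);
        System.out.println(describe(store, "cd-1"));
        System.out.println(describe(store, "cd-2"));
    }
}
```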
The client layer will contain code which processes events such as button clicks or
a text field going out of focus; this is HCI event code.
It will also communicate the fact that an event has happened to a server via the
business object layer. How it does this depends on the technology used within the
system; it could, for example, involve message passing.
The business object layer contains objects which are concerned with the
application, for example in an e-commerce application for selling CDs there
would be objects which represent individual CDs.
They are supported by application APIs which provide access to the underlying
data.
It also contains code which receives data from the server and displays that data.
A sales application
Supplier
Product
Invoice
Some typical business objects are shown in the slide for an application which
sells some set of products which are supplied by another company, for example
an online bookshop.
This is the layer in which raw data resides. Normally it is implemented in terms
of relational database technology. However, there is no reason why other
technologies cannot be used such as OO-based database technologies or even
simple flat files or transient data.
A three layer system should be designed in such a way that the technology is
completely isolated from the other layers in order that change does not impact in
a major way.
This form of architecture is based on sending and receiving messages from those
available in a protocol set. It is closer to the network than the other architectures
that I will describe in that it is based on sending serial data directly across some
transmission medium.
It is a very efficient way of doing things and is used when speed is of the essence.
It can be easily implemented using sockets and streams although, in practice,
concurrency is employed at the server.
It is also used in novel applications when there is no existing protocol available.
Message passing is mainly used in system protocols which are heavily used,
protocols such as HTTP and POP3. Most of these are fixed protocols in that the
whole repertoire is available to both the client and the server.
Some protocols can, however, be adaptive and are able to be modified prior to
being used in a communication link between a client and a server.
Adaptive protocols
Used in a number of contexts
Where a command has a variable number of
arguments
Where client and server have to negotiate a
subset of a full protocol set
Where a highly reliable system requires
modification of a protocol on the fly
A protocol can be fixed in that both the client and server use the same set of
commands over their interactions. However, protocols can be adaptive. An
example of this is where commands need to take a varying number of arguments
or a single argument which is of a different type.
Another example is where a client and a server negotiate a subset of a protocol,
for example the client may only be programmed to recognise an early subset of
the full protocol used by the server.
A further example is where a protocol is modified on the fly when it needs to be
changed. This usually happens in highly reliable systems.
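The second case, negotiating a subset of a protocol, can be sketched very simply: each side lists the commands it understands and the two agree on the commands they have in common. The class and method names are hypothetical, and real protocols negotiate in many different ways; this is only one plausible scheme.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ProtocolNegotiation {
    // Returns the commands both sides understand, in the server's order.
    static Set<String> negotiate(List<String> clientCommands, List<String> serverCommands) {
        Set<String> agreed = new LinkedHashSet<>(serverCommands);
        agreed.retainAll(clientCommands);
        return agreed;
    }

    public static void main(String[] args) {
        // An older client that only knows an early subset of the server's protocol.
        List<String> client = Arrays.asList("USER", "PASS", "STAT");
        List<String> server = Arrays.asList("USER", "PASS", "STAT", "DELE", "RETR");
        System.out.println(negotiate(client, server));
    }
}
```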
Event-based architectures
Based on the concept of a broadcaster and
listeners
Based on the MVC architecture
Listeners register with the system in order
to receive data based on an event such as a
new user being logged on.
iBus
Listeners
Broadcaster
Communication bus
Listeners
This code starts up a broadcaster at the specified URL. Listeners can then listen
to events which are broadcast.
This code constructs a message (posting) and places it on the bus which listeners
can register with.
This code just sets up a receiver object. Note that the definition of a
ReceiverObject is not shown here. It is implemented by inheritance from a class
Receiver.
This shows the subscribing process whereby a listener registers itself with the
bus. What is not shown is the code that is executed when an event occurs. This
would be found in the class Receiver.
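The broadcaster/listener idea behind these slides can be shown in plain Java without any bus product. The sketch below is not the iBus API: the class EventBusSketch and its subscribe and post methods are hypothetical, and everything runs inside one program rather than across a network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class EventBusSketch {
    // Listeners registered for each named channel.
    private final Map<String, List<Consumer<String>>> listeners = new HashMap<>();

    // A listener registers its interest in a channel.
    void subscribe(String channel, Consumer<String> listener) {
        listeners.computeIfAbsent(channel, c -> new ArrayList<>()).add(listener);
    }

    // The broadcaster posts a message; only listeners on that channel are activated.
    // Returns the number of listeners that were notified.
    int post(String channel, String message) {
        List<Consumer<String>> interested = listeners.getOrDefault(channel, new ArrayList<>());
        for (Consumer<String> listener : interested) {
            listener.accept(message);
        }
        return interested.size();
    }

    public static void main(String[] args) {
        EventBusSketch bus = new EventBusSketch();
        bus.subscribe("logins", user -> System.out.println("New user logged on: " + user));
        bus.post("logins", "alice");
        bus.post("logouts", "bob");   // nobody is listening on this channel
    }
}
```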
An abstraction
mechanism
Distributed objects
Objects spread around a distributed system
Access to the objects is virtually the same
irrespective of where they reside.
A number of popular technologies: RMI, DCOM
and CORBA
Dealt with in much more detail in the next lecture
Introduction to distributed objects
Tuple architectures
Almost certainly the highest level view of a
distributed system
Academic roots
Original model Linda
Java implementation known as JavaSpaces
An introduction to JavaSpaces
Tuple architectures are the highest level view of a distributed system that we
have. They regard the system as a collection of data known as spaces; programs can
read and write data to these spaces with the system software providing an
interface from a high level view to the underlying implementation.
The original model on which spaces architectures were based was a language
called Linda developed at Yale University. Linda was something of a curiosity
until Sun came along and implemented a version known as JavaSpaces as part of
its Jini effort.
Programming JavaSpaces is fairly easy; however, setting up spaces is rather
convoluted.
Distributed system
An example
This code shows one of the small number of primitives used in JavaSpaces to
write data to a space. The second argument is null; this argument is part of the
transactional control facilities of JavaSpaces and is out of the scope of this
course.
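To give a feel for the tuple space idea, here is a toy, single-program space with write, read and take operations. It is not the JavaSpaces API and it has no transactions or leases: the class TupleSpaceSketch is hypothetical, tuples are plain object arrays, and a null in a template is assumed to match anything. Unlike Linda, read and take return null instead of waiting when nothing matches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class TupleSpaceSketch {
    private final List<Object[]> tuples = new ArrayList<>();

    // Places a copy of the tuple in the space.
    synchronized void write(Object... tuple) {
        tuples.add(tuple.clone());
    }

    // A template matches a tuple of the same length; null fields are wildcards.
    private static boolean matches(Object[] template, Object[] tuple) {
        if (template.length != tuple.length) {
            return false;
        }
        for (int i = 0; i < template.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) {
                return false;
            }
        }
        return true;
    }

    // Returns a copy of the first matching tuple, leaving it in the space.
    synchronized Object[] read(Object... template) {
        for (Object[] t : tuples) {
            if (matches(template, t)) {
                return t.clone();
            }
        }
        return null;
    }

    // Removes and returns the first matching tuple.
    synchronized Object[] take(Object... template) {
        Iterator<Object[]> it = tuples.iterator();
        while (it.hasNext()) {
            Object[] t = it.next();
            if (matches(template, t)) {
                it.remove();
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TupleSpaceSketch space = new TupleSpaceSketch();
        space.write("price", "cd-1", 999);
        System.out.println(space.read("price", "cd-1", null)[2]);
        space.take("price", "cd-1", null);
        System.out.println(space.read("price", "cd-1", null) == null);
    }
}
```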
Lecture 3
Distributed paradigms (ii) Distributed
objects
Aims
To describe the rationale behind distributed
object systems
To examine RPC as a predecessor
To detail the middleware required for
distributed objects
Simula
Smalltalk
C++, CORBA and Eiffel
Java, RMI, DCOM
C#
2001
Languages such as C++, Java and Eiffel allowed the programmer to specify
objects which just resided on a local computer. Distributed object technology
allows objects on remote computers to be accessed in the same way. This leads to
the concept of access transparency (see the Transparencies slide two slides on).
Information hiding
Aggregation
Inheritance
The major rationale behind having distributed object schemes is that they provide
consistency across applications. The fact that a system is viewed as a collection of
co-ordinating objects, irrespective of whether the system is distributed or not,
means that the same techniques, skills and software engineering tools can be
used.
This is an example of distributed technologies hiding the sometimes awful detail
that lurks in a distributed system.
Transparencies (i)
These are all part of the
process of hiding network details
Access transparency
Location transparency
Migration transparency
Replication transparency
Transparencies
Concurrency transparency
Scalability transparency
Performance transparency
Failure transparency
Concurrency transparency means that the fact that concurrent code is being
used to access an object should be hidden from both users and programmers.
Scalability transparency means that when the load on a distributed system
increases more processing power can be added to the system without the user or
the programmer being aware of it.
Performance transparency means that the mechanisms used to achieve
performance are invisible to the user. For example if
load balancing is used this mechanism should be hidden.
Failure transparency means that when a failure occurs the fact that it has
occurred should be invisible to the user.
Distributed systems are not new: they have been around for twenty years. It is
then hardly surprising that remote code has been around for some time, before
even distributed objects had been developed.
One of the most sophisticated systems was the Distributed Computing
Environment (DCE) which implemented a technology known as remote
procedure call. This allowed subroutines on remote computers to be executed as
if they were on a local computer. This requires a special language known as an
Interface Definition Language to define the facilities offered by a subroutine. The
idea is used in the CORBA distributed object scheme which is discussed later.
Remote Procedure Call has not gone away. It is still alive in the form of SOAP, an
XML-based form.
The major components of a distributed object technology are shown above. The
interface definition language defines the objects that are to be remotely located.
The presentation layer provides proxy objects on the client and server to which
messages can be sent. The session layer handles multiple objects and maintains
and organises the connections between the objects and the transport layer. The
transport layer uses some base protocol (usually TCP/IP) to carry out the sending
of data to a remote object and the reception of that data back at the calling
computer.
The components
Objects defined
by IDL
Server
Client
Real calls
Presentation layer
Session layer
Transport layer
Presentation layer
The presentation layer communicates with the session layer by passing uniform
representations of data used in method calls to it. For example an object which
represents an employee would be transformed to a set of bytes which represent
the employee. This is the process known as marshalling. The reverse process, which
happens at both the server and the client, is known as unmarshalling.
In order for a distributed object system to work a local object acting as a proxy
needs to be maintained at the client end. This object is the one to which method
calls are made. It carries out the marshalling and unmarshalling process,
eventually passing the data associated with a method call to the session layer.
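Marshalling and unmarshalling can be illustrated with Java's standard object serialisation, which turns an object into a set of bytes and back again. The Employee class and its fields are made up for the example, and a real distributed object system may use a different wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical employee object that is to be sent across the network.
class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int salary;

    Employee(String name, int salary) {
        this.name = name;
        this.salary = salary;
    }
}

class MarshallingSketch {
    // Marshalling: object -> bytes.
    static byte[] marshal(Serializable obj) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Unmarshalling: bytes -> object.
    static Object unmarshal(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = marshal(new Employee("Ada", 42000));
        Employee copy = (Employee) unmarshal(wire);
        System.out.println(copy.name + " " + copy.salary + " (" + wire.length + " bytes)");
    }
}
```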
Session layer
Receives uniform data from presentation
layer and passes it to the transport layer
Maps object references to hosts
Implements activation policies
Provides object adapters
Invokes the requested method
Synchronises client and server objects
The session layer maps an object reference such as a string name to some
transport layer data used for identification, for example a host name, port number
and object name in TCP/IP. It thus implements a naming service.
It receives data in a uniform form from the presentation layer and processes it and
then instructs the transport layer to send it forward to the server containing the
distributed object.
Activation launches a previously inactive object and deactivation is the reverse: it
terminates the execution of the object. This is one of the facilities offered by a
part of the session layer known as the object adapter.
The layer also invokes the requested method on the remote object and
synchronises the process of message sending between remote objects and clients
and ensures that there are no problems with inconsistent updates and lock-outs.
Transport layer
This is the lowest level of a distributed object scheme. It is the layer that carries
out the actual transport of marshalled data representing a method call,
encapsulating data such as method arguments and method name in a byte stream.
Any communication protocol can be used; in most implementations of remote
object schemes, such as CORBA and RMI, the Internet protocol TCP/IP is
employed.
Design
Server code
generation
Server
coding
Client code
generation
Client
coding
Server
name
registration
Life-cycle issues
Object reference issues
Request latency
Object activation
Parallelism
Communication
Failures
Security
The remaining slides look at the issues above. Such issues are relatively small
when dealing with local objects communicating among themselves. However,
when distributed objects are involved major problems ensue.
Life-cycle issues
Since the normal code for object creation (constructors) cannot be used, code has
to be written which carries out the remote creation; this is then called by the
client.
The creation has to be done in such a way that when an object moves there is no
need to change the code of any applications that use the object.
Local object schemes use garbage collection: when an object is no longer
referenced by other objects its memory is reclaimed. This is very difficult to do
in a distributed environment for performance reasons. Consequently many
distributed object schemes do not guarantee referential integrity and hence the
client design has to cater for the condition that an object may have disappeared.
Object references
You must always design distributed systems on the assumption that references
are heavy in terms of memory. A typical piece of data is shown above for ORBIX,
a lightweight implementation of CORBA; for other implementations this can be
much bigger.
A warning
A design technique
One way to minimise the use of distributed
objects is to have a single distributed object
that acts as a factory object at the server, this
instantiates local objects which carry out the
processing required
Introduction to patterns
An example of a
design pattern
When there is a need to create a large number of objects at, say, a server, you
should not create all of them as distributed objects: the traffic to those objects
would swamp the application. One strategy is to have a single distributed object
which is manipulated by clients to produce the objects as local objects. This
object acts both as a gatekeeper and a factory.
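The gatekeeper and factory idea can be sketched in plain Java as follows. The Basket and BasketFactory classes are hypothetical; in a real system only BasketFactory would be made available as a distributed object, while the Basket objects it creates would stay local to the server and be reached only through the factory.

```java
import java.util.HashMap;
import java.util.Map;

// A purely local object: cheap to create and never exposed to the network.
class Basket {
    private int items;

    void add(int quantity) {
        items += quantity;
    }

    int size() {
        return items;
    }
}

// Assumed to be the single distributed object that clients talk to.
class BasketFactory {
    private final Map<String, Basket> baskets = new HashMap<>();

    // Clients identify a basket by key; the factory creates it on first use
    // and carries out the processing on the client's behalf.
    synchronized int addToBasket(String customerId, int quantity) {
        Basket basket = baskets.computeIfAbsent(customerId, id -> new Basket());
        basket.add(quantity);
        return basket.size();
    }

    synchronized int basketCount() {
        return baskets.size();
    }

    public static void main(String[] args) {
        BasketFactory factory = new BasketFactory();
        factory.addToBasket("customer-1", 2);
        System.out.println(factory.addToBasket("customer-1", 3));
        System.out.println(factory.basketCount());
    }
}
```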
Sophisticated approach
Object factory
Many remote
objects
Request latency
Emmerich has measured the response when two objects communicate between
two ULTRA Sparc servers in a 100 Mbit network as 500 microseconds. This is
2000 times as long as local method calls.
Designers need to be careful in object placement when designing for
performance.
Failures
Distributed systems fail more often than
centralised ones
Failure should be programmed and designed
for
Middleware often imposes an exactly once
condition which is resource heavy
At most once usually implemented
You need to design distributed object systems in such a way that they cope with
failure, for example by employing data replication techniques. The middleware is
able to impose an exactly once semantics where every request is guaranteed to
be executed once and only once.
Unfortunately this gives rise to performance and resource problems and an at
most once semantics is usually applied. This means that the middleware executes a
request at most once and tells the client when a failure occurs. The client
should therefore be programmed to respond to such failures.
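One simple technique for making sure that a repeated request is never executed twice is to give every request an identifier and have the server discard identifiers it has already seen. The sketch below assumes this technique; it is not taken from any particular middleware product, and the class name, the deposit operation and the request identifiers are all made up.

```java
import java.util.HashSet;
import java.util.Set;

class AtMostOnceServer {
    private final Set<String> seenRequests = new HashSet<>();
    private int balance = 0;

    // Executes the deposit unless this request id has been handled before.
    // Returns true if the request was executed now, false if it was a duplicate.
    synchronized boolean deposit(String requestId, int amount) {
        if (!seenRequests.add(requestId)) {
            return false;   // duplicate: do not execute a second time
        }
        balance += amount;
        return true;
    }

    synchronized int balance() {
        return balance;
    }

    public static void main(String[] args) {
        AtMostOnceServer server = new AtMostOnceServer();
        System.out.println(server.deposit("req-1", 10));
        // The client did not see a reply and sends the same request again.
        System.out.println(server.deposit("req-1", 10));
        System.out.println(server.balance());
    }
}
```

The client still has to decide what to do when it is told that a request has failed, for example report the problem to the user or try again later.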
Security
Centralised applications deal with security
at the session level via techniques such as
authentication procedures.
Distributed object systems are prone to
network attacks.
There is a need for a deeper level of security
on a request by request basis
Distributed objects
Objects which reside on any computer in a
distributed system and to which messages
can be sent
A number of schemes: RMI, DCOM and
CORBA are the main ones
Hides the transport mechanisms from the
programmer
The future
Distributed objects were once very
important, they are less important these
days with the growth of web services.
However, the CORBA technology
described in the next lecture offers major
advantages in interworking.
Lecture 4
Distributed paradigms (iii) RMI and
CORBA as examples of a distributed
object technology
Aims
A Java technology
Lightweight object technology
Initially Java-centric
Now with links in to CORBA
Efficient compared with CORBA
In this lecture I will look at a pure Java solution to distributed objects known as
RMI. It was an early part of the Java product set. It was a pure Java solution in
that it could only communicate with objects developed using Java; this has
changed in that RMI now has hooks into CORBA.
RMI objects are defined in Java, placed on remote computers and sent
messages. RMI is an efficient technology with a learning curve that is not too
steep.
Stubs
Skeletons
Server
Clients
Data layer
Presentation layer
Business object
layer
The slide shows how RMI objects can be used in the middle layer of a three-tier
application. The objects can reside on a number of servers, a single server or even
the database server that implements the data layer.
When developing an RMI system you need to develop the server code. This code
will set up the remote objects and implement the functionality in the methods.
The client code then needs to be developed; this will normally contain visual
objects and message code for the remote RMI objects at the server.
The client code will then need to be deployed. Classes will need to be dispensed
to client sites. This can be done in a number of ways: statically by sending as a
file transfer or dynamically using a Web server.
In order for the system to work the RMI naming system needs to be started. This
is known as the RMI registry and enables objects to be symbolically referenced.
When all these steps have been completed the clients can connect to the
server and call on the service it provides.
import java.rmi.*;
public interface SecondGenerator extends Remote
{
long getMilliSeconds() throws RemoteException;
}
The code here defines a Java interface. This is a template for code
that needs to be provided by any class that implements the interface.
The interface extends an interface called Remote. This informs the Java runtime
system that the objects generated from this interface are going to be remote.
There is only a single method within the interface for which code needs to be
provided. The method throws an exception if there are any problems accessing the
remote object.
This is the code that sets up the remote object. It consists of a number of
steps:
First the security manager is loaded in and started.
Next a remote object is set up with the name Dater.
The object's name is then communicated to the RMI naming service using
Naming.rebind.
Finally a message is sent to some console window that the object is ready to be sent
messages.
The server will now wait for some messages.
In a remote application using RMI not much of the code betrays the fact that
distributed objects will be used. A small amount of server code contains, for
example, reference to the naming service. However most of the server code will
be concerned with functionality.
The only code at the client that betrays the remote nature of the application is the
reference to the RMI Registry (the RMI naming service).
For the client the only code that betrays the fact that an object is remote should
be code which obtains a local reference to the remote object via the naming
service. This is normally a single line. All the code after this line would just send
normal messages to the local proxy object (in the slide above this is sgen)
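The server and client steps described in these notes can be sketched in one program so that it runs on a single computer. This is an illustration under several assumptions, not the code from the slides: the implementation class SecondGeneratorImpl is hypothetical, the registry is created inside the program on a port chosen by the system instead of being started separately, the Registry object is used directly instead of Naming.rebind, the security manager step is omitted, and the interface is nested only to keep everything in one file.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

class RmiSketch {
    // Same shape as the SecondGenerator interface shown earlier.
    public interface SecondGenerator extends Remote {
        long getMilliSeconds() throws RemoteException;
    }

    // Hypothetical server-side implementation of the remote interface.
    static class SecondGeneratorImpl implements SecondGenerator {
        public long getMilliSeconds() {
            return System.currentTimeMillis();
        }
    }

    static long demo() {
        // Assumption: keep all traffic on the local machine.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        SecondGeneratorImpl impl = new SecondGeneratorImpl();
        Registry registry = null;
        try {
            // Server side: export the object and register it under the name Dater.
            SecondGenerator stub = (SecondGenerator) UnicastRemoteObject.exportObject(impl, 0);
            registry = LocateRegistry.createRegistry(0);
            registry.rebind("Dater", stub);

            // Client side: one line obtains the proxy; after that it is a normal call.
            SecondGenerator sgen = (SecondGenerator) registry.lookup("Dater");
            return sgen.getMilliSeconds();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            // Unexport so that the program can terminate.
            try {
                if (registry != null) {
                    UnicastRemoteObject.unexportObject(registry, true);
                }
                UnicastRemoteObject.unexportObject(impl, true);
            } catch (Exception ignored) {
                // Nothing was exported; nothing to clean up.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Remote clock says " + demo());
    }
}
```

In a deployed system the server and client parts would live in separate programs on separate computers, and the client would locate the registry by host name.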
RMI
RMI is a lightweight distributed object
technology
Based on Java
Relatively simple to use
Fairly efficient
Can be used for both the static and dynamic
creation of objects
Internet Technologies
166
166
CORBA
A standard
Lots of implementations
Multi-language approach to distributed objects
Based on an interface definition language
Mature
Some reliability problems with some products
Introduction to CORBA
Different to RMI
CORBA is a distributed object technology much like RMI. However, there are
some major differences. The two major differences are that CORBA is a
multi-language approach, in that a wide variety of languages have CORBA interfaces,
and that distributed objects are defined by a special purpose language known as
an interface definition language (IDL).
The technology is mature in that interfaces exist for a wide variety of languages
including older ones such as Ada, LISP and FORTRAN.
The major difference between CORBA and RMI is the use of an interface
definition language which describes the services that are provided by a
distributed object. For CORBA this language looks very similar to C++
declarations because it has to map onto both procedural and OO languages.
Because of this it contains only declarations: interfaces can inherit from one
another, but there is no executable code.
CORBA is a standard
Defined by the Object Management Group
The OMG is the largest standards group in
the world
Almost certainly the most academic
The CORBA standard is huge and attempts
to cover everything
[Figure: clients access legacy code, Ada code, C++ code and Java code through CORBA objects.]
Here CORBA objects are used as gatekeepers: objects which act as a
front end to existing code. The diagram shows a system which has been
programmed in four languages, two of which are not object-oriented. In order for
clients to communicate with this code, a CORBA object programmed in the
same language is placed between the client and the code.
The clients see a clean object-based implementation even though the code behind
the CORBA objects is purely procedural.
The figure above is a logical view of the CORBA architecture. Each of the
components will be detailed in the next two slides. The figure shows a client
interacting with a CORBA object residing on a server.
Static IDL skeletons. These are the server-side equivalent of the client
IDL stubs. A skeleton is code which carries out a number of functions such as
extracting the arguments from a remote method invocation and
passing the message on to the target object.
These skeletons are generated by the utility which creates the client
IDL stubs.
Dynamic skeletons. These are equivalent to the static IDL skeletons.
However, they enable clients to access remote objects for which the
clients do not have compile-time knowledge.
Interface repository. This is a database of all the object descriptions
expressed in IDL.
Object Request Broker. The ORB is the part of the CORBA architecture
which provides the plumbing between distributed CORBA objects and the
clients that reference them. It is the ORB which carries out the process of
communication between distributed objects and it is the ORB that
communicates with the transport medium used to convey the raw data
used in object communication. There are a number of different ORBs
developed by software vendors. In the early days of CORBA these ORBs
were not compatible with each other: you could not send a message from
a client which had stubs generated by one ORB vendor to a server object
which had skeletons generated by another ORB vendor. However, version
2.0 of the standard specifies that all ORBs should be able to communicate
using the Internet Inter-ORB Protocol, usually abbreviated to IIOP.
Object adapter. This is a layer which enables a remote object to access
the facilities of the ORB.
The Life Cycle Service. This provides facilities for creating, copying,
transporting and deleting objects.
The Persistence Service. This provides facilities whereby objects can be
stored on some permanent medium including relational databases, object-oriented databases and flat files.
The Event Service. This allows objects to register themselves as listeners
to events and respond to events; for example, an object might register
itself as a listener to an event which occurs when another object changes
one of its instance variable values and carry out some processing when
this occurs. This service also allows objects to de-register themselves
from events.
The Naming Service. This allows objects to be given names and located
by other objects which quote the name.
The Concurrency Control Service. This provides facilities which ensure
that concurrent processes are not allowed to access an object in such a
way that the object is left in an inconsistent state.
The IDL
[Figure: object services specified in the CORBA IDL are fed to a convertor, which produces object definitions expressed in some base language.]
The IDL fragment above defines a module Tester which contains an interface
Single. The interface provides a single service returnsVals and is associated
with two attributes (instance variables, fields) which are strings. One of the
strings, location, can only be read; it cannot be written to.
The service returnsVals has a single argument point, a string which is
only read; it is not written to.
The fragment above is very similar to a class definition and can be easily
translated into such a definition.
Here the IDL from the last slide has been translated into an interface contained in
a Java package. The exname attribute has been translated into an accessor method
and a setter method: the accessor returns the current value of exname and the
setter changes it. The location attribute is associated only
with a method which returns its value, since it is readonly. Finally, a method
returnsVals implements the remaining facility in the IDL fragment shown in the
previous slide.
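The mapping just described can be sketched as follows. This is an illustration under stated assumptions, not the code on the slide: the IDL in the comment is reconstructed from the notes, the string return type of returnsVals is assumed, and SingleImpl is a hypothetical local class used only to exercise the interface. A real IDL compiler would also generate stub, skeleton and helper classes and place the interface in a package named Tester.

```java
// Hypothetical IDL, reconstructed from the slide notes (return type assumed):
//   module Tester {
//     interface Single {
//       attribute string exname;
//       readonly attribute string location;
//       string returnsVals(in string point);
//     };
//   };

// Java view of the interface.
interface Single {
    String exname();                   // accessor for the read-write attribute
    void exname(String newExname);     // setter for the read-write attribute
    String location();                 // readonly attribute: accessor only
    String returnsVals(String point);  // 'in' argument: read, never written back
}

// Hypothetical local implementation, used only to exercise the interface.
class SingleImpl implements Single {
    private String exname = "unset";
    private final String location;

    SingleImpl(String location) { this.location = location; }

    public String exname() { return exname; }
    public void exname(String newExname) { this.exname = newExname; }
    public String location() { return location; }
    public String returnsVals(String point) {
        return exname + "@" + location + ":" + point;
    }
}
```

Note that location has no setter at all, which is how the readonly qualifier shows up in the Java view.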
Attributes in CORBA
Wide variety of attributes in CORBA
Examples include double, float, long, union,
boolean, enum, char
Sometimes the target language does not
support the attributes; in this case some
hack is used
There are a large number of attribute types in CORBA; some examples are shown
above and most are self-explanatory. enum is a type which can have a number
of distinct values and struct is something akin to a record in that it can contain a
number of attributes.
When the target language does not contain a type, a hack has to take place.
For example, Java had nothing akin to an enum before Java 5. In this case the IDL
is translated into a number of Java constants, and methods are created which access
these constants.
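A minimal sketch of this kind of translation is shown below. The enum Colour and its labels are hypothetical, and the shape (integer constants, one shared instance per label, a value method and a from_int method) follows the general pattern of the IDL-to-Java mapping rather than the exact output of any particular IDL compiler. In particular, generated code would signal a bad value with a CORBA exception; the sketch uses IllegalArgumentException instead.

```java
// Hypothetical IDL: enum Colour { red, green, blue };
final class Colour {
    // Integer constants, one per IDL enum label.
    static final int _red = 0;
    static final int _green = 1;
    static final int _blue = 2;

    // One shared instance per label.
    static final Colour red = new Colour(_red);
    static final Colour green = new Colour(_green);
    static final Colour blue = new Colour(_blue);

    private final int value;

    private Colour(int value) { this.value = value; }

    // Returns the integer constant behind this enum value.
    int value() { return value; }

    // Converts an integer constant back into the shared instance.
    static Colour from_int(int value) {
        switch (value) {
            case _red: return red;
            case _green: return green;
            case _blue: return blue;
            default:
                throw new IllegalArgumentException("bad Colour value: " + value);
        }
    }
}
```

Because the constructor is private and each label has exactly one instance, values can be compared with ==, which is what gives the class enum-like behaviour.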
struct
sequence
array
enum
An application architecture
[Figure: client code sits on stub code and server code sits on skeleton code; both sit on the ORB, which sits on the transport mechanism.]
The diagram shows a CORBA architecture. Client and server communicate via
stub code and skeleton code. Stub code is code which implements a proxy object
on the client and skeleton code implements the same at the server. The proxy
objects communicate via the Object Request Broker which, in turn,
communicates with some set of transport protocols; normally these are TCP/IP.
The previous slide exemplifies a point made at the beginning of the lecture: that
because CORBA has to deal with a multitude of target languages it cannot
contain any very sophisticated facilities which might not be easily implementable
in a language such as C or Ada.
CORBA vs RMI
CORBA is somewhat
more complicated to
program
CORBA implementations
are slower than RMI
CORBA is multi-language
CORBA offers more
services
Simple objects
The slide above describes the main differences between RMI and CORBA.
CORBA is very much more feature rich but is less efficient than RMI.
Speed
Programming complexity
Degree of platform independence
The degree of language independence
The degree of complexity of objects created
Summary
Lecture 5
Servers, database servers and
development technologies
Aims
This lecture will look at client/server computing in a little more detail. It will
look at the rationale for client/server computing, detail some of the more
important types of server and look at the role of middleware. Much of the lecture will
concentrate on database servers; next to Web servers they are the most important
type of server used in distributed applications.
A server is a computer (or program) which provides some service. For example a
print server will react to print service requests by initiating print jobs.
A client will ask for a service, for example a news-reader program will ask for a
set of postings from a news server.
The distinction between clients and servers is not clear cut: a server can act as a
client to another server; for example a Web server, in order to carry out its
service, may act as a client to a database server.
Some servers
File servers
Web servers
Mail servers
Print servers
Database servers
Groupware servers
Object servers
Application servers
[Figure: a three-layer architecture consisting of a client layer, a business object layer and a storage layer.]
Customer
Book
Order
Review
Back order
The objects above are some of those which are associated with a book store such
as Amazon. They are distinct entities which are used for data storage and are
accessed and updated by application programs. Typical data which might be
stored in a customer object includes: the name, email address, list of past orders
and credit card details.
Another quite popular architecture is the two layer architecture which contains a
layer which has all the visual objects and processing embedded in it and a data
layer which contains permanent data. It is used where there is little processing
and simple functionality; the World Wide Web is a good example of a two tier
architecture: it works because there is little processing required at the browser
end, just the display of pages.
Such architectures are problematic when they contain a lot of functionality which
changes: the process of versioning and broadcasting updates becomes very
complex.
The adjectives fat and thin are used to describe the amount of code which resides
at either the client or the server.
Middleware
Software that is interposed between client
and server
Two types: system and service middleware
An example of middleware is the software
that interfaces a browser and the Web.
Clients and servers do not talk to each other directly: interposed between them is
middleware. There are two types of middleware: system middleware carries out
general, system-level tasks such as transporting raw data around the Internet.
Service middleware is associated with a particular task or service, for example
the middleware which allows a client to query a database or allows a news reader
to interrogate a news server.
All the examples above are of general middleware. They are not tied to an
application but are used by applications.
All the examples above are of service middleware since they are all associated
with a particular service: a database service, a newsgroup service, a distributed
object server service and an email service.
[Figure: the programmer sees only one file; replication middleware sits in front of the main file on Computer 1 and the duplicate files on Computer 2 and Computer 3.]
Message-oriented middleware
Manages the transactions that pass between
a client and a server and vice versa
Normally queue based
Model of interaction is small
Often used for connecting to legacy
software
Client adds
and removes
server
messages
Database servers
Wikipedia on databases
Database servers mediate access to a series of databases. There was once a rich set of
underlying models for database servers; by far the dominant model now is
the relational model. It holds data in the form of tables.
Tables are queried and updated using a standardised language known as SQL
(Structured Query Language).
A database server lies behind the object layer in a three-tier architecture.
A relational table
ItemId    Item       NoInStock
Aw222     Washer A   300089
Ntr444    Nut A      2009
Wdt675    Widget Q   300001
Bt56ww    Bolt A     200
Bt5556q   Bolt B     200009
This is an example of a small simple table. It holds records which specify the
stock levels in a warehouse. Each column describes a set of similar data. One
column is designated as a key which uniquely identifies each record.
A relational database consists of a number of these tables interlinked by common
data items.
The tables themselves are stored as files.
[Slide: an SQL statement with its selection condition and the name of the table labelled.]
This is an example of an SQL statement which creates a two column table with
columns EmployeeName and Salary by selecting those rows of the table
Employee which contain employees whose salary is greater than 45000.
This is an example of a retrieval query; queries exist for other processes such as
creating tables, deleting tables and deleting rows.
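To make the effect of such a retrieval query concrete, here is a small in-memory sketch in Java. It does not talk to a database: it mimics the selection with an ordinary loop over a list. The SQL text in the constant is reconstructed from the description above and so is an assumption, and the class names and sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EmployeeSelect {
    // One row of the Employee table; any further columns are omitted here.
    static final class Row {
        final String employeeName;
        final int salary;

        Row(String employeeName, int salary) {
            this.employeeName = employeeName;
            this.salary = salary;
        }
    }

    // The statement being mimicked (reconstructed from the notes, so an assumption).
    static final String SQL =
        "SELECT EmployeeName, Salary FROM Employee WHERE Salary > 45000";

    // Keeps only the rows that satisfy the selection condition salary > threshold.
    static List<Row> select(List<Row> employee, int threshold) {
        List<Row> result = new ArrayList<>();
        for (Row r : employee) {
            if (r.salary > threshold) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<Row> employee = Arrays.asList(
            new Row("Jones", 52000), new Row("Patel", 45000), new Row("Smith", 61000));
        for (Row r : select(employee, 45000)) {
            System.out.println(r.employeeName + " " + r.salary);
        }
    }
}
```

The row with a salary of exactly 45000 is excluded, since the condition is strictly greater than.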
To detect and act upon deadlock. Deadlock occurs when one user
transaction has got exclusive access to a resource such as an SQL table
and is waiting for a resource which is held exclusively by another user
transaction; however, this second transaction is unable to proceed because
the first transaction has exclusive use of another resource that the second
transaction needs to proceed. A database server will detect such serious
conditions and remedy them in a drastic way, often by terminating one of
the user transactions; happily the termination is normally followed by the
re-execution of the transaction by the server, when usually the first
transaction has proceeded and has released the resource that the second
transaction was blocked on.
To administer security. A good database server will ensure that no user is
allowed access to a database who has not been authorised.
To administer backup and recovery. There are two aspects to this. A
database server will keep a log of transactions which is used to recover a
database when some large problem occurs such as a gross system failure.
This log keeps details of the transactions against tables and which parts of
them were affected. When a problem occurs the recovery facility of the
server will find a copy of the last saved version of the database and then
reapply all the transactions held in the backup log.
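Returning to the deadlock function described above, one common way to detect the condition is to record which transaction is waiting for which and look for a cycle. The sketch below assumes that each transaction waits for at most one other transaction at a time; the class and method names are hypothetical and no particular database server is claimed to work this way.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WaitForGraph {
    // waitsFor.get(t) is the transaction that t is blocked on, if any.
    private final Map<String, String> waitsFor = new HashMap<>();

    // Records that 'waiter' is blocked on a resource held by 'holder'.
    void addWait(String waiter, String holder) {
        waitsFor.put(waiter, holder);
    }

    // Removes the edge once the waiter is terminated or unblocked.
    void removeWait(String waiter) {
        waitsFor.remove(waiter);
    }

    // Follows the chain of waits from 'start'; meeting a transaction twice
    // means the chain has entered a cycle, so 'start' can never proceed.
    boolean isDeadlocked(String start) {
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null) {
            if (!seen.add(current)) {
                return true;
            }
            current = waitsFor.get(current);
        }
        return false;
    }
}
```

Terminating one transaction corresponds to calling removeWait for it, which breaks the cycle and lets the other transaction proceed.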
Referential integrity
Relational tables are consistent with each
other
Implemented via triggers
Associated with business rules which
govern the values that business objects
attain
The term referential integrity refers to the fact that tables in a relational
database are consistent with each other. The following examples are of databases
whose tables do not have referential integrity:
A customer is associated with a transaction which does not occur in a table.
A part stored in a warehouse that has no suppliers associated with it.
A supplier has been given a new reference number, yet the old reference number
of that supplier can still be found in other tables.
Relational middleware
SQL
API
Clients
Driver
Server
software
Database
server
Stacks
Distributed databases
A distributed database is a database which has its data spread over a number of
servers. There are a number of reasons for this:
Performance: for example, by keeping a subset of the data close to its users a
distributed system can reduce transmission delays.
Reliability: by duplicating data across a number of servers, a failure of one of the
servers will only result in a performance degradation rather than loss of service.
Pragmatic legacy reasons: systems often consist of data which has been
gradually added at different locations.
Replication problems
Concurrent access
Security
Reliability
Clock synchronisation
Continuous operation
Transparency
Replication independence
Mixing servers
Operating system independence
Optimisation of queries
Chris Date, who was one of the pioneers of relational technology, has
devised 12 rules which should be used to judge the effectiveness of a
distributed database technology. Six are shown below:
Continuous operation. A distributed database system should run
continuously; all maintenance operations should be applied to it while it is
running.
Transparency. The programmer or user should not be aware that a
particular database is distributed.
Replication independence. The programmer or user should not be aware
of the fact that a database has been replicated.
Mixing servers. It should make no difference to the development of a
system that a number of disparate servers are used. You should be able to
freely use and interchange database servers.
Operating system independence. The operating system used in a server
should make no difference to the system.
Optimisation of queries. A database server's query engine should be
aware of the distribution of data and be able to make decisions about the
way data is to be retrieved based on the location of the data.
Types of distribution
Downloading
Data replication
Horizontal fragmentation
Vertical fragmentation
Programming a database
A number of APIs available
Need to provide facilities for retrieving,
updating and deleting data.
Facilities for connecting to a database
Facilities for querying a database
Facilities for processing results from a
query
There are a number of APIs which are available for programming a relational
database. They should provide a means of connecting a program to a database,
issuing queries against the database and processing the resultant
data sent back from the database server.
The Java SQL API is shown above; it contains the following classes and interfaces:
Driver. This is associated with the database driver that is used to communicate with a
database.
Statement. This is used to create and execute SQL statements.
PreparedStatement. This subtype of Statement is used to develop SQL statements
which have an increased efficiency when executed a number of times with different
arguments.
CallableStatement. This is a subtype of Statement which provides the programmer with the
facilities for calling stored procedures.
Connection. This contains facilities for connecting to a database.
ResultSet. When an SQL query is executed a result set is usually returned; this type represents it.
ResultSetMetaData. This is one of a collection of types which provide data about the main
entities that this package manipulates; in this case it describes the columns of a result set.
DatabaseMetaData. This is another metadata type. In this case it provides information about
a database.
DriverManager. This is a class that manages the drivers that are available for connecting to a
database.
DriverPropertyInfo. This class is not used by application programmers. It contains a number
of instance variables which are used by drivers in order to connect into a relational database.
The remaining slides detail a simple program which issues a query against a
database.
This part of the program imports the JDBC library and loads a driver. If there are
problems with loading a driver then execution terminates.
This part of the program obtains a connection to the database, creates an SQL
statement and executes it. The statement obtains the name and salaries of those
employees in the table employees who have a salary greater than 3500.
The result of the query is placed in a ResultSet object which is traversed in the next
section of program.
This part of the program traverses the result set formed from the query. It obtains
the first column content (the employee name) and the second column content (the
salary) from each element of the result set and displays them.
Finally the statement, connection and result set objects used are closed down.
This final piece of code is executed if an error occurs, for example if the server is
down. The method printStackTrace provides extra diagnostic information.
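The steps walked through on the last four slides can be sketched in one short program. This is not the program from the slides, which are not reproduced here: the column names, the JDBC URL and the class name are assumptions, and the sketch relies on the automatic driver discovery and try-with-resources of more recent Java versions rather than loading a driver explicitly.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class EmployeeQuery {
    // Builds the query described in the notes; column names are assumed.
    static String buildQuery(int minSalary) {
        return "SELECT name, salary FROM employees WHERE salary > " + minSalary;
    }

    // Runs the query against the given JDBC URL; returns false if anything fails.
    static boolean run(String url) {
        // try-with-resources closes the result set, statement and connection.
        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(buildQuery(3500))) {
            while (rs.next()) {
                // Column 1 is the employee name, column 2 the salary.
                System.out.println(rs.getString(1) + " " + rs.getInt(2));
            }
            return true;
        } catch (SQLException e) {
            // Reached, for example, when the server is down or no driver matches.
            e.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder URL; substitute the one for your own driver and database.
        run(args.length > 0 ? args[0] : "jdbc:hypothetical://localhost/company");
    }
}
```

With no matching driver on the class path, getConnection throws an SQLException, so the sketch prints a stack trace and returns false, which is the error path described in the final slide.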
When developing a three tier architecture using a relational database as the third
tier there is a need to map business objects such as Warehouse, Product,
PlaneSeat and Passenger to their relational table equivalents.
This can be done by hand by developing classes such as Product and inserting
retrieval and update SQL code within the methods used. However, there are a
number of modern tools which enable much of the effort to be automated. The
best known is TopLink.
Some technologies
The remainder of this lecture will look at technologies such as PHP, ASP.Net,
Ruby, the Ruby web application framework Ruby on Rails, application servers
and integration servers.
PHP
PHP is a language whose code is embedded in HTML documents and carries
out dynamic processing on the server. It is a venerable technology dating back to the
mid-1990s, the early days of the Web. The next slide shows an example.
An example of PHP
<?php
if (strpos($_SERVER['HTTP_USER_AGENT'], 'MSIE') !== FALSE)
{
?>
<h3>strpos() must have returned non-false</h3>
<p>You are using Internet Explorer</p>
<?php
} else {
?>
<h3>strpos() must have returned false</h3>
<p>You are not using Internet Explorer</p>
<?php
}
?>
ASP.Net
A technology associated with the .Net
framework developed by Microsoft.
Involves embedding processing instructions
within a web page.
Such instructions can access databases,
produce forms, access other web servers.
ASP statements are executed on the web
server.
This is a technology similar to the Java Server Pages technology and PHP. It
allows the programmer to insert processing statements into an HTML page,
effectively embedding processing code within the HTML files.
Ruby
Had a longish history
Available for free
Everything is an object in Ruby, looks like an
amalgam of many languages including Smalltalk
Links to many Internet technologies including
XML, XML-RPC, Spaces, message-oriented
middleware etc.
Object-oriented
Dynamically typed language
Interpreted
Ruby is a language that is quite old. It lay dormant for a number of years.
However, the last couple of years has seen a huge increase in interest in the
technology. This has mainly been due to the emergence of the web development
framework Ruby on Rails which has been reported as having given developers
huge increases in productivity. In some quarters it is seen as a successor for Java
whose APIs are being seen as being over-complex and bloated; this is probably
somewhat optimistic.
Method
Method call
This shows one of the strongest features of Ruby: the ability to execute a chunk of
code with another chunk of code as an argument. What the call does is to
effectively execute the code
puts "start of example"
puts "middle of the example"
puts "end of example"
Ruby on Rails
Web development framework
Used for client, business object, database
systems
Used for ecommerce, for example systems
which employ a shopping cart
Employs the reflection facilities in Ruby
What has made Ruby an important programming language is the fact that it has
given rise to one of the most productive web application frameworks. This is
known as Ruby on Rails. The framework relies on the fact that Ruby has the
facility to interrogate its own programs, for example a program can discover
the names of the methods that it has. Ruby on Rails is highly productive because it
concentrates on 80% of ecommerce systems: those that can be organised as three-tier
architectures and which involve common e-commerce functions.
Application servers
Host application objects
Best known example are those servers
which host Enterprise Java Beans, for
example the BEA WebLogic server
Enables objects to be exposed to developers
and hide the underlying database details
Integration servers
Act as a hub in an integrated system
Coordinate and orchestrate the flow of data from
one component of an integrated system to another
Carry out activities such as marshalling, data
formatting and data transformation.
Increasingly important as integration becomes a
much more important paradigm.
An example of an integration server
An integration server is a server that sits in the middle of a system that has been
integrated by bringing together a number of pre-developed components. Some of
these components may not be able to work with each other directly, for example
they may have different data specifications, interfaces and protocols. The role of
an integration server is then to carry out the mediation that is necessary to realise
inter-working. Later in the course I shall look at integration in more detail.
Lecture 6
Web Servers
Aims
In this lecture I shall be looking at almost certainly the most important type of
server: the Web server. In it I shall be looking at the basic processing cycle used
by a Web server and examining the HTTP protocol that is used by a client and a
Web server to communicate.
Early Web technologies just dispensed static pages; soon this was regarded as
very limiting and so a number of dynamic page technologies were developed; in
the lecture I shall look at some of these.
Apache is almost certainly one of the most popular Web servers; I shall be briefly
looking at it and using it as a case study.
Finally I shall look at the role of the Web server in distributed architectures.
The slide details the processes involved in responding to a browser request for
some resource. The program that carries out this process is known as HTTPD
(HTTP Daemon). A browser requests some resource, the daemon parses the
request to find out what is required, gets the resource and then sends it back to
the browser. If there is a problem then an error is signalled to the browser.
When a request is completed the connection can be closed and the processing
cycle continued.
HTTP is the protocol used for communication between a client running a browser
and the Web server. It is very simple in concept although it uses quite a number
of commands and arguments.
It is an example of a request response protocol: every request issued by a browser
elicits a response from a Web server.
It suffers from a major problem in that it does not remember state, whereas
ecommerce applications, for example, require state to be tracked. The
best example of this is the shopping cart.
At the time of writing there are three versions of HTTP (0.9, 1.0 and 1.1); no browser uses version 0.9 now.
GET /index.htm HTTP/1.0
This is an example of the GET command, the most popular command in
the HTTP protocol set; it just asks for some resource to be sent back. The
resource in this case is a file containing the HTML for a home page. There may
be a number of files associated with this home page, for example graphics files;
the browser finds references to these in the HTML and issues a further GET request for each.
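As an illustration of the parsing step that a Web server carries out on such a command, the following sketch splits a request line into its method, resource path and protocol version. The class RequestLine is hypothetical and handles only the first line of a request, not the headers that may follow it.

```java
class RequestLine {
    final String method;   // for example GET or POST
    final String path;     // the resource being asked for
    final String version;  // for example HTTP/1.0

    private RequestLine(String method, String path, String version) {
        this.method = method;
        this.path = path;
        this.version = version;
    }

    // Splits a request line such as "GET /index.htm HTTP/1.0" into its three parts.
    static RequestLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }
}
```

A server would typically answer a line that fails this check with an error status rather than an exception reaching the user.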
This is an example of a command which is associated with forms. When you fill
in details in a Web form this command is sent to the Web server. It informs the
server that a program in the directory cgi-bin is to be executed and defines the
data that is to be processed by this program. The program will usually access
some file-based resource, for example files holding relational databases
maintained by a database server.
Headers
This informs the browser that its request has been successful (200 is a status code
which indicates this). It then sends back a number of headers which provide
information about the server and the resource that has been sent back, for
example the type of the server and the fact that HTML code expressed as
text has been returned.
After a blank line the resource is sent back.
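The layout of such a response (status line, headers, blank line, resource) can be sketched as a string-building method. The server name in the Server header is hypothetical, and a real server would normally send further headers such as the date.

```java
import java.nio.charset.StandardCharsets;

class SimpleResponse {
    // Builds a complete HTTP/1.0 response carrying an HTML body.
    static String ok(String html) {
        int length = html.getBytes(StandardCharsets.ISO_8859_1).length;
        return "HTTP/1.0 200 OK\r\n"                       // status line
             + "Server: HypotheticalServer/0.1\r\n"        // type of server (made up)
             + "Content-Type: text/html\r\n"               // HTML expressed as text
             + "Content-Length: " + length + "\r\n"
             + "\r\n"                                      // blank line ends the headers
             + html;                                       // the resource itself
    }
}
```

Each header line ends with a carriage return and line feed, and it is the empty line after the last header that tells the browser where the resource begins.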
The slide above contains the ranges of the status codes; some examples are:
200 OK
301 Moved permanently (the resource is now at another location)
404 Not found
503 Service unavailable
HTTP is stateless
No memory between transactions
Severe constraint
A number of solutions including cookies
and URL rewriting
Often the solution is transparent to the
programmer
Introduction to cookies
One of the problems with HTTP is that there is no state memory between
transactions. This is a severe problem since many applications require memory of
previous page requests, for example an ecommerce application might require to
keep track of the identity of a user and what actions he/she has carried out.
There are a number of solutions to the problem. A common one is to keep data on
the client. This data is known as a cookie. Another solution is to send modified
URLs between the client and the server with the URL being built up of previous
state changes. Another solution involves invisible pages. This is explained in a
later slide.
Cookies
The storage of memory on the client to keep
track of state.
Example is the sequence of transactions
which take place in an e-retailing site.
Cookies can be temporary or permanent.
Browser can refuse cookies
Cookies are a common solution to the problems of state tracking. A cookie is data kept
on a client which keeps track of previous transactions or contains data which can
be reused time and time again over sessions. Cookies can hence be temporary or
permanent: a temporary cookie disappears at the end of a session, while a permanent one persists across sessions.
Cookies can be a security problem and hence browsers can be configured to reject
them.
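A minimal sketch of the two halves of the cookie exchange follows: the header a server sends to ask the browser to store a value, and the parsing of the header the browser sends back on later requests. The class name, cookie names and values are hypothetical, and the sketch ignores attributes such as path and domain as well as any escaping of values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CookieDemo {
    // Server side: header asking the browser to store a name/value pair.
    // A positive maxAgeSeconds makes the cookie permanent for that long;
    // otherwise it is a temporary cookie that lasts only for the session.
    static String setCookieHeader(String name, String value, int maxAgeSeconds) {
        String header = "Set-Cookie: " + name + "=" + value;
        if (maxAgeSeconds > 0) {
            header += "; Max-Age=" + maxAgeSeconds;
        }
        return header;
    }

    // Server side: parses the value of the Cookie header sent back by the
    // browser, for example "cart=Book23566; user=alice".
    static Map<String, String> parseCookieHeader(String headerValue) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : headerValue.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(),
                            pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }
}
```

Because the browser returns the stored pairs with every later request to the same site, the server can recognise the user without HTTP itself remembering anything.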
Invisible pages
Involves the user sending forms data to the
server, for example an item bought.
The server sends back HTML containing
hidden elements which represent state
details
Gradually a page is built up which contains
the state
Another technique that is used to track state is to use hidden form elements. The
way that this works is as follows:
The user sends data to the server, for example the name of an item that has been
bought.
The server responds with a page which contains confirmation and asks the user
whether he or she wants to purchase more. This page contains invisible elements
which describe the built-up state.
If the user says yes, a purchase page with the same invisible elements is sent
back.
The user chooses another item and the server responds with another page with
this item specified as an invisible element.
This continues until check-out.
Here a page sent back from the Web server is shown. It contains a number of
invisible elements which contain data about three past transactions. This data is
not shown on the page. When the user makes another choice the data is sent back
to the server with the new choice made by the user.
Hide1=MickyBook23566
Hide2=DumboDoll23337
Hide3=Alysband14556
ItemSelected=BarbyDoll
Past choices
Current choice
This slide shows the data sent back by the client. It has received the current state
via a page similar to that displayed on the previous slide.
The user has augmented the state by making a choice. This has been added to the
current state and has been sent by a POST command.
When the server receives the data it will augment the BarbyDoll item with
numeric data which represents price etc.
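The server side of this exchange can be sketched as a method that generates the next purchase page, carrying every past choice as a hidden form element. The field names Hide1, Hide2 and ItemSelected follow the slide; the form's action path and the class name are hypothetical, and a real implementation would need to HTML-escape the values.

```java
import java.util.List;

class HiddenState {
    // Builds a form whose hidden inputs carry every item chosen so far.
    static String purchaseForm(List<String> pastChoices) {
        StringBuilder html =
            new StringBuilder("<form method=\"POST\" action=\"/cgi-bin/shop\">\n");
        for (int i = 0; i < pastChoices.size(); i++) {
            // Hidden elements are sent back with the form but never displayed.
            html.append("<input type=\"hidden\" name=\"Hide").append(i + 1)
                .append("\" value=\"").append(pastChoices.get(i)).append("\">\n");
        }
        // Visible element in which the user enters the current choice.
        html.append("<input type=\"text\" name=\"ItemSelected\">\n");
        html.append("<input type=\"submit\" value=\"Buy\">\n</form>");
        return html.toString();
    }
}
```

When the user submits the form, the browser posts the hidden values together with ItemSelected, which gives the server the data layout shown on the slide.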
There is now a host of technologies used to program Web servers. In the next
slides I will look at each of the main technologies:
Server page technology involves embedding code within a Web page.
Servlets are in-memory programs written in Java.
CGI programming with a scripting language involves the programmer accessing
CGI environment variables which hold data on a Web transaction.
Applets and Active X controls are code which interacts with a browser.
Server side includes are commands inserted in a Web document which are
interpreted by the Web server which will carry out some action such as
substituting some text. It can also specify that a program is executed and its
output sent back to the user.
Server side includes were a very early and quite primitive attempt to overcome
state problems.
Servlets
Java technology intended to replace CGI
programming
Employs servlets specified in a Web page.
Efficient solution.
Respects OO conventions
Now mature, superseded to some extent by
JSP (Java Server Pages)
This and the next slide describe the code from a simple forms servlet. It processes a
request and produces some response. The first thing that the code does is to find
the value of some HTML forms objects, say text boxes; if these are empty then
the user has made an error and some error page is sent back to the user.
Here the rest of the code is shown. This is the code that is executed if the user has
typed in correct data. A writer is obtained which connects to the client browser.
This writer is then used to send back a simple HTML message involving
paragraph breaks ( <P> ).
Servlets are a technology which handles state tracking very easily and in a way
that is invisible to the programmer.
Here a simple program is displayed which keeps track of the number of times a
Web page has been accessed.
The first part of the program displayed above will read a data item $data from a
file, increment it and then write it back to the file. The data item represents the
number of accesses made to the Web page.
The program is taken from E-business and E-commerce, How to Program by
Deitel, Deitel and Nieto, Prentice-Hall.
This code prints the count by extracting each digit from it and then displaying a
graphic corresponding to the digit. (Note that to print certain characters you need to
precede them with a \ character.)
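The logic of the counter can be sketched as follows. This is not the program from the book, which is a CGI script; it is a minimal Python illustration of the same read, increment and write-back cycle, followed by the digit-by-digit step. The file name counter.dat and the image names such as 4.gif are hypothetical.

```python
import os
import tempfile


def record_access(counter_path):
    """Read the stored count, add one, write it back and return the new value."""
    try:
        with open(counter_path) as counter_file:
            count = int(counter_file.read().strip() or 0)
    except FileNotFoundError:
        count = 0  # first access: no counter file exists yet
    count += 1
    with open(counter_path, "w") as counter_file:
        counter_file.write(str(count))
    return count


def digit_images(count):
    """Map each digit of the count to a hypothetical image file name."""
    return [f"{digit}.gif" for digit in str(count)]


# Usage: a temporary file stands in for the counter file on the server.
counter_path = os.path.join(tempfile.mkdtemp(), "counter.dat")
first = record_access(counter_path)
second = record_access(counter_path)
print(first, second, digit_images(second))
```

The sketch does no file locking, so two simultaneous requests could lose an update; a production counter would need to guard against that.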
Both the technologies here are similar. Each consists of code written in
some language such as Java or Visual Basic. A reference to the code is placed in
a Web page. When the page is downloaded to a browser the code of the applet or
ActiveX control is also loaded in. The code is then executed and carries out
some processing, for example forms processing.
Because code is loaded into a client computer there are potential security
problems. Java gets over this via a sandbox approach, while ActiveX controls are
digitally signed. These approaches will be detailed later in the course.
A simple example
<% for (int j = 0; j < 100; j++)
{
if (j % 2 == 0) { %>
<P>
The value of the even integer is <%= j %>
<% } } %>
The code above is expressed in JSP. It displays all the even integers from
0 to 98 on a Web page.
Apache is a robust, scalable Web server that is free. It was developed by the
Apache Software Foundation as an open source product. It is certainly the most popular
Web server, with something like 65% of the market.
It is associated with a number of other free products such as Tomcat, the JSP
engine, and Cocoon, the Web publishing system.
The bullet points above contain the basic facilities of Apache. They enable the
server to be configured, started, stopped and restarted. Apache contains a number
of graphical tools which enable the administrator to set up a basic configuration
without the need for developing textual scripts.
Apache has facilities for easily integrating most dynamic content technologies
which are non-proprietary, for example Java Server pages (JSP).
It is easy to develop, maintain and store CGI scripts in Apache and to configure
Apache in such a way that scripts are executed when a certain event occurs, for
example an event associated with a security violation.
Apache allows the hosting of multiple sites using a technology known as virtual
hosting.
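The idea behind name-based virtual hosting is that one server chooses which site to serve from the Host header of each request. The sketch below shows that lookup only; it is not Apache code or Apache configuration, and the host names and directory paths are hypothetical.

```python
# Hypothetical mapping from host names to document roots; none of these
# names or paths come from the lecture or from a real Apache installation.
SITES = {
    "www.example.com": "/var/www/example",
    "shop.example.com": "/var/www/shop",
}
DEFAULT_ROOT = "/var/www/default"


def document_root(host_header):
    """Pick the document root for a request from its HTTP Host header."""
    # Drop an optional port ("shop.example.com:8080") and ignore case.
    host = host_header.split(":")[0].strip().lower()
    return SITES.get(host, DEFAULT_ROOT)


print(document_root("shop.example.com:8080"))
```

A request for an unknown host falls back to a default site, which is the usual behaviour one would want from such a table.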
Performance improvement
Fault tolerance
Monitoring Apache
Security provision
The Web administrator can improve the performance of Apache via a large
number of options. For example the server uses caching to keep popular pages in
main memory. The administrator can decide what these pages are and how big
the cache is to be.
Apache also contains facilities for fault tolerance, for example via clustering.
There are also a large number of facilities for monitoring the use of an Apache
Web server. For example a log file is constructed which contains details of all
accesses to the server. The Web administrator can look at this file to determine
what pages to cache.
Apache also has facilities for security, for example authenticating users,
integrating the Secure Sockets Layer (SSL) and processing digital certificates.
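As an illustration of how the log file can guide caching decisions, the sketch below counts successful requests per page. It assumes log lines in the Common Log Format; the sample lines and the function name popular_pages are made up for the example.

```python
from collections import Counter

# Hypothetical log lines in the Common Log Format.
LOG_LINES = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2000:13:56:01 -0700] "GET /products.html HTTP/1.0" 200 5120',
    '192.0.2.1 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.9 - - [10/Oct/2000:13:58:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
]


def popular_pages(lines, top_n):
    """Count successful requests per path and return the most requested ones."""
    hits = Counter()
    for line in lines:
        try:
            pieces = line.split('"')
            request = pieces[1]            # e.g. GET /index.html HTTP/1.0
            status = pieces[2].split()[0]  # e.g. 200
            path = request.split()[1]
        except IndexError:
            continue  # skip lines that do not match the expected layout
        if status == "200":
            hits[path] += 1
    return hits.most_common(top_n)


print(popular_pages(LOG_LINES, 2))
```

The pages at the head of the returned list are the natural candidates for the cache.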
The technologies that have been described so far have been technologies where
the programming has been at the server side, for example applets are stored at the
server and downloaded into the browser and executed and Java Server Pages are
stored at the server and executed there.
There are technologies which enable code to be directly embedded in a Web page
and executed by the browser.
The best known of these is JavaScript.
JavaScript
Variables
Control structures
Arrays
Functions
Some objects such as strings
Access to browser page
Object access
charAt(index)
concat(string)
substring(start,end)
toLowerCase()
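The four methods above are all associated with JavaScript strings.
charAt(index) returns the character at the given position; positions are counted from zero.
concat(string) returns the string with the argument appended to it.
substring(start,end) returns the characters from position start up to, but not including, position end.
toLowerCase() returns a copy of the string with all letters in lower case.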
JavaScript has access to the Dynamic HTML object model. This consists of a
number of classes which represent Web page elements such as applets and
scripts.
A large number of methods can then be used to interrogate information about an
object. For example the version number of a browser can be found out by sending
a message to a navigator object.
Examples
This example taken from a previous slide shows three accesses to DHTML
objects. The object window is a window into which data can be entered and
document is the current page being browsed.
Another example
if (navigator.appVersion.substring(0, 1) == "4") {
...
Here the object navigator is accessed. This object contains information about
the browser being used. The property appName holds the name of the browser.
The property appVersion holds a string which contains information about
the browser version and the operating system. The first character of the string
represents the browser version.
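The version test can be illustrated outside the browser. The sketch below is Python rather than JavaScript and works on an illustrative appVersion-style string; real browsers report their own text, and the function name major_version is hypothetical.

```python
# An illustrative appVersion-style string; real browsers report their own text.
APP_VERSION = "4.0 (compatible; MSIE 5.0; Windows 98)"


def major_version(app_version):
    """Return the leading character of the version string as an integer."""
    first_character = app_version[0:1]  # same idea as substring(0, 1) in JavaScript
    return int(first_character)


if major_version(APP_VERSION) == 4:
    print("version 4 browser detected")
```

Looking only at the first character works for single-digit versions; a version number of 10 or more would need the whole leading number to be parsed.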
window
document
body
history
navigator
event
anchors
applets
forms
images
links
plugins
The bullet points above represent some of the objects in the Internet Explorer 5
object model. Most of them are self-explanatory apart from event and history.
The object history represents a history of all the sites visited by a browser;
event is used as an object when a user event occurs such as a button being
clicked.
The objects in the right-hand column represent collections of objects on a Web
page; for example, anchors contains an enumeration of all the anchors in a Web
page. Each element in such a collection can then be accessed by means of a for
loop.
Browsers
Web server
Data storage
Here a three-tier architecture is displayed with the client layer being implemented
by a browser. The middle layer is implemented by a Web server which contains
business objects. These objects may be implemented in a number of ways, for
example via remote object technology or via a technology such as servlets. The
final layer is that of data storage; normally this is implemented via some database
technology.
Web pages are developed using a special purpose markup language known as
HTML. The source of a Web page will consist of the text of the page interspersed
with HTML instructions which format the text, for example displaying it in bold.
HTML elements
Elements
Tags
Attributes
<FONT COLOR="blue" FACE="Helvetica"> ... be displayed in blue and in the ... </FONT>
will display the text between the tags in blue using the Helvetica font.
Design so that consistency is achieved. This does not mean that Web
pages in the same site should look the same. What it means is that certain
elements should be implemented in the same way, for example the way
that links to other pages in the Web site are displayed should be consistent
across the site. Also pages which carry out the same function should look
the same, for example pages which contain forms should have the same
look and feel and pages which provide information about products that
are to be sold should also look the same.
Concentrate on user content. Do not use lots of space on a page for
navigation.
Make sure that different browsers display pages in similar ways. Even
today, different browsers will display a page in quite different ways. If
you expect users of the site to use a variety of browsers check out what
the pages look like and be prepared to design two or more different pages
which can be read by different browsers.
Be sparing with colour. One of the major errors made by the designers of
the early Web sites was to employ too much colour in a page, particularly
in the text where words were displayed in different colours. In deciding to
use a colour ask yourself why you want to use the colour. For example,
you may want to use red to highlight a very important point. A warning
though: overuse of a colour such as red can lead to readers ignoring the
text.
Do not design pages that take a long time to download. Although this
issue will decrease over the next five to ten years with broadband
connections becoming cheaper it is still a vital issue now. Research has
shown that users are just not prepared to wait for a long time for a page
containing animations and fancy graphics to download. Keep the bulk of a
page devoted to text. If you think that a user wants to look at a specific
graphic then display a small, memory-spare version of the graphic (known
as a thumbnail) with a link to the large version of the graphic.
Make full use of linking. If a page consists of three sections which are
notionally separate, for example a description of the teaching of a
university department, the research and a list of the staff in the
department, do not place all the text for these sections on the same page;
place each in a separate page and link to them from a summary page.
Make hyperlinks small. If you link large amounts of text in a Web page
the text will become very difficult to read. If the link is short, say no more
than four words, it might not fully describe what is being linked to. If so,
place some text close to the link (say, in brackets) which describes what
resources the link points at.
Use familiar conventions for links. The default within browsers for
displaying links is to underline them in blue. Most users are culturally
used to this; designing a Web site with non-standard ways of
implementing links, for example as orange links against a black
background will confuse users.
Use style sheets throughout your site. This means that the pages will have a
uniform appearance and when you decide to change some aspect of the
look and feel of your site the change will be reflected throughout the site.
Provide printed versions of pages. If you think that a certain page, or
collections of pages are going to be frequently printed then provide
downloadable versions of the pages in some word processing or document
display format. Browsers are still quite poor at printing pages and often
crop them.
Keep the text on a screen short. Users feel uncomfortable reading text
from a screen. Use hyperlinks to reference other pages which might have
been physically included in the text.
Make your text readable. For example use colours which contrast highly
with the background of a page, use a plain background, use big enough
fonts and do not use moving effects such as flashing text: they make a
page quite unreadable.
Signal the use of multimedia. If you are going to use a multimedia
presentation with all its attendant download problems, reference the page
containing the material from another page and warn the user what they
will be getting in terms of download and in terms of what the multimedia
does.
Design the home page in a different way to other pages, albeit using the
same style. The aim of a home page is to encourage users to enter the
Web site so it should succinctly describe what lies in the site and hence
should contain a map of the site.
Make sure the user knows where they are. The user should be aware of
where they are both in terms of the World Wide Web and in terms of the
site they are visiting, where they have been and where they could go.
Where the user is relative to the World Wide Web can be catered for by
having some graphic or noticeable text which tells the visitor what
company or organisation the site is associated with. In terms of where
they have been adopt the practice of displaying past page titles when the
user is traversing a set of linked pages, for example when registering for
an ISP. These titles, together with the titles of future pages, can be
displayed prominently at the top of the page with the current page being
highlighted in a different colour. In terms of where a user can go place
some navigation map in the page which shows the map of the site with the
current page highlighted in a prominent colour. If a user is traversing a
series of pages then don't forget to include a link backwards in the page as
well as the link forwards.
REST
Representational State Transfer
An alternative to SOAP
Does use HTTP and other protocols to
connect with a web server
Involves adorning the HTTP address with
information required for functionality.
An architectural style not a standard
An example
INPUT
http://www.parts-depot.com/parts
OUTPUT
<?xml version="1.0"?>
<p:Parts xmlns:p="http://www.parts-depot.com"
xmlns:xlink="http://www.w3.org/1999/xlink">
<Part id="00345" xlink:href="http://www.parts-depot.com/parts/00345"/>
<Part id="00346" xlink:href="http://www.parts-depot.com/parts/00346"/>
<Part id="00347" xlink:href="http://www.parts-depot.com/parts/00347"/>
<Part id="00348" xlink:href="http://www.parts-depot.com/parts/00348"/>
</p:Parts>
This shows a REST request being issued and a REST-compliant server
returning the data required (a list of parts that are stocked).
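A client would typically walk the returned document and follow the links it contains. The sketch below parses the response from the slide (with the attribute quotes balanced) using the Python standard library; it does not contact any server, and the function name list_parts is hypothetical.

```python
import xml.etree.ElementTree as ET

# The response from the slide, with the attribute quotes balanced.
RESPONSE = """<?xml version="1.0"?>
<p:Parts xmlns:p="http://www.parts-depot.com"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <Part id="00345" xlink:href="http://www.parts-depot.com/parts/00345"/>
  <Part id="00346" xlink:href="http://www.parts-depot.com/parts/00346"/>
  <Part id="00347" xlink:href="http://www.parts-depot.com/parts/00347"/>
  <Part id="00348" xlink:href="http://www.parts-depot.com/parts/00348"/>
</p:Parts>"""

# ElementTree exposes namespaced attributes as "{namespace}name".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def list_parts(xml_text):
    """Return (part id, link to the part's own resource) pairs."""
    root = ET.fromstring(xml_text)
    return [(part.get("id"), part.get(XLINK_HREF)) for part in root.findall("Part")]


parts = list_parts(RESPONSE)
for part_id, link in parts:
    print(part_id, link)
```

Each link is itself an address that the client can request next, which is how a REST client moves from the list of parts to the details of one part.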
Web services
There is nothing complicated about Web services. They are just services which
can be accessed over the Internet. They are not associated with a particular
technology such as XML, although the SOAP technology detailed in this lecture
has a link with XML.
Web services
Built on top of Internet protocols
Most popular protocol becoming HTTP, although
other protocols can be used, for example POP3
The Web service acts as an abstraction layer
Ongoing effort to build Web services on top of the
Java 2 Enterprise Edition (J2EE)
Messaging is the underlying architecture
A Web service uses a standard Internet protocol. Most applications embed their
messages in HTTP, since the Web browser and the Web server
already have built-in functionality to handle HTTP. However, other
protocols (even mail protocols such as POP3) can be used.
The Web service acts as a buffer between the application code, for example
database code and the client (usually a browser).
The area of Web services is an actively growing one with vendors such as IBM
and Microsoft putting large amounts of resources into providing frameworks for
such services.
Since messaging is the main architecture still used on the Internet it is the one
used for implementing Web services.
There are three components to a Web service. All of them are standardised and
implemented using XML.
The Web Services Description Language (WSDL) is a simple language that
describes the service that a Web service provides. In effect it describes the
interactions between a client and a Web service.
SOAP is the protocol that is used to communicate between a client and a Web
service. It defines the format of the service request and the format of the response
by the service. If there are any errors which occur it will be used to send error
details.
UDDI is the part of the Web service architecture which is concerned with
defining the various registries that are used to contain documents expressed in
WSDL.
[Figure: the service requestor finds a service in the service registry (WSDL, UDDI) and binds to the service provider (WSDL, SOAP).]
This reprise of an earlier slide shows where each of the component
parts of a Web service fits within the generic architecture.
Company roles
Service requestor
Service provider
Registry (White pages)
Broker (Yellow pages)
Aggregator (Green pages)
There are a number of roles that a company can adopt with respect to Web
services.
A service requestor is a company that makes use of some web service.
A service provider is some company that provides the Web service.
A registry is a company that collects data on what services are provided, usually
in a specific area, for example in commodity broking.
A broker is a registry that offers intelligent search services.
An aggregator is a company that is a broker but also has the ability to describe
policy, business processes and binding descriptions.
Transactional model
Membership or subscription model
The lease or licence model
The business partnership model
Registration model
There are a number of revenue raising models that can be used for web services.
The transactional model is the most familiar where the charge for a service is
made based on the number and type of transactions initiated by the client.
The membership (subscription) model is based on paying a fee for access to a
Web service.
The lease (licence) is similar to the membership model but is normally adopted
between two large companies whereby the membership is customised rather than
shrink wrapped.
The business partnership model is based on activities such as bartering of
services, equity, or even a percentage of gross revenue of the requestor.
The registration model is normally applied to the green pages type of access
where the publication of a service by a company implementing such a model is
based on using the company as a sort of shop window.
Promotes interoperability
Enables just-in-time integration
Reduces complexity by encapsulation
Enables interoperability of legacy
applications
Web services interact via XML protocols. This means that all the agents that
interact in a Web service environment are able to operate independently of things
like platform and operating system.
Just-in-time integration enables an application to interact with services at run time. For example, a
client might want one type of service from one Web service provider, get it, and
then during the running of an application obtain a slightly different service from
another provider.
Encapsulation enables the details of a service to be hidden from the client. All
that the client needs to know is how to call on the service.
A legacy application such as an accounting package can be front-ended by a Web
service and the facilities offered by the package offered to clients. It is irrelevant
to the clients that the package might be some legacy software. This has to do with
the encapsulation advantage in that a Web services architecture hides the
underlying details of an implementation.
Application
code
Depends
on
implementation
Implementation
independent
Here we see the idea of the Web service as an abstraction. On the left-hand side
we have code which is specific to a particular platform and programming
language. Note that this code may span a number of computers in a network
where, for example, the network may use a proprietary protocol.
On the right-hand side we see clients communicating with the implementation-specific
functionality using standard non-platform specific technologies and
protocols such as browsers, HTTP or even an email protocol such as POP3.
The service isolates the client from the dirty machine specific parts of the
application.
The ideal Web service has a five layered architecture which allows discovery
functionality: finding out about a service; description functionality, understanding
what a service offers; packaging functionality, sending data around a network in
a way that an application can understand it; transport functionality, using
standard protocols to send data; and network functionality which carries out the
raw transport of data using low level protocols such as TCP/IP.
There are a number of projects and ongoing technologies which are used for
implementing Web services. The lower down the hierarchy the more established
the technology.
The UDDI project and the WS-Inspection project are both concerned with
providing a sort of sophisticated directory system for clients.
The WSDL and RDF projects are concerned with providing enough information
to a client so that they can use a service, for example what packaging protocol is
supported by the client.
The main packaging technology SOAP is concerned with providing a language
into which service payloads are embedded.
Transport is provided by standard Internet protocols.
Basic network functionality is provided by the low level Internet protocols.
SOAP
Major packaging technology
Defined in XML
Contains message format, value encoding
and exception-reporting mechanisms
Usually uses HTTP as a transport carrier
SOAP (Simple Object Access Protocol) has rapidly become a major player in
Web services. It has been defined in XML and contains a number of facilities
such as that for defining the format of a message, specifying the values of
parameters and providing information about exception processing. It normally
uses HTTP as a mechanism for communication.
SOAP messages
SOAP header: routing, security and delivery info
SOAP body: payload information such as transaction fields
A SOAP message sent say from a client running a browser to a server consists of
two items: a header which specifies administrative information such as security
settings or routing data and the message itself.
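The two-part structure can be sketched by building a message with the Python standard library. The envelope namespace is the one defined for SOAP 1.1; everything else (the application namespace, the element names transaction, order, isbn and quantity, and the placeholder ISBN) is hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ORDER_NS = "urn:example:book-order"                     # hypothetical application namespace

ET.register_namespace("s", SOAP_NS)
ET.register_namespace("m", ORDER_NS)


def build_message(transaction_id, isbn, quantity):
    """Build an envelope with a header (administrative data) and a body (payload)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")

    # Header: administrative information, here just a transaction number.
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{ORDER_NS}}}transaction").text = str(transaction_id)

    # Body: the message itself, here a hypothetical book order.
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    order = ET.SubElement(body, f"{{{ORDER_NS}}}order")
    ET.SubElement(order, f"{{{ORDER_NS}}}isbn").text = isbn
    ET.SubElement(order, f"{{{ORDER_NS}}}quantity").text = str(quantity)

    return ET.tostring(envelope, encoding="unicode")


message = build_message(1234, "0-00-000000-0", 1)
print(message)
```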
This shows the header of a SOAP transaction. It uses the definition of SOAP
found on the World Wide Web Consortium Web site. The header just gives a
transaction number and states that the recipient must understand SOAP
transactions. Note that this is expressed in an XML-based language.
Must understand
<m:transaction xmlns:m="soap-transaction"
s:mustUnderstand="true">
A recipient may or may not understand a header (it must understand the message).
The mustUnderstand attribute specifies which part of a transaction the recipient
must understand; in the example above the recipient must be capable of
understanding SOAP transactions.
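The rule can be sketched as a check that a recipient runs before processing a message: any header entry marked as mandatory that it does not recognise must cause the message to be rejected. The sketch assumes the s prefix is bound to the SOAP 1.1 envelope namespace and accepts both "1" and "true" as markers; the trace header and the set of understood entries are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
MUST_UNDERSTAND = f"{{{SOAP_NS}}}mustUnderstand"

# Header entries this hypothetical recipient knows how to process.
UNDERSTOOD = {"{soap-transaction}transaction"}

MESSAGE = """<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Header>
    <m:transaction xmlns:m="soap-transaction" s:mustUnderstand="true">1234</m:transaction>
    <x:trace xmlns:x="urn:example:trace" s:mustUnderstand="true">on</x:trace>
  </s:Header>
  <s:Body/>
</s:Envelope>"""


def unsupported_headers(xml_text, understood):
    """Return mandatory header entries that the recipient cannot process."""
    root = ET.fromstring(xml_text)
    header = root.find(f"{{{SOAP_NS}}}Header")
    if header is None:
        return []
    return [
        entry.tag
        for entry in header
        if entry.get(MUST_UNDERSTAND) in ("1", "true") and entry.tag not in understood
    ]


rejected = unsupported_headers(MESSAGE, UNDERSTOOD)
print(rejected)
```

A non-empty result is the situation in which the recipient should refuse the message and report an error to the sender.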
This is the payload of the SOAP message. It contains data which is used when a
client wants to purchase a book from an online book store such as Amazon. Note
the use of XML conventions.
SOAP faults
Special type of message which describes errors
Fault code
Fault string
Fault actor
Fault details
When an error occurs in a SOAP transaction, for example the recipient does not
understand the message, a SOAP fault is generated. This contains a fault code
which identifies the type of error, a fault string which is a readable description of the
error, the fault actor which identifies where the error occurred and
application-specific details of the error.
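The four parts of a fault can be picked out of a message as sketched below. The element names faultcode, faultstring, faultactor and detail are those of SOAP 1.1; the sample message, its values and the function name read_fault are made up for the example.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# A hypothetical fault message; the values are invented for illustration.
FAULT_MESSAGE = """<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Body>
    <s:Fault>
      <faultcode>s:MustUnderstand</faultcode>
      <faultstring>Mandatory header not understood</faultstring>
      <faultactor>http://www.example.com/orders</faultactor>
      <detail><reason>transaction header is unknown</reason></detail>
    </s:Fault>
  </s:Body>
</s:Envelope>"""


def read_fault(xml_text):
    """Return the fault parts as a dictionary, or None if there is no fault."""
    root = ET.fromstring(xml_text)
    fault = root.find(f"{{{SOAP_NS}}}Body/{{{SOAP_NS}}}Fault")
    if fault is None:
        return None
    detail = fault.find("detail")
    return {
        "code": fault.findtext("faultcode"),
        "string": fault.findtext("faultstring"),
        "actor": fault.findtext("faultactor"),
        "detail": "".join(detail.itertext()).strip() if detail is not None else None,
    }


fault = read_fault(FAULT_MESSAGE)
print(fault)
```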
This shows how a SOAP transaction can be embedded within a POST HTTP
request.
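The embedding amounts to placing the envelope in the body of an ordinary POST. The sketch below assembles such a request as raw bytes without sending it; the Content-Type and SOAPAction headers follow SOAP 1.1 practice, while the host, path and action value are hypothetical.

```python
# A minimal, empty-bodied envelope used only as a payload for the example.
ENVELOPE = (
    '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">'
    "<s:Body/>"
    "</s:Envelope>"
)


def http_post_request(host, path, soap_action, envelope):
    """Assemble the raw bytes of an HTTP POST that carries a SOAP envelope."""
    body = envelope.encode("utf-8")
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        'Content-Type: text/xml; charset="utf-8"',
        f"Content-Length: {len(body)}",
        f'SOAPAction: "{soap_action}"',
    ]
    # A blank line separates the HTTP headers from the SOAP envelope.
    return "\r\n".join(headers).encode("ascii") + b"\r\n\r\n" + body


request = http_post_request("www.example.com", "/orders", "urn:example:order", ENVELOPE)
print(request.decode("utf-8"))
```

Nothing in the request is special to SOAP as far as HTTP is concerned, which is why existing Web servers can carry it unchanged.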
</s:Envelope>
The response to the request detailed in the previous slide is shown here. It does
not require any changes to the HTTP specification.
WSDL
Language used to define the services that
are offered by some enterprise
Expressed in XML (see later)
Requires a lot of detailed XML coding
Defines each service that is offered
This forms part of a very large Web site which is concerned with snowboarding. It
forms part of the definition of a service in which clients can ask questions such as
Which snowboarders endorse snowboard xxxx?
UDDI
Universal Description Discovery and Integration
Industry initiative first started by IBM, Microsoft,
NTT and SAP
Relies on standard technologies such as XML
A framework for describing services, discovering
businesses, and integrating business services using
the Web.
It has been developed as a platform-independent
and open framework
UDDI is the remaining part of the trilogy behind Web services (SOAP and WSDL
are the others).
It relies on standard Internet technologies and is used to produce registries which
contain descriptions of services offered by companies on the Web.
It is an open system which does not depend on a hardware or software platform.
Sockets
CORBA or RMI
Servlets
RPC (XML-RPC)
Just using standard HTTP
It is important to point out that there are a variety of technologies that can be
used for Web services, ranging from the primitive (sockets) to the complicated
and sophisticated (CORBA). The main principle is that the service acts as a buffer
between implementation-specific code and implementation-neutral code (Internet
code), so that the programmer at the client side is offered implementation-independent
hooks into the application.
Summary (i)
A Web service can be implemented in any
way.
A variety of technologies can be used
including sockets and servlets
Web services are predominantly message
based
Web services are based on HTTP
Summary (ii)
Increasingly the main technology being
employed for Web services is SOAP
SOAP is a messaging technology
It is defined by XML
Much more work needed on SOAP, for
example tool sets.
Lecture 7
Web 2.0
Aims
To place Web 2.0 in the context of Web 1.0
and the semantic web
To look at the concept of the writable web.
To look at a number of technologies
associated with Web 2.0
To look at the idea of the Internet as a
computer
We have seen the rise of the World Wide Web since the early nineties. We have
seen lots of talk and research about Web 3.0 or the semantic web (dealt with in
the next lecture), so what is Web 2.0? It is a term popularised by the publisher Tim
O'Reilly to describe the increase in two way traffic that is occurring now. He also
used it to describe the increase in community-type use of the web.
[Figure: Web 1.0 sends content one way to the world; Web 2.0 exchanges content with the world in both directions.]
This is the major change that has happened to the World Wide Web. From being a
mainly one-way communication medium it has been transformed into a two-way
medium where the client can write to the server.
Three examples
DoubleClick vs AdSense. The former feeds large web sites;
the latter addresses the long tail and deals with cost per click
processing.
Flickr. Photo sharing and display site. Allows users to
collect photos as sets and also allows users to tag photos
(a folksonomy).
Content management systems vs Wikis. The former are
highly structured ways of organising web content, for
example Cocoon. The latter are free form ways of enabling
users to develop large collections of text.
The three examples above are examples of the way that Web 2.0 has affected
commercial applications. All employ collective intelligence and large amounts of
user participation. All are capable of being mashed via APIs.
A key work for Web 2.0 companies. Anderson explores the fact that the area
under the tail of a long tail distribution can be greater than the area under the
front of the distribution: that, for example, sales of the last 100,000 books on
Amazon will be greater than for the top 100.
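The claim can be checked with a little arithmetic under a stated assumption. The sketch below assumes that sales at rank k are proportional to 1/k (a Zipf-like curve); that shape is an assumption made for illustration and is not taken from Anderson or from real sales data.

```python
def zipf_sales(first_rank, last_rank):
    """Total sales for a block of ranks when sales at rank k are proportional to 1/k."""
    return sum(1.0 / rank for rank in range(first_rank, last_rank + 1))


HEAD_SIZE = 100       # the best sellers
TAIL_SIZE = 100_000   # the titles ranked immediately after them

head = zipf_sales(1, HEAD_SIZE)
tail = zipf_sales(HEAD_SIZE + 1, HEAD_SIZE + TAIL_SIZE)
print(f"head: {head:.2f}  tail: {tail:.2f}")
```

Under this assumption the 100,000 tail titles together outsell the top 100, even though each one individually sells very little. With a steeper curve the balance could tip the other way, so the result depends on the shape of the distribution.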
XML
Ajax
Wikis
Blogs
Frameworks
Platform APIs
Yahoo Pipes
Tagging
Genetic algorithms
These are a number of technologies that have enabled this growth. This lecture
will describe many of them.
Desktop
Grid computing
RSS
Internet
Community
sites
Grid computing
This topic will be dealt with in more detail later. It started with users donating the
spare cycles of their processor to some processing that was required by an
organisation such as SETI. Usually these sites carried out massive computations
which could not be attempted by supercomputers: molecular genetics, drug trial
simulations and massive engineering design.
Offline storage
The use of large file servers to store files
created by users of the Internet
Best example is that of S3, the Amazon file
storage facility
A large number of advantages: security,
backup facilities, etc.
The Internet as a file storage facility
Now that we have high speed access to the Internet it is becoming feasible to
store data offline in secure facilities. A major growth area has been in companies
offering this service. Amazon is the current leader with its S3 facility.
APIs
The availability of APIs that enable a company to
program access to some underlying data
Usually only read access is allowed
Sometimes read and write access is allowed
Examples: Google maps, Amazon, eBay and
PayPal
There are now a number of companies that offer access to their underlying data
via APIs in languages such as Java, Perl and Python. Most of these just allow
query facilities, however some such as Flickr offer write access.
Platform APIs
Such APIs need a good business model
Amazon's model allows associate
companies to make money at the same time
that Amazon gains sales
The now defunct Google search API did
not.
Platform APIs only work when there is some leverage between all the enterprises
that are involved in their use. The Amazon API is a good example of this as it
allows companies to register as Amazon associates, download book details and post
them on their site. Visitors to the site can then buy the books from the associate.
Amazon would then carry out the standard selling functions, gain profit and share a little
with the associate.
The Google API failed in that it allowed other companies to feature search on
their site without the display of the online ads that make Google so profitable. It
was ditched.
Desktop interfaces
Mainly implemented by means of a collective technology
known as a framework
Ajax is currently the only game in town
Based on a combination of JavaScript, XML, CSS and
HTML
Attempts to replicate the desktop as an interface to the
Web.
Collaboration-based development
Examples include Apache and Wikipedia.
Collaborative facilities such as Wikis enable
this to happen easily
Platform APIs also enable collaboration
based on integrating (mashing) sites
The Internet as a collaborative development
medium
Now that technologies such as those associated with broadband have enabled two
way communication we now have lots of examples of this happening. There are
the standard examples of Apache and Wikipedia where hundreds, if not
thousands, of users have collaborated to produce the most popular web server and
the most popular online encyclopaedia. There are also examples of individual
users mashing together sites in order to create functionality that represents
something greater than the sum of their parts.
Blogs
Weblogs
Personal diaries
Now being used as corporate PR
A number of good blogging sites and
systems to support blogging.
One area which has a slight connection with Web 2.0 is the blog. A blog is
effectively a Web site that holds text written by one person; the first blogs were
diaries, usually from some pundit or technical guru. They have expanded out over
the last two years to include ordinary users of the Internet including ambulance
drivers, policemen and call girls.
The commercial aspect to blogs is that some companies encourage their staff to
blog in order to give a positive view of the company that would not occur with a
PR site. When the PR department of a company starts blogging, derision usually
follows.
The next three slides look at three APIs that enable programmers to mash
applications together.
One of the most successful APIs has been developed by Amazon. It provides a
variety of calls in a number of programming languages such as Java and Perl to
the huge database of products that are stocked by that company. This API
supports an archetypal Web 2.0 model: that of an associate delegating the tough
stuff to Amazon (stock control, customer billing) and achieving profit via sales.
Google APIs
Number of varieties
Google maps
Google search
Google OpenSocial
Google Reader aggregator
There are a large number of Google APIs that interface to a variety of Google
resources. Many of them are associated with JavaScript, with some having Java
and Perl interfaces. These APIs are among the most used on the Internet.
PayPal API
SOAP and REST interfaces
Variety of programming languages can be used.
Smallish number of mashups so far, main mashups
are with the eBay API for process handling.
Typical functions are: process a credit card
payment, authorize funds for an order,
issue a refund for a PayPal
transaction
PayPal have a typical API which, like the Amazon API, is based on REST and
web services.
RSS
The glue that binds together the Internet
Based on XML
Accessed by a number of mashing
technologies such as Yahoo Pipes.
Initially a news feed technology
Now used for all sorts of data feeds
Mashing
[Diagram: four RSS feeds combined into a single mash up]
RSS is used as the basis for mash ups. These combine feeds from a variety of
sources and provide some cross-linked functionality, for example a map showing
high crime levels in a city or a map displaying Flickr photographs.
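As a rough sketch of the mash up idea, the Python fragment below reads two small
RSS feeds and merges their items into one list. The feed contents, the URLs and
the names read_items and mash are all invented for illustration; a real mash up
would fetch live feeds and cross-link the items, for example by location.

```python
import xml.etree.ElementTree as ET

# Two tiny, made-up RSS 2.0 feeds (hypothetical data for illustration only).
CRIME_FEED = """<rss version="2.0"><channel><title>Crime reports</title>
<item><title>Burglary</title><link>http://crime.example.com/1</link></item>
<item><title>Car theft</title><link>http://crime.example.com/2</link></item>
</channel></rss>"""

PHOTO_FEED = """<rss version="2.0"><channel><title>Photos</title>
<item><title>Town square</title><link>http://photos.example.com/9</link></item>
</channel></rss>"""


def read_items(feed_xml, source):
    """Return the items of one feed as dictionaries tagged with their source."""
    root = ET.fromstring(feed_xml)
    return [
        {"source": source,
         "title": item.findtext("title"),
         "link": item.findtext("link")}
        for item in root.iter("item")
    ]


def mash(feeds):
    """Combine the items of several feeds into one list sorted by title."""
    combined = []
    for source, feed_xml in feeds.items():
        combined.extend(read_items(feed_xml, source))
    return sorted(combined, key=lambda entry: entry["title"])


mashup = mash({"crime": CRIME_FEED, "photos": PHOTO_FEED})
```

Each entry keeps a record of the feed it came from, which is what lets a mash up
present data from several sources side by side.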
An RSS example
<item>
<title>Earth Invaded</title>
<link>
http://news.example.com/2004/12/17/invasion
</link>
<description>
The earth was attacked by an invasion fleet
from halfway across the galaxy; luckily, a fatal
miscalculation of scale resulted in the entire armada
being eaten by a small dog.
</description>
</item>
Here I show a very simple feed item. In general, feed standards such as RSS and
Atom allow a lot more content.
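The item shown above can be read with Python's standard xml.etree.ElementTree
module. This is only a minimal sketch: it handles a single item held in a string,
whereas a real feed wraps its items in rss and channel elements.

```python
import xml.etree.ElementTree as ET

# The feed item from the slide, held as a string.
ITEM = """<item>
<title>Earth Invaded</title>
<link>
http://news.example.com/2004/12/17/invasion
</link>
<description>
The earth was attacked by an invasion fleet
from halfway across the galaxy; luckily, a fatal
miscalculation of scale resulted in the entire armada
being eaten by a small dog.
</description>
</item>"""

item = ET.fromstring(ITEM)

# Strip the surrounding white space that the layout of the XML introduces.
title = item.findtext("title").strip()
link = item.findtext("link").strip()
# Collapse the line breaks inside the description into single spaces.
description = " ".join(item.findtext("description").split())
```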
Frameworks
New way of developing systems.
Starts with a skeletal architecture and then
developers produce a number of
instantiations of concrete examples.
Until recently few frameworks were in
existence as public or commercial entities.
Ruby on Rails the first example of a highly
popular framework technology.
Frameworks are the software equivalent to the girders and ties that hold a
building together. They are a high level design of a system which can be moulded
towards a specific set of requirements from a customer. For example a
framework for accounting systems can be transformed into concrete versions of
the type of accounting systems that major companies use to drive their business.
The first highly popular framework has been Ruby on Rails.
Design patterns
Revolutionising the process of systems
development
Use architectures.
Implementable in any OO language.
Basis of frameworks
Mainly aimed at maintenance
Design patterns are small sections of pre-fabricated code with hooks to insert
application-specific code. The idea was developed by four industrial researchers
known as the Gang of Four. It relies on object-oriented programming languages, the
examples in the original book on the idea being written mainly in C++. Design
patterns make software maintenance a much easier process since many of the patterns
documented are aimed at change.
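To make the idea of pre-fabricated code with hooks concrete, here is a minimal
sketch of one well-known pattern, the Template Method. It is written in Python
purely for illustration; the class names ReportGenerator and SalesReport and the
sample records are invented.

```python
from abc import ABC, abstractmethod


class ReportGenerator(ABC):
    """Pre-fabricated skeleton: the fixed steps of producing a report."""

    def generate(self, records):
        lines = [self.header()]
        lines.extend(self.format_record(record) for record in records)
        lines.append(f"{len(records)} record(s)")
        return "\n".join(lines)

    def header(self):
        # Hook with a default behaviour; subclasses may override it.
        return "REPORT"

    @abstractmethod
    def format_record(self, record):
        """Hook that every concrete application must supply."""


class SalesReport(ReportGenerator):
    """Application-specific code inserted through the two hooks."""

    def header(self):
        return "SALES"

    def format_record(self, record):
        customer, amount = record
        return f"{customer}: {amount:.2f}"


report = SalesReport().generate([("Acme", 120.5), ("Bolt Ltd", 80)])
```

A change in report layout is confined to a subclass, which is the sense in which
such patterns are aimed at maintenance.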
Ruby on Rails
Concrete system
Templates
Object/database mappings
MVC architecture
Metaprogramming
Here we show the main components of Ruby on Rails. The model view controller (MVC)
architecture, which allows events to be triggered, meta-programming (the ability for a
programming language to find out things about its programs), mappings from
objects to databases and common templates used in web development together produce a
remarkable set of efficiencies.
Metaprogramming
Tell me how many variables
does this subroutine have?
Program
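The question on the slide can be put to a running program. The sketch below uses
Python rather than Ruby purely for illustration; the function average is made up,
and the __code__ attributes that it inspects are specific to the standard CPython
implementation.

```python
import inspect


def average(values, default=0.0):
    """An ordinary subroutine that we are going to ask questions about."""
    total = sum(values)
    count = len(values)
    return total / count if count else default


code = average.__code__

# Names of the parameters, obtained from the function's signature.
parameter_names = list(inspect.signature(average).parameters)

# All local variables: the parameters first, then the other locals.
local_names = code.co_varnames

# "How many variables does this subroutine have?"
variable_count = code.co_nlocals
```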
Ruby on Rails
Ruby on Rails is the only technology that addresses all the properties that one
expects of a framework. It is capable of very rapid development since 90% of web-based
functionality is embedded in its architecture; when a developer wants to produce
a specific web-based application all they need do is to write relatively small
amounts of Ruby code. Sophisticated web applications can now take only a
matter of days to develop.
These are some of the important implications that Web 2.0 has for developers.
The main one is in terms of integration. Many of the technologies documented
here, for example RSS, enable the development of systems in terms of bringing
together internet systems and resources. The last four lectures concentrate on this
increasingly important topic. The remainder of this lecture looks at data
integration.
There is a large amount of data on the Internet and there is considerable scope for
mashing it together. The remaining slides look at some particular mashups
detailed by Segaran.
If you want to see what the capabilities of the Internet are in terms of data
mashing, buy this book.
Techniques
Bayesian analysis
Genetic algorithms
Particle swarm optimisation
Ant colony algorithms
Cluster analysis
Neural nets
A Sobering Read
Lecture 8
The semantic web
These are all examples of current applications which use advanced search and
optimisation algorithms.
Aims
To describe the disadvantages of the current
web
To describe the use of semantic information
within the web
To outline the main architectural features of
the semantic web
To outline some current research projects
and case studies
The World Wide Web has been massively successful. However, it has its
problems. The first is that it was meant for a single program: the browser. We
have plenty of tools which enable us to find information from a Web page but
their development is quite painful.
Web pages provide no indication of what a chunk of information means. The
Semantic Web project is a very ambitious project which is attempting to overcome
this.
Semantics
[Diagram: a Web page containing the number 200]
The box above shows a Web page. Inside that page there is a number; what does it
mean? It is easy for a screen scraper to extract this number but it is much more
difficult to understand what the data stands for. Is it a serial number, a price, a
house number or the position of a CD in the charts?
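A minimal sketch of such a screen scraper, assuming a made-up page held in a
string, shows how little the extraction step tells us. The function name
scrape_numbers is invented for illustration.

```python
import re

# A made-up page: the markup gives no clue as to what the number means.
PAGE = "<html><body><p>Smith</p><p>200</p></body></html>"


def scrape_numbers(html):
    """Strip the tags and return every run of digits found in the text."""
    text = re.sub(r"<[^>]+>", " ", html)
    return [int(digits) for digits in re.findall(r"\d+", text)]


numbers = scrape_numbers(PAGE)
```

The scraper recovers the value 200 without difficulty, but nothing in its result
says whether that value is a price, a room or a chart position.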
<TITLE>Employee</TITLE>
[Diagram: a Web page carrying the above title and containing the number 200.
Annotation: We have more clues about the data now]
We might know that the page represents an employee in a company from the fact
that there might be subsidiary information held on the page, for example the
preamble to the page might contain the string "Employee". However, we are not that
much better off. The 200 might be: the room number of the employee, their
weekly salary in hundreds of pounds, the number of staff they are responsible for
or some internal identity number.
<TITLE>Employee</TITLE>
[Diagram: a Web page carrying the above title and containing the text room=200.
Annotation: We now have a pretty good idea]
We now have an excellent idea about the string "200". Since it is closely
associated with the string "room" we are able to say with a lot of confidence that
it is some room number. But is it the room where the employee can normally be found
or the room that they are currently in (the company might be using an
active badge system)?
The solution
The solution is to associate a standard semantics
with a Web page. For example, a standard semantics
for book publishers, a semantics for book sellers or
a semantics for open source projects.
Because of the problems which I have alluded to with screen scrapers the only
solution to the semantic poverty problem is to have a standard semantics for
types of application, such semantics being embedded within Web pages. The
collection of Web pages which have this semantic information attached is
known as the Semantic Web.
The four bullet points above describe the main components of the Semantic web.
They will be described in detail in subsequent pages.
RDF
Triple based
Defined using XML
Long and short versions available
Based on subject/predicate/object
relationship
RDF is the core of the Semantic Web. It is used to hold base information about
entities in some world, for example the world of book selling. It is defined using
XML and there are a number of long and short versions of RDF definitions
available. The long version takes a lot of learning.
Here there are three triples. The subject is the thing that is described (Jones,
Phillips, Milton Keynes). The predicate is the aspect of the resource that is being
described (earns, is situated, has a postcode). The object is the value of the
subject governed by the predicate (20000, Eindhoven, MK).
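The three triples described above can be held in a program as simple
subject/predicate/object records. This is only a sketch in Python of the data
model; it is not RDF/XML syntax, and the names Triple and describe are invented.

```python
from collections import namedtuple

# One record per statement: the thing described, the aspect, and the value.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

triples = [
    Triple("Jones", "earns", 20000),
    Triple("Phillips", "is situated", "Eindhoven"),
    Triple("Milton Keynes", "has a postcode", "MK"),
]


def describe(subject):
    """Collect everything that the triples say about one subject."""
    return {t.predicate: t.object for t in triples if t.subject == subject}


jones = describe("Jones")
```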
Here the text describes the fact that the title of a page is Tony Benn and the
publisher of the page is Wikipedia.
Here are some examples of RDF vocabularies. FOAF (Friend of a Friend) is used to
define the relationships between people and so models social networks. It models
predicates such as the fact that one person knows another person.
RSS is used to describe web sites which offer syndication services.
KlogMS is used for embedding knowledge about Web logs.
Dublin Core is used to describe the semantics associated with information objects
such as books, research papers, Web pages etc.
A semantic net
[Diagram: a semantic net whose nodes are Jones, Accounts, Acme Company and
Building 2, joined by the relations Is in dept, Is a dept in, Works in and Owns]
The slide above shows an example of a semantic net which might be embedded
within a series of Web pages maintained by a company. There is no reason why
further items which are external to the company cannot be added; for example, a
town planning web site might reference Building 2 and involve relations such as
"Is situated in".
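A semantic net of this kind is just a labelled graph, so following its links is
easy to program. The sketch below is in Python and rests on assumptions: the
direction of each relation is my reading of the slide, and the town Milton
Keynes attached to Building 2 is an invented external item.

```python
from collections import deque

# (subject, relation, object) links; the last one is a hypothetical
# external item of the sort a town planning site might add.
NET = [
    ("Jones", "is in dept", "Accounts"),
    ("Accounts", "is a dept in", "Acme Company"),
    ("Jones", "works in", "Building 2"),
    ("Acme Company", "owns", "Building 2"),
    ("Building 2", "is situated in", "Milton Keynes"),
]


def reachable(start):
    """Return every node that can be reached by following links from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for subject, _relation, obj in NET:
            if subject == node and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


linked_to_jones = reachable("Jones")
```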
The three queries above are examples of natural language queries which might be
processed by traversing the Semantic Web. It is rather fanciful to think that such
queries would be processed as natural language; they would have to be
transformed into queries in some structured language such as SQL. The
important point to make is that such queries would be enabled by the Semantic
Web and would enable the instigator of these queries to search the whole of the
Web.
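As a sketch of what such a transformation might produce, the fragment below
stores a few triples in an in-memory SQLite table and answers one question with
SQL. The question, the table layout and the data are all assumptions made for
illustration; they are not taken from the queries on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Jones", "is in dept", "Accounts"),
        ("Accounts", "is a dept in", "Acme Company"),
        ("Acme Company", "owns", "Building 2"),
    ],
)

# "Which company does Jones's department belong to?" expressed as a
# self-join: the object of the first triple is the subject of the second.
row = conn.execute(
    """
    SELECT d.object
    FROM triples AS p
    JOIN triples AS d ON p.object = d.subject
    WHERE p.subject = ?
      AND p.predicate = 'is in dept'
      AND d.predicate = 'is a dept in'
    """,
    ("Jones",),
).fetchone()

company = row[0]
```

Every extra step in the chain of relations costs another join, which hints at
why deeper queries become expensive.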
Simple queries such as those in the previous slide are technically easy
to program.
More complicated queries require something called an inference
engine which uses axioms.
Inference and logic have been quite busy areas of computing research
over the last twenty years, mainly in artificial intelligence.
The results from AI inference work have been somewhat disappointing
even when data is held in the same memory store.
This means that the prospects for inference for the semantic web are
still a long way off.
The previous slide detailed some queries which could be easily translated into
SQL and hence can be relatively easy to program. However, more complicated
queries in predicate calculus, queries which require deep levels of quantification
(for all, exists) are still a long way off. RDF still does not support such queries.
There is, however, a heavy chunk of knowledge in computing about inference,
mainly generated by researchers in the artificial intelligence area. Unfortunately
progress in this area has been mighty slow with even queries on data stored on
the same computer taking a huge amount of time. With such data spread around a
WAN the implication here is that logic, inference and inference engines will not
be seen for some time on the Web.
In the future Semantic Web there will be a number of vocabularies which cover a
number of application areas. For example, there may be a vocabulary which is
concerned with applications for book sellers, another for libraries and another
which is used for cataloguing research achievements. In each of these books will
be defined but may have different names, for example bookforsale, publishedbook
and book. OWL enables linking information to be set up which says that these
three entities are equivalent. In general OWL is used to define the relationships
between vocabularies.
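The effect of such linking information can be sketched with an ordinary
dictionary. This is a Python stand-in for the idea only, not OWL syntax; the
sample records and the name find_all are invented.

```python
# Linking information: three names from three vocabularies, one concept.
EQUIVALENT = {"bookforsale": "book", "publishedbook": "book", "book": "book"}

# Made-up records drawn from sites that use different vocabularies.
records = [
    {"type": "bookforsale", "title": "Dubliners"},
    {"type": "publishedbook", "title": "Ulysses"},
    {"type": "journal", "title": "Nature"},
]


def find_all(term):
    """Return titles whose type is the given term or an equivalent of it."""
    canonical = EQUIVALENT.get(term, term)
    return [
        record["title"]
        for record in records
        if EQUIVALENT.get(record["type"], record["type"]) == canonical
    ]


books = find_all("book")
```

A search phrased in any one of the three vocabularies now finds the entries
recorded in the other two.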
RDF Schema
Similar to database schema
Defined using XML
Allows a designer to define and publish the
vocabulary used by an RDF data model
Example: it might define that people have a
phone attribute.
Also defines classes and subclasses
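A rough sketch of what a schema of this kind expresses, again in Python rather
than in RDF Schema itself: classes, subclasses and the attributes that each
class introduces. The class names and the room attribute are assumptions; only
the phone attribute for people comes from the slide.

```python
# Each class names its parent class (subclass relationship).
SUBCLASS_OF = {"Employee": "Person", "Manager": "Employee"}

# Attributes introduced by each class; people have a phone attribute.
PROPERTIES = {"Person": {"phone"}, "Employee": {"room"}}


def properties_of(cls):
    """Collect the attributes of a class and of all its ancestors."""
    found = set()
    while cls is not None:
        found |= PROPERTIES.get(cls, set())
        cls = SUBCLASS_OF.get(cls)
    return found


manager_properties = properties_of("Manager")
```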
Research 1 (REWERSE)
One of the main research areas that is being addressed by Western governments
is that of inference. This project, which is a network of excellence project, is
attempting to develop some small languages which enable the communication of
logical queries to the Semantic Web.
Research 2 (NRC)
National Research Council Canada Semantic Web lab
Rule-Applying Collaborative Filtering system
Combines multidimensional collaborative ratings with
RuleML-based rules to recommend Web objects such as
research papers
RuleML is a mark-up language for business rules
First example is RACOFI a system for recommending CD
albums for music
This is research which takes semantic web pages and uses collaborative agents to
discover objects which best match certain criteria. It uses a mark-up language
called RuleML.
The second case study involves a home-made, semantically augmented web site
for the cultural magazine Harper's. The web developers who produced the web
site augmented the HTML with XML markup that enables a high degree of
linking between entities such as features, feature elements and events to take
place. It might be worth your while to click on
http://www.harpers.org/HarpersIndex2003-10.html and then navigate between
entries to see the power of this technology.
Availability of content
Ontology
Multilinguality
Scalability
Visualisation
Stability of Semantic Web languages
There are a number of areas that need to be addressed if the Semantic Web is to
advance.
First, little RDF-based content is available. There needs to be a massive effort to
create new content and convert existing content.
There are few ontologies. A big effort is needed to create new ontologies and also
to create an organisational infrastructure which allows the standardisation and
change management of ontologies.
If the Semantic Web takes off it will be huge, so there is a need to ensure that
activities such as searching and inference are carried out efficiently. This requires
research not only on new structures but also on new algorithms.
There is a need for research which ensures that content in a number of different
languages can exist within the Semantic Web and activities such as search can be
carried out on such heterogeneous material.
There is a need for research into visualisation of parts of the Semantic Web.
There will be so much content available that some techniques will need to be
deployed to ensure that users are aware of content and that administrators are
able to see the effect of change on content.
The Semantic Web will not be successful if its facilities, including languages are
not stable. There is a need for a major standardisation effort.
Lecture 9
Grid Computing
Aims
Show how grid computing evolved from mass
computing and peer-to-peer computing
Describe the main grid computing concepts
Examine the main drivers behind grid computing
Look at the technology and business benefits of
grid computing
Briefly look at some grid computing applications
In this lecture I shall look at how grid computing evolved from mass computing
and peer to peer computing, and describe the main concepts behind it.
I shall then examine the main drivers behind grid computing and the technology
and business benefits that it offers.
Finally I shall briefly look at some grid computing applications.
Grid Computing
Initially, the use of large numbers of computers
connected by network technology to carry out
computationally demanding tasks.
Has its roots in peer to peer computing and mass
computing.
Normally employs Internet technology for
connections.
Still in its early days.
This lecture looks at the topic of grid computing. This is the term given to
connecting large numbers of computers (thousands) together in order to share out
the processing of some computationally demanding application, for example
analysing biological databases. Each processor carries out some part of the
computation required and other computers gather together the results.
Computers would be connected using Internet technology. Grid computing is in
its infancy; however, it has its roots in slightly more mature technologies and
ideas, mainly in peer to peer computing and mass computing.
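The pattern of sharing out a computation and gathering the results can be
sketched on a single machine. In the Python fragment below a pool of threads
stands in for the computers of a grid, which is a large simplification; the
sequence data and the names count_matches and split are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(chunk, pattern="GATTACA"):
    """The work done by one processor: count a pattern in its share of data."""
    return sum(sequence.count(pattern) for sequence in chunk)


def split(items, parts):
    """Share the items out into at most `parts` roughly equal chunks."""
    size = -(-len(items) // parts)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


# A made-up "biological database" of 100 short sequences.
sequences = ["GATTACAGATTACA", "CCGATTACA", "TTTT", "GATTACA"] * 25

# Each worker processes one chunk; the caller gathers the partial results.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(count_matches, split(sequences, 4)))

total = sum(partial_counts)
```

A real grid adds what this sketch leaves out: moving the data to remote
machines, coping with failed nodes and balancing uneven workloads.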
[Diagram: two clients connected to a server]
The main paradigm that has driven distributed computing during the eighties and
nineties is that of the client server model. This is a hierarchic model where clients
are subservient to servers. It is totally pervasive, for example the World Wide
Web relies on clients (browsers) to access servers (Web servers).
The client server model is hierarchical in that a client asks for a service and the
server delivers it. Typical servers include application servers, database servers
and Web servers.
In general a client does not provide any services for other entities in an
application. Sometimes a small amount of functionality is found, for example
where a client is associated with a cookie and the server or other clients can
interrogate the cookie.
This model has been extant since the eighties.
Mass computing
The use of a number of computers to carry out
some tough computational task
SETI project is an example where radio waves are
analysed to check for patterns which indicate
extraterrestrial life
Many mass computing projects in existence
The connection with grid computing is the sharing
of hardware
The SETI project
Peer to peer computing eliminates the hierarchic relationship between client and
server. Each entity in the network has the same functionality and contributes
equally to the tasks that the network carries out.
Often peer to peer networks involve the transfer of large amounts of data, for
example sound files, and much of the current interest in P to P arises from
applications such as Napster.
There has been a large amount of interest in peer to peer computing in the last
five years; however, the history of P to P starts in the eighties.
Growth of P to P
Peer to peer computing has its roots in the USENET and FidoNet systems of the
eighties. These were the forerunners of the newsgroup systems of today. USENET
was a true newsgroup system while FidoNet was associated with bulletin boards.
Each was used to exchange files between participants, often using dial-up lines
overnight.
The conditions for successful P to P are that you have a public networking
infrastructure and fast interconnection. A true P to P system will have a number
of peers each dedicated to a specific role such as sending files. This is the hallmark
of P to P. A system which does not satisfy all these conditions is often known as
peer oriented. For example mass computing is peer oriented.
Probably the best known peer to peer applications are Napster and Gnutella. The
former was the first music file-sharing program on the Internet. It allowed users
to access other users' computers via a server and download MP3 files. It gained a
huge amount of notoriety because it was used as a repository of pirated music.
Gnutella is a more recent technology. It is interesting because it employs a true
P to P approach in that no servers are involved in the distribution of files: it
happens directly between Gnutella nodes.
Grid computing
The use of large numbers of computers which look like
one computer.
Up to a couple of years ago was a researcher's plaything.
Now there are a number of tools available, many from
IBM.
Number of protocols, most promising being the Open Grid
Services Architecture (OGSA)
Introduction to grid computing
A grid is a collection of computers which acts like a single computer and appears
to both its users and developers as a single computer. The last two years have seen
a number of tools and protocols emerge which make grid computing a
commercial possibility. Most of the standards associated with grid computing are
open standards, including the most popular, OGSA.
Similarities
Like the Web, complexity hidden and all
users have the same interface
Like P to P, allows files to be shared.
Like clusters, brings distributed resources
together
Like virtualisation technologies, creates a
virtual resource.
There are many similarities between grid computing and other technologies.
It is similar to the Web because all the users have the same interface to it in the
same way that browsers have the same interface to Web pages.
It is similar to P to P in that files can be shared between entities in a grid.
It is like clusters in that computers can be brought together using existing
communication technologies to create a single resource.
It is like virtualisation technologies such as replicating databases in that you can
create a virtual interface to a large number of complicated resources.
Slack resources
The slide above provides the commercial rationale behind grid computing: that
most of the capacity of a computing system is wasted.
There is another dimension to grid computing. The initial driver that has
sponsored most of the research into grid computation did not come from the fact
that resource was slack but came from a demand by applied scientists for a way
of increasing the computational speed of the hardware they used. Grid computing
enables large numbers of computers to share computational tasks.
Technology benefits
Better control of workloads
Increases capacity for high demand
applications
Support for multi-disciplinary collaboration
Workload balancing easier
Recovery easier using replication
The above are the main technological benefits of using a grid approach.
Business benefits
Improves collaboration.
Can solve previously unsolvable problems
Enables virtual departments and virtual
organisations to be easily created
Respond to fluctuations in customer needs
Provides optimal use of resources
Enables faster integration
Eliminates over-provisioning
The above are the main business benefits of using a grid approach.
These are a few of the current applications of grid computing. As you will see
most of them involve solving problems with high computational demands, the
sort of problems that first motivated grid research.
Standards work is in its infancy. However, many of the lower levels of any grid
standard will be defined by existing Internet standards such as IP and TCP.
The most promising standard is OGSA (Open Grid Services Architecture). This
has been developed by an organisation known as the Global Grid Forum.
OGSA
Infrastructure services
Resource management services
Data services
Context services
Information services
Self-management services
Security services
Execution management services
Shown above are the eight levels of service that make up OGSA; the slides
following describe them in more detail.
Infrastructure services
Communication between disparate
resources
Resources include processors, memory
database storage
Resolves problems associated with shared
access
Close to Internet protocols IP and TCP
The aim of the resource management service is to ensure that a level of service
characterised by performance and availability is maintained throughout the
functioning of the grid. This means that some resources will be devoted to
monitoring critical metrics such as system throughput in order to reconfigure
both the hardware and software elements of the grid.
Data services
Concerned with the movement and storage
of data
Provides a data replication service
Provides a P to P service
Provides a format conversion service
Manages the dynamic updating of both
permanent and transient data
Data services are concerned with managing the stored data that is held in a grid
system. They provide services whereby data is moved from one entity to another,
where a replication service is maintained and where data having different formats
is converted. The data service is not only concerned with permanent data but also
with transient data.
Context services
Concerned with keeping customer details
Each customer should be associated with
resource and usage policies
Used by the resource management service
Combined with service requirements to
drive the resource management service
This service keeps track of customer data: what resources they are allowed, what
sort of access they are allowed and what use is made of a resource. This service is
intimately connected with the resource management service since the latter requires
such information for any optimisations that it carries out.
Information services
Provides information about entities on the
grid
Example: the current availability of a
resource
Required by the resource management
service in order for it to function
The information service part of OGSA is concerned with efficiently keeping
up-to-date information about the resources in a grid. For example it may keep details
of where replicated data can be found, the degree of replication and when
such data was most recently updated. This service is used by the resource management
service part of OGSA in order for it to dynamically configure the grid to
meet QoS targets.
Security services
Aim is to enforce security policies
Covers all the standard security worries
including authentication, non-repudiation and
authorisation
Based on standard security infrastructures
such as public key systems
Well defined
Security services are just the same type of services you associate with a
distributed system, for example ensuring that users can only access resources that
they are allowed to access, that data being transmitted between nodes on the grid
is secure from reading and that users are authorised.
Much of the technology for this is in existence and its implementation is
straightforward.
Globus toolkit
De facto toolkit for developing grids
Based on OGSA
Implements about 40% of the OGSA
specification
Good Java interface
The Globus toolkit
Basic Globus
Lecture 10
XML (i)
Aims
Detail the problems that XML is intended to
solve
Detail the main elements of XML
Describe some XML-based languages
Describe the concept of a DTD
Introduction to XML
The next two lectures will look extensively at XML. There are a number of
problems associated with distribution which XML is intended to solve. I shall
initially look at these.
I shall also look at some of the main elements of XML and how XML schemas
are developed and processed.
There is a host of software tools that are available for XML, for example tools for
publishing documents in a number of formats. I shall describe a small selection of
these.
I will conclude the lecture by looking at some examples of XML-based languages.
Process
Word
PDA format
XML is used
Where a distributed program has to interact
with a number of different formats of data
When processing has to be shifted onto a
client
When different users have to have different
views of the same data
Applications which use mobile objects
These are applications where there may be a differing set of formats. The slide
details one: that of document processing; however, there are many more in the
area of Electronic Data Interchange where companies basically hold the same
data but with different formats and want to interchange this data effortlessly.
XML can be used to define the basic structure of the data and XML tools are able
to process it in a uniform way.
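As a sketch of this kind of interchange, the Python fragment below takes order
data held by two companies in different formats and expresses both in one XML
structure. The companies, the field names and the order element layout are all
invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The same kind of data, held in two different (made-up) formats.
COMPANY_A = "order_id,customer,total\n17,Acme,250.00\n"
COMPANY_B = [{"OrderNo": "18", "Client": "Bolt Ltd", "Amount": "99.50"}]


def order_element(order_id, customer, total):
    """Build one order in the shared XML structure."""
    order = ET.Element("order", id=order_id)
    ET.SubElement(order, "customer").text = customer
    ET.SubElement(order, "total").text = total
    return order


orders = ET.Element("orders")

# Company A holds comma-separated text with a header row.
for row in csv.DictReader(io.StringIO(COMPANY_A)):
    orders.append(order_element(row["order_id"], row["customer"], row["total"]))

# Company B holds records with its own field names.
for row in COMPANY_B:
    orders.append(order_element(row["OrderNo"], row["Client"], row["Amount"]))

xml_text = ET.tostring(orders, encoding="unicode")
```

Once both sets of orders are in the shared structure, a single tool can process
them without knowing where each one came from.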
A number of applications require the client to carry out quite a bit of processing:
the example of HTML being displayed by a browser is a good one. Unfortunately
the data sent to the client often comes in different formats. XML can be used to
develop software which interposes itself between the client and the server and which
converts the data to some standard format. This means that only one version of the
client software would be used; this overcomes much of the updating and version
control problems associated with distribution.
This is one of the most popular uses of XML. Many information system
applications require the same data to be displayed in a variety of ways. For
example sales data may be viewed by accountants, sales managers and individual
sales people. Each of these has different requirements, for example the sales
manager might only want the summary element of the data while a salesperson
might want detailed sales from individual companies.
XML allows a developer to define the format of the data and then easily write
special-purpose programs which can then provide the view that is required.
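A minimal sketch of two such special-purpose programs, assuming made-up sales
data held as XML, is given below in Python. The element and attribute names and
the functions manager_view and salesperson_view are invented for illustration.

```python
import xml.etree.ElementTree as ET

# One XML source of sales data (made-up figures).
SALES = ET.fromstring("""<sales>
  <sale person="Ann" company="Acme" amount="120"/>
  <sale person="Ann" company="Bolt" amount="80"/>
  <sale person="Raj" company="Acme" amount="200"/>
</sales>""")


def manager_view(sales):
    """The sales manager only wants the summary: total sales."""
    return sum(int(sale.get("amount")) for sale in sales.iter("sale"))


def salesperson_view(sales, person):
    """A salesperson wants the detail of their own sales by company."""
    return [
        (sale.get("company"), int(sale.get("amount")))
        for sale in sales.iter("sale")
        if sale.get("person") == person
    ]


summary = manager_view(SALES)
ann_detail = salesperson_view(SALES, "Ann")
```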
Mobile agents are objects which visit sites in order to gather information. The
best examples of such agents are those which wander the Web looking for low
price items. Programming such agents is tedious because different Web sites use
HTML in different ways to display their catalogues. For example it is very
difficult for an agent to discern whether a number on a page is a price, a quantity
or a date.
By developing similar sites using the same base XML source it is possible to put
some structure on a site so that certain information important to the mobile agent
can be easily discovered.
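The benefit to an agent can be sketched as follows. Two made-up sites publish
their catalogues in the same XML structure, so the price is found by name
rather than guessed from the layout of a page; the catalogue format and the
function price_of are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sites that share the same catalogue structure.
SITE_ONE = ("<catalogue><item><name>Kettle</name>"
            "<price>19.99</price><quantity>4</quantity></item></catalogue>")
SITE_TWO = ("<catalogue><item><name>Kettle</name>"
            "<price>17.50</price><quantity>20</quantity></item></catalogue>")


def price_of(catalogue_xml, name):
    """Return the price of the named item, or None if it is not listed."""
    for item in ET.fromstring(catalogue_xml).iter("item"):
        if item.findtext("name") == name:
            return float(item.findtext("price"))
    return None


sites = {"one": SITE_ONE, "two": SITE_TWO}
offers = {site: price_of(source, "Kettle") for site, source in sites.items()}
cheapest = min(offers, key=offers.get)
```

Because the price and the quantity carry their own tags, the agent no longer has
to discern which number on the page is which.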
What is XML?
XML is a language for defining languages.
Such languages are known as meta-languages
and have been used in computing for over 40
years. Recently they have fallen into disuse;
however XML has revived the concept.
US Davis
Meta-language
Language
XML processing
XML definition
of a language
Language
source
Processing
outputs
Processor
XML is used to define a language. All the languages defined by XML are based
on the sort of tags that you can find in HTML. A processor will then take source
written in the language defined by the XML and process it in some way. The
outputs from this processing can be:
Error reports
Versions of the documents expressed in certain ways
Protocol commands
Electronic documents
Paper documents
Web documents
History of XML
Based on ideas which were in HTML but
which emanated from SGML
Overseen by the World Wide Web
Consortium
Addressed a number of problems that were
emerging in the Web during the nineties
The importance of XML
Many of the concepts in XML, for example the fact that it should be tag-based,
can be found in HTML, which owes very much to the document markup meta-language SGML.
Development of XML and all its associated standards is now overseen by the
World Wide Web Consortium, a group of universities, companies and
government organisations which looks after standards for the Web.
The XML standard is now largely static and should change very little over
the medium term.
Design aims
Easy to use
Support a large number of applications
Compatible with SGML
Easy to write programs
Can be prepared easily
Optional facilities low
Easy to understand
[Diagram: a document in an XML-based language is converted into spreadsheet format, a relational db or HTML; it is viewed by a browser either directly or indirectly]
The slide shows how a document defined by XML can be stored and viewed in a
number of different ways. Processors which take an XML definition and the
source of the language are able to convert the language into a number of different
forms.
This is an example of a sort of Web publishing system such as Cocoon which is
described later.
XML processing
[Diagram: a parser takes an XML DTD or schema and the source of an XML-defined language, and produces outputs and errors]
The slide shows the operation of an XML software tool known as a parser. This
takes the XML definition of a language and source expressed in the language
and:
Checks the source for correctness
Issues errors if the source does not match the definition
If the source is correct establishes its structure
Carries out the required processing. For example, it may transform the source
into some form such as HTML or rtf.
<PRODUCT>
<PRODUCTNAME> Coat blue </PRODUCTNAME>
<PRODUCTPRICE> 34000</PRODUCTPRICE>
..
</PRODUCT>
Similar to HTML:
tag, element, attribute
Here a fragment from an XML defined language is shown. As you can see it
contains a number of elements delineated by tags and end tags with two elements
nested within another element. This is exactly the way in which HTML is
written.
An alternative
<PRODUCT>
<PRODUCTNAME PRICE = "3400">
Coat blue
</PRODUCTNAME>
..
</PRODUCT>
Attribute
This is an alternative version of the text displayed in the previous slide. Here the
price of a product is expressed as an attribute.
Whether something is expressed as an element or as an attribute depends on
whether it is a feature of a particular element or an element in its own right.
In the case of the <PRODUCT> example in this and the previous slide the price
would normally be expressed as an attribute.
Note that the spacing used in the layout does not matter.
There are hundreds of XML-defined languages. Some are shown above. The area
is so dynamic that languages are created and buried on a daily basis.
Here the source described by the DTD in the previous slide is displayed. The
indentation has been provided by me for readability: white space is ignored by
XML processors.
Another example
Here a town is defined. The definition specifies that a town will consist of a
county and a population. The second line details an attribute list for town. The
list contains a single attribute called NAME which is always required.
The remainder of the definition is not shown.
An example of a Town
<TOWN NAME = "Kettering">
<COUNTY> Northamptonshire </COUNTY>
<POPULATION>23000</POPULATION>
</TOWN>
A full DTD
<?xml version = "1.0" standalone = "yes"?>
<!DOCTYPE BOOKLIST [
<!-- BOOKLIST is the root element -->
<!ELEMENT BOOKLIST (BOOK)*>
<!ELEMENT BOOK
(TITLE, AUTHORS, PRICE, PUBLISHER)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT AUTHORS (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ATTLIST PRICE
AMOUNTCURRENCY CDATA #REQUIRED
DISCOUNT CDATA "0"
>
]>
Here is an example of a full DTD. The first line specifies the version of XML
used and states that the source for the DTD will be found following it in the same
file: the alternative is to place it in another file. The next line details the root
element which lies at the top of the definition hierarchy.
The remaining lines specify that a book contains a sequence of title,
author, price and publisher details, with price having two attributes: a required
currency designator and a discount which, if it is not present, will be zero.
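The checking that a parser carries out against this DTD can be sketched in Java using the standard JAXP classes. This is a minimal sketch rather than a definitive implementation: the class name BookListValidator, the helper methods and the sample book data are my own assumptions for illustration; only the DTD comes from the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class BookListValidator {
    // The DTD from the slide, placed in the same file as the source it describes.
    static final String DTD =
        "<?xml version=\"1.0\" standalone=\"yes\"?>\n"
      + "<!DOCTYPE BOOKLIST [\n"
      + "<!ELEMENT BOOKLIST (BOOK)*>\n"
      + "<!ELEMENT BOOK (TITLE, AUTHORS, PRICE, PUBLISHER)>\n"
      + "<!ELEMENT TITLE (#PCDATA)>\n"
      + "<!ELEMENT AUTHORS (#PCDATA)>\n"
      + "<!ELEMENT PRICE (#PCDATA)>\n"
      + "<!ELEMENT PUBLISHER (#PCDATA)>\n"
      + "<!ATTLIST PRICE AMOUNTCURRENCY CDATA #REQUIRED DISCOUNT CDATA \"0\">\n"
      + "]>\n";

    // Hypothetical sample source: one valid book, and one with the required attribute missing.
    static final String VALID =
        "<BOOKLIST><BOOK><TITLE>A title</TITLE><AUTHORS>An author</AUTHORS>"
      + "<PRICE AMOUNTCURRENCY=\"GBP\">20</PRICE><PUBLISHER>A publisher</PUBLISHER>"
      + "</BOOK></BOOKLIST>";
    static final String MISSING_CURRENCY = VALID.replace(" AMOUNTCURRENCY=\"GBP\"", "");

    /** Collects validity errors instead of stopping at the first one. */
    static class Collector implements ErrorHandler {
        final List<String> errors = new ArrayList<>();
        public void warning(SAXParseException e) { /* ignored in this sketch */ }
        public void error(SAXParseException e) { errors.add(e.getMessage()); }
        public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    }

    static Document parse(String body, Collector collector) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the source against the DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(collector);
        return builder.parse(new InputSource(new StringReader(DTD + body)));
    }

    /** Returns the validity errors found; an empty list means the source matches the DTD. */
    static List<String> validate(String body) throws Exception {
        Collector collector = new Collector();
        parse(body, collector);
        return collector.errors;
    }

    /** Returns the DISCOUNT attribute of the first PRICE element. */
    static String firstDiscount(String body) throws Exception {
        Document doc = parse(body, new Collector());
        Element price = (Element) doc.getElementsByTagName("PRICE").item(0);
        return price.getAttribute("DISCOUNT");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Errors in valid source: " + validate(VALID));
        System.out.println("Errors without currency: " + validate(MISSING_CURRENCY));
        System.out.println("Default discount: " + firstDiscount(VALID));
    }
}
```

The valid source produces no errors and picks up the default discount from the DTD, while the source without a currency designator is reported as not matching the definition.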
Here a BOOKLIST is shown containing two books; note the optional DISCOUNT
attribute.
Lecture 11
XML (ii)
Aims
To examine some of the XML processing
models
To look in some detail at XSLT
To look at the application of XML in
document processing
A typical API
There are a number of APIs for processing XML source in a variety of languages.
They enable the programmer to process the XML source in two ways: by reading
the source sequentially or by accessing some tree representation.
[Diagram: an XML language processor, whose parser reads the language source and a DTD holding the language rules, compared with a programming language processor, which reads language source only; each produces output (object code in the second case) and errors]
This slide shows the comparison between a programming language processor and
a processor for an XML-based language. The major difference is that with a
programming language the syntax rules are hard-wired into the program code of
the processor rather than being stored in some equivalent of the DTD.
Some parsers
Xerces
IBM XML4Java
XP
OpenXML
Lark
Oracle XML Parser
List of parsers
The slide shows a number of parsers; many are free or open source. The most
widely used is Xerces.
Processing XML
If you want to process the source of an XML-based language then you need to
use an API that integrates with a parser. There are two styles of API. SAX is an
event-based API. Methods provided as part of the API are triggered when an
event such as a start tag is encountered. It is based on a callback model of
processing, similar to event handling in a user interface.
The other style of processing is exemplified by the DOM (Document Object
Model). Here the XML-based language source is held in memory and is traversed
by tree processing code.
SAX-based processing
[Diagram: the parser reads the XML-based source and the DTD, reports errors, and sends events to processing code written with an API, which produces any outputs]
Here the parser takes the source of a language based on XML, checks it against
the DTD for the language and, every time an event occurs, sends notification to
processing code which is written using some event-based API. Typical events
include:
The start of the source
Encountering a start tag
Encountering some character data
Encountering an end tag
Encountering an error such as a missing end tag
Event processing
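public void startElement(String tagName){
// Code executed when a start tag is encountered
}
public void characters(char[] ch, int start, int length){
// Code executed when character data is read
}
public void endDocument(){
// Code executed when the end of the document is reached
}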
Here three Java methods are shown which respond to three events signalled by a
parser: the start of an element such as TOWN, the reading of character data such as
that found in <TOWN> Kettering</TOWN> and the end of the XML-based document. Code is
placed in these methods which carries out the processing required.
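A fuller version of these three methods, written against the SAX 2 interface that ships with Java, might look like the sketch below. The class name TownHandler, the log list and the process helper are assumptions of mine for illustration; the method signatures are those of org.xml.sax.helpers.DefaultHandler, which are longer than the simplified ones on the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TownHandler extends DefaultHandler {
    // Records each event so that the order of the callbacks can be inspected.
    final List<String> log = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        log.add("start " + qName); // triggered by a start tag such as <TOWN>
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            log.add("text " + text); // triggered by character data
        }
    }

    @Override
    public void endDocument() {
        log.add("end of document"); // triggered once the whole source has been read
    }

    /** Parses the source and returns the events that were signalled. */
    static List<String> process(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TownHandler handler = new TownHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.log;
    }

    public static void main(String[] args) throws Exception {
        for (String event : process("<TOWN> Kettering</TOWN>")) {
            System.out.println(event);
        }
    }
}
```

Note that the parser, not the programmer, decides when each method is called; the handler never sees the document as a whole.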
Tree-based processing
[Diagram: tree representing XML source; the root node BOOKSELLER has the sub node STOCK, which in turn has the sub nodes BOOKLIST and CDLIST]
A tree representation of some XML source is shown above. It shows the root
element BOOKSELLER which is defined in terms of four other elements (three of
which are not shown in detail). One of these elements is STOCK which contains a
BOOKLIST element and a CDLIST element. These, in turn, might be defined in terms
of further elements.
Tree-based processing moves over these elements and carries out actions based
on what it encounters, for example a start tag.
Tree-based processing
Parser builds up some tree-based
representation of the source.
The tree is stored in memory
The API provides facilities which enable
the programmer to traverse over the source.
In tree-based processing the parser sequentially processes the source and builds
up a representation of the source as a tree with multiple pointers to low level
nodes. The tree is stored in memory which allows easy backtracking. A typical
API which provides facilities for processing this tree will provide code for
traversing a tree and discovering information about a node and its sub-nodes.
This is shown on the next slide
Processing a tree
Traverse sub-nodes to extract all the employees for a particular department
[Diagram: a Department node with Employees sub-nodes]
A good API for XML-based tree processing should provide a wide variety of
facilities for accessing such trees:
Facilities which extract out information from a node, for example, the element
associated with the node, the attributes of a node and the number of children
parented by the node.
Facilities which enable the programmer to extract out the children of a node and
then process each of them individually.
Facilities which extract out data for a node, for example the string associated
with the node.
Facilities which enable the programmer to move up and down the tree.
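The facilities listed above can be seen in the DOM API supplied with Java. The sketch below extracts the employees of one department, as in the earlier slide. The element names COMPANY, DEPARTMENT and EMPLOYEE, the NAME attribute and the sample data are hypothetical; the DOM calls (getChildNodes, getAttribute, getNodeName, getTextContent) are standard.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DepartmentTree {
    // Hypothetical source: two departments, each with employee sub-nodes.
    static final String SAMPLE =
        "<COMPANY>"
      + "<DEPARTMENT NAME=\"Sales\"><EMPLOYEE>Ann</EMPLOYEE><EMPLOYEE>Bob</EMPLOYEE></DEPARTMENT>"
      + "<DEPARTMENT NAME=\"Accounts\"><EMPLOYEE>Cy</EMPLOYEE></DEPARTMENT>"
      + "</COMPANY>";

    /** Traverses the tree held in memory and returns the employees of one department. */
    static List<String> employeesOf(String xml, String departmentName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> names = new ArrayList<>();
        NodeList departments = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < departments.getLength(); i++) {
            Node node = departments.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text nodes that hold only layout
            }
            Element department = (Element) node;
            if (!departmentName.equals(department.getAttribute("NAME"))) {
                continue; // not the department we are looking for
            }
            NodeList children = department.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "EMPLOYEE".equals(child.getNodeName())) {
                    names.add(child.getTextContent().trim()); // the string held by the node
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(employeesOf(SAMPLE, "Sales"));
    }
}
```

Because the whole tree is in memory the code can visit nodes in any order, which is the main contrast with the event-based style.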
XQuery
Is a form of query language which is used to
interrogate documents in XML
Gradually achieving some importance,
although it still has a long way to go before it
challenges SQL
Often used with XML databases, for
example with XML-BDB
There are two types of graphics found on the Internet: bit-mapped graphics where
bits contain colour information held in dots or pixels and vector graphics
containing drawing instructions, for example instructions to draw lines or circles.
Vector graphics have a number of advantages compared with bit-mapped
graphics:
They use much less memory
They are interpretable by programs such as search engine spiders
They can be zoomed without losing detail
There has been a major shortage of standards in graphics and SVG is the World
Wide Web Consortium's attempt to remedy it. It is a language based on XML.
SVG source
<SVG width = "3in" height = "2in">
<DESC>
This is a sample circle
</DESC>
<G>
<CIRCLE style = "fill: red; stroke: black" cx = "100"
cy = "100" r = "100"/>
</G>
</SVG>
Here an example of SVG source is displayed. The first line sets the width and
height of the drawing and the main part of the source draws a circle centred at position
100, 100 with a radius of 100. <DESC> marks descriptive text.
The next XML-based language is used to control the channels that make up the
CDF technology. This is a push technology which is used to broadcast
information to subscribers who inform a publisher they wish to receive
information or data. The technology was developed by Microsoft.
Here a channel is defined which broadcasts every hour. The channel takes two
sources, one of which is shown here (www.newsservice.com). The other is shown
on the next slide.
Here the remainder of the source is shown. It specifies the second news source
and concludes with all the end tags that are required.
ebXML
An example
<BusinessTransaction name = "Create Order">
<RequestingBusinessActivity name =""
isNonRepudiationRequired = "true"
timeToAcknowledgeReceipt = "P2D"
timeToAcknowledgeAcceptance = "P3D"
>
<DocumentEnvelope
BusinessDocument = "Purchase Order"/>
</RequestingBusinessActivity>
..
Here a purchase order is created which requires the user to acknowledge that it
has been received (isNonRepudiationRequired = "true") and that
the limit on acknowledging receipt is 2 days (P2D), with the time to acknowledge
acceptance of the order being 3 days (P3D).
The code above would be followed by more text which defined how the business
responding to the transaction should act.
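The values P2D and P3D follow the ISO 8601 notation for periods of time, which Java can read directly. The sketch below works out the dates by which a receipt and an acceptance must be acknowledged; the class name AcknowledgementDeadline and the sending date are assumptions of mine, chosen only for illustration.

```java
import java.time.LocalDate;
import java.time.Period;

class AcknowledgementDeadline {
    /** Returns the date reached by adding an ISO 8601 period such as "P2D" to the sending date. */
    static LocalDate deadline(LocalDate sent, String isoPeriod) {
        return sent.plus(Period.parse(isoPeriod));
    }

    public static void main(String[] args) {
        LocalDate sent = LocalDate.of(2024, 3, 1); // arbitrary example date
        System.out.println("Receipt must be acknowledged by " + deadline(sent, "P2D"));
        System.out.println("Acceptance must be acknowledged by " + deadline(sent, "P3D"));
    }
}
```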
XSL transformations
So far I have dealt with hand crafting
transformations
There is a technology which allows you to
format documents
The technology is XSL, Extensible Stylesheet
Language
Supported by the WWW consortium
So far in this lecture and the previous one I have looked at the use of APIs in
processing XML source. A typical process might be the transformation into some
publishing format such as rtf. Because transformation into some publishing format is a
frequent process a technology has been devised which automates quite a lot of
the process. It means that you do not have to hand craft code using either DOM
or SAX-based APIs.
The technology is known as the Extensible Stylesheet Language (XSL).
XSL
Two components a transformation language
and a formatting language
Both can be used together or separately
The components are known as XSLT and
FOP
Both relatively new
When you are going to take some XML source and transform it in some way you
need two different processes. The first is the transformation into a different form
such as HTML, the second is to format the document for visual presentation.
Different components of XSL carry this process out. XSLT carries out the
transformation while FOP carries out the visual display. Both these technologies
are in a developmental state.
<PLANETS>
<PLANET>
<NAME> Mercury</NAME>
<MASS> .0553</MASS>
</PLANET>
</PLANETS>
The slide above gives a simple example of source that is to be formatted using
XSLT. It describes a sequence of planets, each planet containing elements which
describe its physical properties. The mass is given as a proportion of the Earth's mass.
Here is the first part of a transformation which takes source expressed in the
language detailed by the previous slide. The transformation is very simple: it just
takes the source, strips out the planet names and creates an HTML document with
each name in a separate paragraph.
The second line specifies where the DTD for the transformation can be found.
The third line carries out a pattern match on the tag <PLANETS> and then issues an
HTML header, applies a transformation to the rest of the source and issues an
HTML terminator.
The next slide looks at the actual transformation of a <PLANET> element.
This and the next slide are taken from Inside XML, Holzner, New Riders 2000
Here the <PLANET> element has been matched. The code then selects the value of
the element <NAME>, for example Saturn, and then displays it enclosed by the
HTML <P> </P> tags indicating that the text is to be treated as a paragraph.
The two final lines are just the end tags for the transformation.
This slide is taken from Inside XML, Holzner, New Riders 2000
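A transformation of the kind just described can be run from Java with the standard javax.xml.transform classes. The stylesheet in this sketch is my own reconstruction from the description in these notes and is not the text of the slides; the class name PlanetNames and the use of normalize-space to trim the name are also assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PlanetNames {
    // One template matches PLANETS and wraps the result in HTML;
    // the other matches each PLANET and writes its name as a paragraph.
    static final String STYLESHEET =
        "<?xml version=\"1.0\"?>\n"
      + "<xsl:stylesheet version=\"1.0\"\n"
      + "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n"
      + "  <xsl:output method=\"html\"/>\n"
      + "  <xsl:template match=\"PLANETS\">\n"
      + "    <HTML><BODY><xsl:apply-templates/></BODY></HTML>\n"
      + "  </xsl:template>\n"
      + "  <xsl:template match=\"PLANET\">\n"
      + "    <P><xsl:value-of select=\"normalize-space(NAME)\"/></P>\n"
      + "  </xsl:template>\n"
      + "</xsl:stylesheet>\n";

    // The planet source from the earlier slide.
    static final String SOURCE =
        "<PLANETS><PLANET><NAME> Mercury</NAME><MASS> .0553</MASS></PLANET></PLANETS>";

    /** Applies the stylesheet to the source and returns the HTML produced. */
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SOURCE));
    }
}
```

Only the planet name reaches the output; the mass is stripped out because the PLANET template never selects it.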
A server
A client
Stand alone programs (most advanced)
xsl:value-of
xsl:for-each
xsl:value-of-select
xsl:sort
xsl:if
There are a large number of facilities in XSLT; this slide gives only a small
flavour:
The value-of facility allows the XSLT programmer to select the value of a
particular element.
The for-each facility provides a way of iterating over sequences of elements
with the same name. It is like a for statement.
The value-of-select entry refers to value-of used with its select attribute, which
takes a pattern identifying the element whose value is required.
The sort facility allows nodes to be sorted.
The if facility acts very much like an if statement, allowing choices based on
some conditional criterion.
XSL formatting objects are usually used as a back end to XSLT in that formatting
objects are referred to in XSLT source
Like XSLT, FOP is defined in terms of XML.
FOP is a publishing technology and contains facilities for laying out text on a
page or on a screen.
Currently there are a number of efforts aimed at targeting particular formats for
display; however, the only really mature one is that targeted at pdf.
An example
<xsl:template match = "PLANET/MASS">
<fo:block font-size = "36pt" line-height = "48pt"
font-family = "sans-serif">
Mass (Earth = 1):
<xsl:apply-templates/>
</fo:block>
</xsl:template>
block
footnote
list-block
table
title
page-number
leader
border
font
font-size
column-width
column-gap
margin-bottom
odd-or-even
The formatting objects that I described on the previous page all have properties
that can be altered by the XSLT programmer. Some of the large number of
properties are shown above.
There are a number of steps that are needed for developing a formatted
document:
First, define the DTD for the documents that are to be processed
Develop the XSLT transformations
Include the formatting statements using FOP.
Process the resulting definitions and the source with a software tool which is
targeted at some format such as pdf; for example the Apache tool
org.apache.fop.apps carries out this transformation in UNIX and LINUX
There are a small number of Web publishing frameworks and tools being
developed. These provide a large degree of automation for the tasks specified on
the previous slide. The user of a Web publishing tool will be able to:
Maintain a large corpus of XML-based source, DTDs, XSLT transformations and
data files. The framework should provide facilities whereby processes such as
modifying families of files are easy.
Enable the transformation process and provide facilities for specifying what is to
be transformed, in what way and where the result of the transformation
should initially be placed.
Enable the process of deploying documents mainly to the Web but also to data
files, relational databases and to paper copies.
Cocoon
Open source publishing framework
Easy creation of XML
Easy transformation of XML to a variety of
formats.
Relatively easy deployment of documents to
Web servers
Cocoon is a Web publishing tool which contains a good subset of the facilities
detailed in the previous slide. It is part of the Apache open source project and is
gradually being transformed into a pure Java project. It was one of the first Web
publishing frameworks to be deployed commercially.
XHTML
Lecture 12
Concurrency
Aims
Concurrency
Fact of life for single computers, also fact of
life for distributed systems
Concurrency involves the execution of
separate chunks of code
Used to take advantage of slack time on a
single computer
Encountered as a series of remote programs
in a distributed system
In a distributed system there will be a number of activities being carried out at the
same time. For example, a number of clients may be accessing the same server
asking for a particular service to be provided. Such concurrent activities provide
both opportunities for the developer to optimise a distributed system, but at the
same time pose some sophisticated and tricky problems. This lecture looks at the
topic of concurrency and how it affects a distributed system. It first looks at
concurrency occurring at a single computer and very briefly examines the Java
facilities for defining and controlling concurrent operations. It then looks at how
a distributed system can be viewed as a set of concurrent processes running on a
wide variety of computers.
There are two ways of implementing threads in Java. The first is to inherit from
the class Thread found in the package java.lang. This class has one method
run which needs to be overridden with the code for a thread. An implementation
of a simple thread is shown above
This is a simple class which just has a single constructor and a single instance
variable which identifies the thread. The code for the thread can be found within
the method run. This code is straightforward: it loops a hundred times; each
time it moves through the loop it sleeps for 100 milliseconds (the method sleep
takes a long argument which is the number of milliseconds to sleep) and then
displays a message identifying the thread.
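A class matching this description might look like the sketch below. It is a reconstruction from the notes rather than the code on the slide: the second constructor, the completedIterations method and the wording of the message are assumptions added so that a shortened run can be checked.

```java
class ThreadDemo extends Thread {
    private final String label;      // identifies the thread in its messages
    private final int iterations;    // number of times round the loop
    private final long sleepMillis;  // time to sleep on each pass
    private volatile int completed = 0;

    /** The thread described in the notes: one hundred passes, sleeping 100 milliseconds each. */
    ThreadDemo(String label) {
        this(label, 100, 100);
    }

    /** Assumed extra constructor so that a demonstration can be kept short. */
    ThreadDemo(String label, int iterations, long sleepMillis) {
        this.label = label;
        this.iterations = iterations;
        this.sleepMillis = sleepMillis;
    }

    @Override
    public void run() {
        for (int i = 0; i < iterations; i++) {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt status and stop
                return;
            }
            System.out.println("Thread " + label + " is running");
            completed++; // only this thread writes the counter
        }
    }

    int completedIterations() {
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadDemo th1 = new ThreadDemo("First example", 3, 100);
        th1.start(); // start calls run in a new thread of control
        th1.join();  // wait for the thread to finish
    }
}
```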
ThreadDemo th1 =
new ThreadDemo("First example");
..
th1.start();
All this code does is create a thread object and then start it executing. When
there are a number of threads a scheduler will determine which of them is to be
given control of the hardware processor.
Thread APIs
Most programming languages have thread APIs,
some examples from Java are shown below
destroy
setPriority
getName
setName
start
Threads or processes
A solution - locking
Locking ensures that inconsistent updates
do not occur
Locking can affect both in-memory and file-based data.
A poor locking strategy can have a major
effect on the performance of a system: both
on a single computer and distributed
A lock can be a read lock or a write lock
Locking queues
[Diagram: one thread accessing the shared data while suspended threads wait in a thread queue]
When a thread is accessing shared data and a number of other threads want
access to that data then they are placed on a lock queue until the current
accessing thread signals that it has finished.
These queued threads are suspended and cannot make any progress
Locking decisions
Operation1   Operation2   Conflict
Read         Read         No
Write        Read         Yes
Write        Write        Yes

Lock already set   Read lock requested   Write lock requested
None               Allowed               Allowed
Read               Allowed               Wait
Write              Wait                  Wait
The tables above show whether conflicts exist when reading and writing occurs
and what a good locking scheme should allow.
Looking at the top table it is clear that in order to minimise the holding of locks a
suitable concurrency control scheme should take cognisance of the fact that a
large number of transactions could be simultaneously reading data but that only
one transaction would be able to write data.
Such a scheme implements what is known as the many reader single writer
scheme. The most popular way of implementing this is via two types of lock: a
read lock and a write lock.
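Java provides the two types of lock directly in the class ReentrantReadWriteLock. The sketch below protects a shared balance with it; the class name SharedBalance and the demo method are assumptions of mine, and this is one way of realising the many reader single writer scheme rather than the way any particular database does it.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SharedBalance {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int balance;

    /** Many threads may hold the read lock at the same time. */
    int read() {
        lock.readLock().lock();
        try {
            return balance;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Only one thread may hold the write lock, and only when nobody is reading. */
    void add(int amount) {
        lock.writeLock().lock();
        try {
            balance += amount;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Runs several threads which update and read the balance, and returns the final value. */
    static int demo(int threads, int updatesPerThread) throws InterruptedException {
        SharedBalance shared = new SharedBalance();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < updatesPerThread; j++) {
                    shared.add(1);
                    shared.read();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return shared.read();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Final balance: " + demo(4, 1000));
    }
}
```

Without the write lock some of the updates could be lost; with it the final balance is always the number of threads multiplied by the number of updates each one makes.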
Lock management
Many reader-single writer scheme usually
implemented
Strict two phase locking a popular solution
Two phase locking is based on lock
promotion
One of the major problems that afflict concurrent systems is that of deadlock.
This occurs when there is contention between two transactions for two items of
data. As an example of this consider a transaction (T1) which requires access to
an item of data (d1), but which has already obtained a write lock on an item of data
(d2). Also assume that another transaction (T2) which currently has locked d1
executes code which tries to access the item of data d2. T1 will be unable to
proceed because T2 has locked d1, while T2 will be unable to proceed because it
requires d2 which T1 has locked. The two transactions are in a state of limbo,
each waiting for the other to proceed. This situation is also known as the deadly
embrace. Deadlock can occur in any distributed system where there is shared access;
however, in those systems where there are a number of clients which hold data
for a long time (the typical interactive system) it is a major occurrence.
Wait-for graph
Can be built up by a lock manager.
Manager looks for cycles when a lock is
created.
Takes action when a potential cycle is
detected for example by aborting another
transaction.
Detecting a cycle is easy; deciding on what to
do then is more difficult
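The cycle check carried out by a lock manager can be sketched as a depth-first search over the wait-for graph. The class name WaitForGraph and its methods are assumptions for illustration; a real lock manager would also have to choose which transaction to abort, which this sketch does not attempt.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WaitForGraph {
    // An edge from T1 to T2 means that T1 is waiting for a lock held by T2.
    private final Map<String, Set<String>> waitsFor = new HashMap<>();

    void addWait(String waiter, String holder) {
        waitsFor.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    /** Returns true if some transaction is, directly or indirectly, waiting for itself. */
    boolean hasCycle() {
        Set<String> finished = new HashSet<>();
        for (String start : waitsFor.keySet()) {
            if (visit(start, new HashSet<>(), finished)) {
                return true;
            }
        }
        return false;
    }

    private boolean visit(String tx, Set<String> onPath, Set<String> finished) {
        if (onPath.contains(tx)) {
            return true;  // we have come back to a transaction on the current path
        }
        if (finished.contains(tx)) {
            return false; // already explored, no cycle through this transaction
        }
        onPath.add(tx);
        for (String next : waitsFor.getOrDefault(tx, Collections.emptySet())) {
            if (visit(next, onPath, finished)) {
                return true;
            }
        }
        onPath.remove(tx);
        finished.add(tx);
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph graph = new WaitForGraph();
        graph.addWait("T1", "T2"); // T1 wants d1, which T2 has locked
        graph.addWait("T2", "T1"); // T2 wants d2, which T1 has locked
        System.out.println("Deadlock detected: " + graph.hasCycle());
    }
}
```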
Using timeouts
Majority of database servers use timeouts
Rough and ready solution: often non-deadlocked transactions will be aborted and
it penalises long-running transactions
Good DBMS will allow a database
administrator to set the time between lock
examinations
The vast majority of database servers use timeouts to eliminate deadlock. Each
lock that is created is given a time period during which it can exist without being
removed. After this time if another transaction wants to use the data that is locked
then the transaction that holds the lock is aborted and the new transaction locks
the data and is allowed to access it.
Using timeouts is a rough and ready solution compared with processing a wait-for graph. It suffers from a number of problems. The first is that transactions can
often be aborted even if they are not deadlocked. A second problem is that long-running transactions can be penalised too heavily.
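The timeout rule described above can be modelled in a few lines. This is a simplified sketch with a single lock and times passed in as plain numbers of milliseconds; the class name TimeoutLock and the Outcome values are my own and do not come from any database product.

```java
class TimeoutLock {
    enum Outcome { GRANTED, MUST_WAIT, HOLDER_ABORTED }

    private final long timeoutMillis; // period for which a lock may exist without being removed
    private String holder;            // transaction holding the lock, or null if it is free
    private long grantedAt;           // time at which the holder obtained the lock

    TimeoutLock(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Handles a request for the lock made by transaction tx at time now. */
    Outcome request(String tx, long now) {
        if (holder == null) {
            grant(tx, now);
            return Outcome.GRANTED;
        }
        if (now - grantedAt < timeoutMillis) {
            return Outcome.MUST_WAIT; // the holder is still within its time period
        }
        // The period has expired and another transaction wants the data:
        // the holder is aborted and the new transaction takes the lock.
        grant(tx, now);
        return Outcome.HOLDER_ABORTED;
    }

    void release(String tx) {
        if (tx.equals(holder)) {
            holder = null;
        }
    }

    String holder() {
        return holder;
    }

    private void grant(String tx, long now) {
        holder = tx;
        grantedAt = now;
    }

    public static void main(String[] args) {
        TimeoutLock lock = new TimeoutLock(1000);
        System.out.println(lock.request("T1", 0));    // GRANTED
        System.out.println(lock.request("T2", 500));  // MUST_WAIT
        System.out.println(lock.request("T2", 1500)); // HOLDER_ABORTED
    }
}
```

In the final request T1 is aborted simply because it has held the lock for a long time; nothing in the rule checks whether T1 was actually deadlocked, which is the first of the problems noted above.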
Most database systems obtain a read lock when they read data from a table and a
write lock when they write to a table. Unfortunately the developer is unable to
directly influence the locking strategy used for a particular database. It is
managed by the database management system based on the following factors:
The lock size chosen for the tables that make up the database: whether the lock
is, for example, on a page, row or on a whole table.
What sort of access is allowed, for example whether a dirty read is allowed on a
table.
The particular SQL statements involved.
The number of items expected to be locked for a particular transaction.
What mechanism is used to carry out an operation, for example whether an
index is to be used.
Row locking
Page locking
Table locking
Database locking
Lecture 13
Transactions
Aims
To examine the nature of transactions
To look at the essential properties of
transactions
To understand the use of application servers
To detail a case study: Enterprise JavaBeans
Transactions
All or nothing
A transaction is a set of atomic operations which carry out some access to stored
data. For example, a typical transaction which processes a database of stock for
an online retailer is shown below:
A customer orders an item.
The system checks that the item is in stock.
If the item is in stock then the customer is allocated the item and the stock total
for the item is reduced by one.
If the stock for the item is dangerously low then an order for new stock is
placed.
An important property of each of the operations that make up a transaction is that
it is atomic.
This means that when a client carries out an operation such as updating a
database then this operation is free from interference by an operation which
belongs to another transaction. In the previous lecture I showed how, by using
locks, this could be achieved.
Transactions can also be atomic. An atomic transaction is one which must either
be totally carried out or not carried out at all. For example, a series of related
credits and debits carried out on a bank account either must have all been carried
out or, if some reason for aborting occurs, none of them must be carried out.
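The all or nothing property can be illustrated with a small in-memory sketch. The changes are applied to a private copy of the balances, and the copy replaces the stored data only if every change succeeds. The class name Accounts, the rule that a balance may not go negative and the sample figures are assumptions for illustration; a real system would use the transaction facilities of a database or application server.

```java
import java.util.HashMap;
import java.util.Map;

class Accounts {
    private Map<String, Integer> balances = new HashMap<>();

    void open(String account, int amount) {
        balances.put(account, amount);
    }

    int balance(String account) {
        return balances.get(account);
    }

    /**
     * Applies a series of related credits (positive) and debits (negative).
     * Either all of them are carried out or none of them is.
     */
    boolean applyAtomically(Map<String, Integer> changes) {
        Map<String, Integer> working = new HashMap<>(balances); // private working copy
        for (Map.Entry<String, Integer> change : changes.entrySet()) {
            Integer current = working.get(change.getKey());
            if (current == null || current + change.getValue() < 0) {
                return false; // abort: the working copy is discarded, stored data is untouched
            }
            working.put(change.getKey(), current + change.getValue());
        }
        balances = working; // commit: all the changes become visible together
        return true;
    }

    public static void main(String[] args) {
        Accounts bank = new Accounts();
        bank.open("A", 100);
        bank.open("B", 0);
        Map<String, Integer> transfer = new HashMap<>();
        transfer.put("A", -150); // debit larger than the balance, so the transaction aborts
        transfer.put("B", 150);
        System.out.println("Committed: " + bank.applyAtomically(transfer));
        System.out.println("A = " + bank.balance("A") + ", B = " + bank.balance("B"));
    }
}
```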
The acronym ACID is often given to the properties of a transaction. It stands for
Atomicity, Consistency, Isolation and Durability. Atomicity means a transaction
must be atomic as defined in the previous slide. Consistency means that a
transaction must leave stored data in a consistent state, for example the balance
of a bank account must reflect the fact that credits have been added and debits
subtracted. Isolation stands for the fact that a transaction must not be interfered
with by other transactions. Durability stands for the fact that after a transaction
has completed its operations the results are stored in permanent storage usually
some form of disk storage.
Serial equivalence
Important property
Means that if a number of concurrent
transactions are applied the effect would be
the same as if they were applied serially
Means that problems such as the lost update
problem do not occur
Distributed transactions
Allow more concurrency
Provide more flexible policies for abortion
As I have described in the previous lecture the most popular way of handling
concurrency control is via locks. Each server in a distributed system will have a
lock manager which will decide whether to grant a lock to a transaction; if it does
not then the transaction has to be queued up to wait for the data that it is
accessing to become free. When a transaction is committed or aborted it will
release a lock. The rules for locking for a nested distributed transaction are as
follows:
Parent transactions are prevented from running at the same time as their
children.
Children in a nested transaction will inherit locks from the parent
transaction.
If a nested transaction wants a read lock on a shared data item then all the
holders of the write lock on the item must be its ancestors.
If a nested transaction wants a write lock on a shared data item then all the
holders of both write and read locks must be its ancestors.
When a transaction commits then all its locks are inherited by its parent.
When a transaction aborts all its locks are removed.
Distributed deadlock
Distributed deadlock
Can be handled as described in the previous
lecture: by having a central server checking
for cycles
Impractical: what if the server malfunctions
or the transmission medium used breaks
In practice edge chasing algorithms are used
There are two major problems with using a central server. The first is that
if only one server were used and that server malfunctioned then the
system would be in great trouble and deadlocks would build up to the
point where performance would greatly degrade. The second problem is
that scalability cannot be achieved: as a system grows, more and more
pressure would be built up on the server carrying out deadlock detection
to the point where its performance would suffer; this degradation of
performance would affect other servers which contain deadlocked
transactions and hence the system itself, since the remaining servers
would rely on an overloaded server to enable them to remove deadlocks
and proceed. You will remember that earlier in the course I described one
of the major advantages of distributed computing being the fact that a task
can be split up to be executed on a number of servers, thus leading to
some degree of scalability. It is clearly an advantage for the deadlock
detection process to be made distributed and not centralised on one
particular server.
One popular way of distributing deadlock detection is for each server to
send messages to other servers initiating transactions to indicate that they
are waiting for another transaction. Such a distributed algorithm is known
as an edge chasing algorithm.
TP Monitors
Originally associated with mainframe
computers
Manage concurrent execution
Ensure ACID properties are maintained
Solves the problem of thousands of users
concurrently accessing databases
TP monitor functions
The functions of a TP monitor are shown below; they have been taken
from a description of the IBM CICS monitor:
Initiate and destroy threads to carry out transactional operations. Many
transaction monitors will access a pool of threads which have been set up
when the monitor was started.
Manage the resources that are being accessed, for example ensuring that
updates are carried out in such a way that the resource does not find itself
in an inconsistent state.
Ensure that if a transaction fails then suitable action is taken; this action
can be provided by a programmer as code to be executed. In order to do
this most TP monitors will use a two-phase atomic commit protocol.
Schedule threads so that low-priority transactions, for example batch
transactions, are allocated a smaller share of resources than high-priority
transactions such as online transactions.
Enable the processing load on a distributed system to be shared between
a number of servers.
Enable a distributed system to function even in the presence of the
failure of one or more servers.
Bean developer
Container provider
Server provider
Application assembler
Deployer
System administrator
An entity bean represents some stored entity that is used in an application and
which requires permanent storage. Examples include: bank accounts,
warehouses, stock containers, flight plans, stock portfolios, insurance policies and
hotel bookings.
An entity bean will normally be mapped into data stored in a relational database
system, although it is quite possible for it to be mapped into data in an object-oriented database.
An important point to make about entity beans is that since they model long-lived
data an application server will provide facilities whereby, if a server crashes or
some disastrous event occurs, the bean state will not be destroyed.
A session bean is a bean which performs some business logic; it does not model
some stored entity such as a bank account. Typical examples of the type of work
that a session bean carries out are:
Processing a debit on a bank account.
Processing an order for some e-commerce product.
Making a trade for some stock or share.
Querying a warehouse for information about stock which requires replenishing.
A session bean will only last for the period during which a client interacts with
the bean.
Lecture 14
Distributed System Design
Aims
To examine some performance prediction
methods
To detail some design principles
concentrating on performance
To look at some of the trade-offs involved
in distributed application design
This lecture looks mainly at the design of a distributed system for performance; however,
I shall also look at some of the issues involved in ensuring that reliable services can be
maintained in the presence of hardware failure. At this point it is worth stressing that
many of the design decisions that are made do not have a direct effect on factors such as
performance, but an indirect effect. As an example of this consider data replication. This
is where a database in a distributed system is copied a number of times and located at a
number of points in a network. There are a number of reasons for replicating data; a
major one is that it enables data to be close to users. If a database is stored at a
location which requires wide area network access rather than on a local area network,
the time taken to provide the data is often orders of magnitude higher.
You might think that many of the problems associated with low-speed access to
databases across a network would be solved, at a stroke, by just carrying out large-scale
replication; unfortunately this is not the case: replicated databases will need to coordinate
with each other as each is updated. After a replicated database has been written to it has
to send messages to all the other replicated databases in order that they reflect the changes
that have occurred. This gives rise to two factors which reduce the performance of a
distributed system. The first is that extra traffic is generated in the system; this is often
high-priority traffic and will delay any traffic which has originated from application
transactions. The second is that updates to a replicated database will delay transactions to
that database until it matches the state of the database to which the updates were
originally applied.
Performance prediction
Many server vendors publish performance data on how their servers perform
against a number of benchmarks. This type of data is moderately useful when
making rough comparisons between servers in a search for the most powerful
server. However, as the sole means of predicting performance in a distributed
system it is almost useless. There are two reasons for this: first, server vendors
will often choose a mix of transactions which make their servers perform well
and, second, these benchmarks are often highly specialised and atypical and do
not match the specific transaction mix of any particular application.
Rules of thumb
Experienced developer uses various rules of
thumb
One example is the effect of caching
strategies
Quite useful for optimising design
Less useful for prediction
Not recommended for novel systems
Simulation modelling
Analytic modelling
Uses applied maths, usually statistics and
probability theory
Relies on a mathematical model
Equations relate queues, processors, devices
and data messages
As accurate as simulation modelling
Expertise is very thin on the ground
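As a hedged illustration of the kind of equation an analytic model relies on, the sketch below applies the standard M/M/1 queueing results to a single server. The single-queue, random-arrival assumptions, the function name `mm1_metrics` and the sample rates are assumptions made for this example and are not taken from the lecture.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Return (utilisation, mean response time, mean number in system) for an M/M/1 queue.

    arrival_rate and service_rate are in requests per second.
    """
    if arrival_rate >= service_rate:
        raise ValueError("the queue is unstable: arrivals must be slower than service")
    utilisation = arrival_rate / service_rate
    mean_response_time = 1.0 / (service_rate - arrival_rate)
    mean_in_system = utilisation / (1.0 - utilisation)
    return utilisation, mean_response_time, mean_in_system

# A server able to handle 10 requests per second, receiving 8 per second.
utilisation, response_time, in_system = mm1_metrics(8.0, 10.0)
```

With these sample figures the server is 80% utilised and the mean response time is half a second, five times the bare service time of a tenth of a second. This is the sort of non-linear effect that a rule of thumb tends to miss.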
This is often referred to as benchmarking. There are two types of projection that
can be made. The first is to measure critical results such as internal wait times
and the time taken for results to be processed and appear at a client for just one
client-server relationship with no other competing work. This is an excellent way
of predicting performance in small systems, particularly when it is augmented by
a little simulation modelling or analytical modelling in order to cope with the
added complexity of multi-threaded working.
The second type of projection is that made from data gathered by running a
number of representative processes.
The steps below are employed in a full benchmark. Once these processes have
been carried out the developer will have gained a very good idea about how a
target system will perform. Unfortunately there is a major problem: a large
amount of resource needs to be committed for the process, often making it
uneconomic. Only systems which are immensely performance-critical and
mission-critical can be analysed in such a way.
Produce a working distributed system in terms of hardware elements.
Find some programs which replicate the workload that will be
experienced by the servers in the system.
Find and load representative test data.
Run the system with users who will generate a meaningful profile of
transactions.
Monitor key parameters of the system such as wait time.
Analyse the data.
Vary the workload and see how this affects critical parameters.
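The monitoring and workload-variation steps above can be sketched as a small measurement harness. This is only an outline under stated assumptions: `fake_transaction` is a hypothetical stand-in for a real request to a server, and the workload sizes are arbitrary.

```python
import statistics
import time

def fake_transaction(size):
    """Hypothetical stand-in for a real server request of a given size."""
    return sum(range(size))

def measure(handler, workload):
    """Run every request in the workload and record each elapsed time in seconds."""
    timings = []
    for request in workload:
        start = time.perf_counter()
        handler(request)
        timings.append(time.perf_counter() - start)
    return timings

def summarise(timings):
    """Reduce raw timings to the key parameters to be analysed."""
    return {
        "count": len(timings),
        "mean": statistics.mean(timings),
        "max": max(timings),
    }

def vary_workload(handler, sizes, requests_per_size=20):
    """Repeat the measurement at several workload levels."""
    return {
        size: summarise(measure(handler, [size] * requests_per_size))
        for size in sizes
    }

results = vary_workload(fake_transaction, [10, 1000, 100000])
```

A real benchmark would replace `fake_transaction` with calls against the working system and would use representative test data rather than a single numeric size.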
Principle of locality
Principle of sharing
The principle of parallelism
Probably the best known example of the locality principle is that items of data that are
related to each other should be grouped together. Already in the introduction to
this section I have described one example of this where two tables which were
related by virtue of the fact that they were often accessed together were moved
onto the same server. This whole principle applies to all sorts of groupings of
data: rows in a relational table, columns in a relational table, tables themselves
and attributes of objects. For example, the analysis of an object-oriented system
will produce a series of documents which will describe information such as the
functionality of the system, the classes involved in the implementation of the
functionality and the relationship between the classes. If analysis has been carried
out competently, then there should be little, if any, design information produced,
apart perhaps from specified performance constraints.
The role of design is to take the analysis product and turn it into some form
which is heavily adorned with physical detail. One application of the principle of
keeping data together is to form composite classes which are constructed from
two or more classes identified during the analysis phase. The decision as to
whether classes should be composited is a serious one. It is based on an appraisal
of the workload of a system and the transactions that objects derived from the
classes take part in. If the developer used an object-oriented database system to
store objects then such a compositing decision would have a significant impact
on performance if the objects which were composited were frequently retrieved
together.
The idea behind this is that if two programs communicate with each other in a
distributed system then, ideally, they should be located on the same computer or,
at worst, they should be located on the same local area network. The worst case is
where programs communicate by passing data over the slow communication
media used in wide area networks.
There are a number of ways of implementing this design decision, the most
obvious being to statically store programs which communicate together on the
same server; an alternative would be to dynamically load these programs at run
time. However, a word of warning is necessary: many programs that are found on
separate servers will be there because they are communicating with some local
database. Thus, the decision to bring together two programs will often require
data to be moved with an effect on performance ensuing.
Caching
Stores data on local memory
Dynamic caches are known as write-back or
write-through caches
A number of caching strategies, for example
least recently used strategy
Used in browsers
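A minimal sketch of the least recently used strategy mentioned above, assuming a small in-memory cache with a fixed capacity. The class name `LRUCache` and its methods are illustrative only.

```python
from collections import OrderedDict

class LRUCache:
    """Keeps at most `capacity` items, evicting the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None               # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used

cache = LRUCache(2)
cache.put("index.html", "<html>home</html>")
cache.put("about.html", "<html>about</html>")
cache.get("index.html")                         # index.html is now the most recent
cache.put("news.html", "<html>news</html>")     # evicts about.html
```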
Caching is an excellent way of speeding up a system for data which is not subject
to much change such as simple Web pages which do not contain dynamic data.
For data that does change performance gains can still be achieved; however, the
maintenance of the pages in the fast area of memory devoted to storage known
as the cache reduces the gains and sometimes requires quite a degree of extra
programming.
Caches which deal with dynamically updated data are known as write-back
caches or write-through caches. For such caches when a transaction updates some
stored data which appears in a cache at a client computer the following must
occur:
The data that is stored at the client's cache must be updated to reflect the
change.
The stored data corresponding to the cached data must also be updated at
its server.
All other caches at other clients must be changed to reflect the changed
data.
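The three steps above can be sketched as a write-through cache. This is an in-process illustration only: the `Server` and `Client` classes are hypothetical, and direct method calls stand in for the network messages a real system would send.

```python
class Server:
    """Holds the stored data and knows which clients cache it."""

    def __init__(self):
        self.store = {}
        self.clients = []

    def register(self, client):
        self.clients.append(client)

    def write(self, key, value, origin):
        self.store[key] = value                 # step 2: update the stored data
        for client in self.clients:             # step 3: update the other caches
            if client is not origin and key in client.cache:
                client.cache[key] = value

class Client:
    """A client with a local cache in front of the server."""

    def __init__(self, server):
        self.cache = {}
        self.server = server
        server.register(self)

    def read(self, key):
        if key not in self.cache:               # cache miss: fetch from the server
            self.cache[key] = self.server.store[key]
        return self.cache[key]

    def write(self, key, value):
        self.cache[key] = value                 # step 1: update the local cache
        self.server.write(key, value, origin=self)
```

The loop in `Server.write` is the source of the extra traffic described earlier: every update to shared data generates one message per client holding a cached copy.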
Caching strategy
Principle of sharing
A major decision to be made about the design of a distributed system is how the
servers in a system are going to have the work performed by the system
partitioned among them. The main rationale for sharing work amongst servers is
to avoid bottlenecks where servers are overloaded with work which could be
reallocated to other servers.
Applications
Most of the decisions about locking will be made with respect to the database
management system that is used. In general such systems allow locking at one or
more sizes: page locks, table locks, database locks and row locks.
Locks are defined as a property of a table or database when the database designer
defines the structure of the database. Many of the decisions about what sort of
locking to adopt are common sense ones, for example when a table is going to be
the target of a bulk update it is better for efficiency reasons to use a table lock
and when data is just being read by online transactions and updated by batch
processes to use the largest lock size in order to minimise the amount of locking
that occurs.
Performance
Minimising deadlock
Monitor where they are occurring and then
modify database code, for example
changing access order
Use data replication
Experiment with the deadlock break interval
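One way of changing access order to avoid deadlock is to make every transaction acquire its locks in the same agreed order. The sketch below assumes two hypothetical resources, `accounts` and `orders`, and uses thread locks to stand in for database locks.

```python
import threading

# Hypothetical resources, each protected by its own lock.
locks = {"accounts": threading.Lock(), "orders": threading.Lock()}

def with_ordered_locks(names, action):
    """Acquire the named locks in sorted order, run the action, then release them."""
    ordered = sorted(set(names))
    for name in ordered:
        locks[name].acquire()
    try:
        return action()
    finally:
        for name in reversed(ordered):
            locks[name].release()

def run_demo():
    """Two threads ask for the same locks in opposite orders yet cannot deadlock."""
    results = []
    t1 = threading.Thread(
        target=with_ordered_locks,
        args=(["accounts", "orders"], lambda: results.append("t1")),
    )
    t2 = threading.Thread(
        target=with_ordered_locks,
        args=(["orders", "accounts"], lambda: results.append("t2")),
    )
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    return sorted(results)
```

Because both threads always take `accounts` before `orders`, neither can hold one lock while waiting for the other in the opposite order.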
Dirty read
Committed read
Cursor stability
Repeatable read
Another factor which many database systems allow to be varied is the isolation
level of a program. There can be as many as four isolation levels which a DBMS
will allow the database administrator to choose.
Dirty read. This is where a transaction can read data which has been
modified but the changes that have occurred have not been committed.
Committed read. Here a transaction is not allowed to read dirty data or
overwrite another transaction's dirty data.
Cursor stability. Here a row being read by one transaction cannot be
changed by another transaction.
Repeatable read. All items are locked until a commit has been executed.
As you proceed down the list of bullet points above the strength of the isolation
increases; this will increase the number of locks and hence the chance
of deadlock occurring and performance dropping. The design principle here
should be that the isolation level chosen should be the weakest consistent with the
application data integrity and the demands of the application.
An example
The key idea behind the parallel principle is that of load balancing:
ensuring that a resource, be it a database, processor, relational table,
memory or object, is subdivided without incurring too many of the
overheads listed above.
For example, one decision that the designer of a distributed system has to
make is what to do about very large programs which could theoretically
execute on the same processor. Should this program be split up into
different threads which execute in parallel, either on a single server or on
a number of distributed servers? The designer here has to make a decision
which is driven by a consideration of the amount of synchronisation and
communication which occurs between the threads. If threads could spend
a large amount of time working away at a particular algorithm with little
access to shared data and with only small amounts of data required for
communication, then splitting the program into a number of component
programs would be a good decision, even over a number of servers.
Unhappily things are never so clear cut as this: very few programs can be
cleanly partitioned and a careful consideration of the operating system
overhead and resources consumed by the various threads involved has to
be carried out before making a partitioning decision.
Reliability
Key idea in distributed applications
More prone to fail than standalone systems
Server can fail, medium for transmission
can fail
System should be designed so that it can
cope with failure, albeit with perhaps a
degraded performance
The slides have outlined some techniques and principles for designing a
distributed system so that it has an acceptable performance. The other major
worry that a designer has is that of reliability. With the advent of distributed
systems and the increasing incidence of systems which interact with the general
public this has become a much more important design factor than it once was: for
example, in the old days of mainframe computers and minicomputers when, say,
customers phoned in their orders, companies could cope fairly easily with
incidents which caused computers to malfunction: normally such companies
would switch over to some manual form of ordering where order staff would
consult printouts which were generated on a periodic basis, say every two or
three hours.
The advent of client-server computing and the World Wide Web, where
customers can inspect stocks of products and order them online, has not only
changed the importance of reliability where a malfunctioning server could
effectively shut down a retailer for a period, but also, at the same time, provided
many of the tools that can ensure that a system will always be running albeit at
a lower performance level.
Recovery files
Replicated databases
Mirrored servers
Multi-parallel running
Important to point out that many reliability design decisions are taken
out of the hands of the designer
The first aspect of dealing with failures is that of recovery: ensuring that when some
failure occurs, such as a server malfunctioning, the data stored on that server can
be quickly recovered and reconstituted at a working server. In order to do this
many systems use some form of recovery file. This is a file which contains a list
of changes that have been applied but have not yet been committed; if there is a
malfunction then a program known as a recovery manager will carry out the
process of restoring any files which were in limbo when the malfunction occurred.
Another technique for achieving reliability, and which has a faster response to
failure than techniques which use a recovery file, is parallel running. Here a
number of servers with replicated databases process the same transaction. Each
time that a transaction is received by the distributed system in which the servers
are located they will all apply that transaction to their databases. In this way, if a
fault occurs, recovery from the fault would be virtually instantaneous.
This is an expensive way of implementing reliability and is only really suited to
systems where a very high degree of reliability is required. There are
intermediate solutions. For example, a server could be designated a primary
server and a collection of other servers designated secondary servers. These
servers would lag behind the primary server in terms of updates to their
databases; however, if the primary server malfunctions one of the secondary
servers would catch up by applying the transactions which distinguished the
difference between the primary server and itself. These transactions would
normally have been written to some recovery file.
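The primary and secondary arrangement just described can be sketched as follows, assuming the recovery file is simply an ordered list of (key, value) updates. The `Replica` class and the `catch_up` function are illustrative names, not part of any particular product.

```python
class Replica:
    """A server's copy of the database plus a count of log entries applied."""

    def __init__(self):
        self.data = {}
        self.applied = 0

    def apply(self, entry):
        key, value = entry
        self.data[key] = value
        self.applied += 1

def catch_up(secondary, recovery_log):
    """Apply the transactions the secondary has not yet seen."""
    for entry in recovery_log[secondary.applied:]:
        secondary.apply(entry)
    return secondary

# The primary has applied every update; the secondary lags behind.
recovery_log = [("stock:widgets", 10), ("stock:bolts", 4), ("stock:widgets", 7)]
primary = Replica()
for entry in recovery_log:
    primary.apply(entry)

secondary = Replica()
secondary.apply(recovery_log[0])

# The primary malfunctions; the secondary replays the rest of the log.
catch_up(secondary, recovery_log)
```

The further a secondary is allowed to lag, the cheaper normal running becomes and the longer the catch-up takes after a failure, which is the trade-off between this scheme and full parallel running.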
Lecture 15
Security (i)
Aims
Integrity threats
Confidentiality threats
Denial of service threats
Authentication threats
Integrity threats involve the retrieval and tampering of important data such as
credit card details.
Confidentiality threats are concerned with the reading of confidential data.
Denial of service threats are concerned with restricting, degrading or removing a
service provided by a host.
Authentication threats involve an intruder pretending to be an authorised user and
carrying out operations that the authorised user is allowed to execute.
Confidentiality
Authentication
Integrity
Non-repudiation
Access control
Availability
The bullet points above represent requirements for secure Internet applications:
Confidentiality means that information stored on a system cannot be accessed by
unauthorised parties.
Authentication means that the origin of a message or transaction is correctly
identified and the originator is who they claim to be.
Integrity means that only authorised parties are able to change data.
Non-repudiation means that neither the sender nor the receiver of a transaction
can deny that a transaction took place.
Access control means that facilities in a system are controlled so that users are
only allowed to use resources that they are authorised to use.
Availability means that the resources of a system are available to authorised
users when they are needed.
dumpsters (rubbish containers) for paper-based
Email bombs
List linking
Denial of service attacks
Viruses
Program that executes on a host and causes serious
problems
A number of categories: executable virus, data
virus, device driver virus, stealth virus,
polymorphic virus
A number of virus kits available on the Internet
Introduction to viruses
A virus is a program that is inserted into a host and which carries out some
destructive process such as deleting the file store.
There are three main types of virus: executable viruses, data viruses and device
driver viruses. An executable virus is a virus which is attached to an executable
file which, when executed, will result in the virus code being run. This code will
then carry out some malicious act such as deleting important files. A data virus is
a virus which infects a file containing data, rather than executable code. Often
this data is associated with some program which requires it in
order to carry out its functions. For example, many programs require a start-up
file which initialises the program and sets up basic parameters for its operation. A
data virus could infect such a file and set the data in it to values such that the
program will crash or malfunction. A third class of virus is the device driver
virus. This infects the device drivers of an operating system which are then used
to piggy-back into other parts of a computer such as its file store.
There is also a further classification of viruses which categorises the ways that
they use to hide their presence on a computer. There are two types of virus which
are categorised in this way: the stealth virus and the polymorphic virus.
Scanner attacks
A scanner is a program which detects
system weaknesses
Poor name
Most famous is SATAN
Example is a scanner which checks that
sendmail is secure
Scanners can be used for intrusion
Intro to passwords
Sniffer attacks
Devices which are used to read packets of
data moving around a network
Used by system administrators for detecting
inefficiencies in a network.
They can be used for siphoning off sensitive
data
These are devices which read the packets of data that travel around a network.
They have a legitimate use for systems administrators since they can be used for
determining the efficiencies and inefficiencies in a network, for example they can
be used for detecting choke points: parts of a system where network traffic is
heavy. They are also used by developers, for example, in order to judge the
design of a distributed system in terms of the traffic it generates.
However, they have often been used for siphoning off sensitive data. An intruder
might install a sniffer at a strategic point in a network such as a gateway and read
the traffic that is passing through the gateway. A successful sniffer can detect
hundreds, if not thousands, of passwords in a matter of hours and send them to a
remote computer where they can be used for unauthorised intrusions.
Sniffer attacks are, surprisingly, not very prevalent; however, when they occur
they can compromise a very large number of computers. For example, a recent
sniffer attack on a number of computers resulted in 268 sites (not computers, but
sites!) having their computers violated.
Spoofing
This is a jargon term used to describe the fact that an intruder uses a computer to
masquerade as another trusted computer in order to carry out operations that the
user(s) of the trusted computer are allowed to initiate. Spoofing does not require
the in-depth knowledge of passwords and authentication that the previous
intrusion methods do: it just relies on masquerading as a computer that a network
trusts. In order to understand what spoofing involves it is worth looking at one
variety of this technique known as IP spoofing. This attack uses the TCP/IP
protocol to subvert the normal authentication controls in a system by running a
computer which purports to have an address that is trusted.
Another form of spoofing is DNS spoofing. This is less serious than IP spoofing
as it can easily be detected; however, this has not prevented a small number of
such attacks over the last five years. It involves infiltrating a domain name server
and rewriting the files of the server so that a computer which is outside a network
can be given the same name as a trusted computer. This means that clients who
request a service from the trusted computer using a symbolic name would be
routed to the rogue computer which could then involve them in a dialogue in
which important information such as credit card details is elicited.
Technology attacks
These are attacks which rely on security flaws in some software, often newly
released software. For example applets are Java programs which are downloaded
onto a client computer running a browser. In the early days of applets a number
of security violations occurred:
Applets could be used for denial of service attacks with certain browsers.
One browser was vulnerable to applets writing data to the system files used in
Windows 95.
An applet has been written which would automatically reboot Windows 95.
On one version of the Netscape Navigator browser an applet can capture a Web
page which acts as a form, read some data entered by the user and then send that
data to a remote server.
With some versions of the Netscape Navigator and Internet Explorer applets can
capture the IP addresses of computers in a closed network.
Cryptography
The main technological core on which security is based
It has been round since Roman times
The history of cryptography has been of novel schemes
initially being effective and then being cracked.
Lectures will look at symmetric and asymmetric
cryptography
Introduction to cryptography
A warning
[Figure: plain text and a key are fed into the crypto algorithm, which produces cipher text.]
Cryptography
Algorithm changed by key
Symmetric cryptography described in
previous slide
Large number of algorithms available
Key has to be distributed (weakness)
Encryption changes original text and
decryption changes it back
What is described on the previous slide is symmetric cryptography, where the key
used to vary the algorithm is used by both the agent who is sending and the agent who is
receiving the message. This is a weakness as there must be a secure way of
distributing keys.
There are a number of high-powered, virtually uncrackable algorithms around;
later slides will describe them.
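To make the idea of a shared key concrete, here is a deliberately insecure toy cipher in which the same function and the same key both encrypt and decrypt. It is a repeating-key XOR written for illustration only; it is not one of the algorithms the lecture goes on to describe and should never be used to protect real data.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte of the data with the repeating key."""
    if not key:
        raise ValueError("the key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

shared_key = b"secret"                  # must be known to sender and receiver
plain_text = b"attack at dawn"
cipher_text = xor_cipher(plain_text, shared_key)    # sender encrypts
recovered = xor_cipher(cipher_text, shared_key)     # receiver decrypts
```

Anyone who obtains `shared_key` can read every message, which is why the secure distribution of keys matters so much for symmetric schemes.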
DES steps
The processing steps carried out by DES are detailed above. These can be carried
out by either software or hardware.
DES problems
Key too small
Processes involved lead to known plaintext
attacks being successful
Calculated a $1m machine could crack DES
in 2 hrs
Still useful for commercial and personal use
DES is becoming something of a historical curiosity: its key size is too small and
the processes involved lead to known plaintext attacks being successful. A
known plaintext attack is one where an intruder has access to a plaintext and a
cipher text pair.
Triple DES
AES
Blowfish
IDEA
RC2
RC4
RC5
An introduction to AES
Triple DES. As its name suggests this is a variant of the DES scheme. It
involves applying the DES algorithm three times to a plain text. Triple
DES has been used by financial institutions such as banks as a more
secure alternative to DES.
AES. This is the American government's replacement for DES.
Blowfish. This is an algorithm which is capable of using a 448 bit key. It
is unpatented and is available for anyone to use.
IDEA. This is an algorithm developed in Switzerland and published in
1990. It uses a 128 bit key and is patented.
RC2. This is a cipher which was developed by the American security
researcher Ronald Rivest. It transforms blocks of data and relies on a key
which can range from 1 to 128 bytes (8 to 1024 bits).
RC4. This is a cipher which transforms data on a character by character
basis. It was originally a trade secret; however, it was published on a
Usenet newsgroup in 1994. It can employ a key which ranges between 1
and 2048 bits. The cipher was developed, like RC2, by the American
researcher Ronald Rivest.
RC5. This is a cipher which encrypts blocks of text and which was
developed in 1994, again by Ronald Rivest.
A cheering statement
You do not have to know the innards of a
cryptographic algorithm in order to use it.
There are a number of commercial products which
employ block ciphers and which just require the
programmer to call code methods or subroutines
such as encrypt(string or stream) and decrypt(string
or stream)
Asymmetric cryptography
Also known as public key cryptography
An attempt to overcome the major problem
with symmetric cryptography: the key
distribution problem
Requires the use of two keys: a public key
and a private key
Introduction to public key cryptography
One of the problems with symmetric key encryption is that both participants are
required to use the same key. This means that the key needs to be distributed to them
both, perhaps over some secure medium. For example the keys could be
maintained in a key server which might be subject to being tampered with.
The Americans Diffie and Hellman developed public key cryptography in 1976. It
requires two keys: a public key and a private key.
Encryption and decryption
[Figure: Alice encrypts the plain text with the encryption algorithm and Bob's public key; Bob decrypts the ciphertext with the decryption algorithm and his private key.]
Here two agents communicate. Alice wants to communicate with Bob. Bob
publishes a public key which Alice uses to encrypt the plain text. This is then
sent to Bob. Bob then uses the private key which forms part of the (private key,
public key) pair to decrypt the message.
Here there is no need for Alice to know Bob's private key.
[Figure: encryption algorithm, Alice's public key, ciphertext and decryption algorithm, with Bob and Alice as the participants.]
Comparison
Symmetric
Public key
Same algorithm
Sender and receiver share key
and algorithm
Key is secret
Impossible to decipher if no
other info available
Knowledge of algorithm plus
cipher text must be insufficient
to determine the key
The slide above describes the main differences between each of the two
algorithm approaches.
The slide shows the original five conditions detailed by the researchers Diffie and
Hellman for public key cryptography, with a sixth which is useful but not necessary.
RSA algorithm
Developed by Rivest, Shamir and Adleman
Block cipher in which the plain text and
cipher text are integers between 0 and n, for
some value of n
Algorithm based on factorisation of integers
The RSA algorithm is the most popular public key algorithm and is virtually unchallenged.
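A toy illustration of RSA with the plain text and cipher text treated as integers between 0 and n. The primes 61 and 53 and the exponent 17 are a common textbook choice and are far too small for real use; the function names are assumptions made for this sketch, which also omits the padding that practical RSA requires.

```python
def make_keys(p, q, e):
    """Build a (public key, private key) pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # modular inverse of e (Python 3.8 or later)
    return (e, n), (d, n)

def encrypt(message, public_key):
    """Raise the message to the public exponent modulo n."""
    e, n = public_key
    return pow(message, e, n)

def decrypt(cipher, private_key):
    """Raise the cipher text to the private exponent modulo n."""
    d, n = private_key
    return pow(cipher, d, n)

# Bob generates the key pair and publishes only the public key.
public_key, private_key = make_keys(61, 53, 17)

# Alice encrypts the integer 65 with Bob's public key.
cipher_text = encrypt(65, public_key)       # 2790

# Bob recovers the message with his private key.
recovered = decrypt(cipher_text, private_key)
```

Recovering the private exponent from the public key requires factorising n into its two primes, which is trivial for 3233 but infeasible for numbers hundreds of digits long.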
Algorithmic aspects
Encryption and decryption requires raising to a
power, quite a slow process although there are
some reasonable algorithms about.
Even so, public key cryptography is not used for
bulk transfer
Key generation involves finding two large prime
numbers. Usually done by randomly selecting odd
numbers and testing for primality
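The key generation step described above, selecting random odd numbers and testing them for primality, can be sketched with the Miller-Rabin probabilistic test. The choice of Miller-Rabin, the number of rounds and the function names are assumptions made for this example; a production system would also draw its candidates from a cryptographically secure source such as the `secrets` module rather than `random`.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

def is_probable_prime(n, rounds=20):
    """Miller-Rabin test: False means composite, True means almost certainly prime."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False          # a is a witness that n is composite
    return True

def random_prime(bits):
    """Pick random odd numbers of the given size until one passes the test."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate
```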
Attacks on RSA
There are a number of possible attacks on RSA. The first is to try every possible
private key. The solution here is to make the key space large.
The second is to develop efficient factorisation algorithms. Factorising numbers
used to be very hard; it is becoming a little easier and is still the subject of
research. If RSA is ever cracked it will be because of an efficient factorisation
algorithm.
Timing attacks involve the attacker measuring the computation time for
deciphering messages. Happily this can be easily countered by, for example,
inserting random delays into the decipherment process.
Key management
Two aspects to this
The distribution of public keys
The distribution of secret keys for
symmetric cryptography
The next few slides look at the problems that are involved with key distribution.
The first problem is how are public keys notified to users who wish to
communicate with the user who has the public key?
The second is how can symmetric keys be distributed in such a way that there is
no possibility that an intruder can discover these keys? Because public key
cryptography is inefficient many users still use symmetric cryptography.
However, the key distribution problem is still a major drawback. Happily there is
a solution involving public key cryptography
556
Internet Technologies
557
A major algorithm that is used for secretly exchanging keys is based on public
key cryptography. It relies on the fact that it is computationally infeasible to
calculate the discrete logarithm of a large number. In the description that follows I
shall refer to a, which is a primitive root of a prime number. Do not worry how this is
calculated.
The algorithm is due to Diffie and Hellman and was developed in the seventies.
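The exchange can be sketched in a few lines of Python. The prime q = 353 and primitive root a = 3 are small illustrative values chosen for this sketch, not values from the lecture; real systems use primes of thousands of bits. The function names are hypothetical.

```python
import secrets


def is_primitive_root(a: int, q: int) -> bool:
    """Brute-force check that the powers of a generate 1..q-1 (small q only)."""
    return len({pow(a, k, q) for k in range(1, q)}) == q - 1


def public_value(private: int, a: int, q: int) -> int:
    """The value a user publishes: a ** private mod q."""
    return pow(a, private, q)


def shared_key(private: int, other_public: int, q: int) -> int:
    """The key each side computes from its own secret and the other's public value."""
    return pow(other_public, private, q)


q, a = 353, 3                          # public values known to everyone

x_a = secrets.randbelow(q - 2) + 1     # A's private value, never transmitted
x_b = secrets.randbelow(q - 2) + 1     # B's private value, never transmitted

y_a = public_value(x_a, a, q)          # sent from A to B in the clear
y_b = public_value(x_b, a, q)          # sent from B to A in the clear

key_at_a = shared_key(x_a, y_b, q)
key_at_b = shared_key(x_b, y_a, q)     # both sides arrive at the same key
```

An eavesdropper sees q, a, y_a and y_b, and would have to compute a discrete logarithm to recover either private value.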
Public announcement
Publicly available directories
Public-key authority
Public key certificates
The next few slides look at the problems involved in maintaining a set of public
keys and some of the solutions to these problems. The solutions are presented in
increasing order of effectiveness.
Public announcements
Very easy
How it was intended to work
Major problem is that anyone can announce
a public key and masquerade as a user and
read data intended for the user
The simplest scheme is for a user to announce his/her public key in some public
forum such as a Web site or a news group. Anyone can then send messages to
that user employing the public key. The major drawback here is that anyone can
purport to be that user, issue a public key and then read data intended for that
user.
Here some trusted authority maintains a key store which is regularly published.
The trusted authority uses security provisions to ensure that the owner of the
public key is who they purport to be.
This is still not 100% secure since someone can still find the private key of the
authority and pass out counterfeit keys.
The scheme detailed above and in the next slide is much more secure and
involves communication between the subscribers and the authority using public
key cryptography.
The message sent back to A will contain the public key of B, a time stamp so that
A can determine that this is not an old message with an out of date public key,
and the original request so that A can check that it was not
tampered with before it was received by the authority.
The bullet points above describe the important criteria used for certificates and
certification authorities. How they work out in practice will be described in the
next lecture.
Disclosure
Traffic analysis
Masquerade
Content modification
Sequence modification
Replay modification
Repudiation (digital signatures)
First two handled by cryptography
The slide above details some of the attacks that can be made on a transmitted
message.
Disclosure means that content is released to a third party.
Traffic analysis involves looking for patterns of data between participants.
Masquerade involves pretending to be someone else.
Content modification involves changing the data in a message before it is
received.
Sequence modification involves inserting, deleting or reordering messages in a
sequence of messages.
Timing modification involves delaying or replaying messages.
Repudiation involves denial of receipt or denial of transmission of a message.
Symmetric encryption as an
authentication mechanism
Provides confidentiality
Has a degree of authentication
Does not provide a signature facility
Hash functions
Mapping f from integer to integer
Computationally easy
Should have the property that if a is not
equal to b then the probability that f(a) is
not equal to f(b) is virtually 1
The slide shows the main properties of a secure hash function. The main security
property is the final one. If the function has this property then a message cannot
be tampered with without the change being detected.
The process
[Figure: A sends a message and its checksum to B]
A user who wishes to send a message encrypts it, calculates the checksum using a
hash function, encrypts that and then sends both to another user.
The second user decrypts both of these. He/she then calculates the checksum of
the sent message and compares it with the decrypted checksum; if they are the
same the message has not been changed.
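The checking step can be sketched as follows in Python. SHA-256 from the standard hashlib module is an assumption made for this sketch, and the encryption and decryption steps are left out so that only the checksum comparison is shown.

```python
import hashlib
import hmac


def checksum(message: bytes) -> bytes:
    """Checksum of a message, computed with a hash function."""
    return hashlib.sha256(message).digest()


def send(message: bytes):
    """The sender transmits the message together with its checksum."""
    return message, checksum(message)


def receive(message: bytes, received_checksum: bytes) -> bool:
    """The receiver recomputes the checksum and compares it with the one sent."""
    return hmac.compare_digest(checksum(message), received_checksum)


message, digest = send(b"transfer 100 pounds to account 42")
unchanged = receive(message, digest)
tampered = receive(b"transfer 900 pounds to account 42", digest)
```

Without the encryption of the checksum an intruder could simply replace both the message and the checksum, which is why the process in the notes encrypts them.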
HMAC
The MD Series
The SHA series
There are a number of message digest algorithms available. Three are shown
above. HMAC combines a hash function with a shared secret key. The MD series uses 128 bit digests
and SHA-1, from the SHA series developed by the American National Security Agency,
uses 160 bit digests.
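The digest sizes can be checked with Python's standard hashlib module, and the hmac module shows how HMAC ties a digest to a secret key shared by sender and receiver. The key and message below are made-up values for illustration.

```python
import hashlib
import hmac

# Digest lengths in bits for MD5 and SHA-1.
md5_bits = hashlib.md5(b"abc").digest_size * 8
sha1_bits = hashlib.sha1(b"abc").digest_size * 8

key = b"shared secret key"            # hypothetical key known to both parties
message = b"pay 10 pounds"
tag = hmac.new(key, message, hashlib.sha1).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the HMAC and compare it with the received tag."""
    expected = hmac.new(key, message, hashlib.sha1).digest()
    return hmac.compare_digest(expected, tag)
```

Someone who does not know the key cannot produce a matching tag for an altered message.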
Four uses for message digest functions are shown above. The first use has already
been discussed.
Pass phrases are phrases that are used to identify a user to a system, for example
"Hello I like potatoes for tea". A message digest function can take such a phrase
and generate a password from it which is nearly unique and is hard to crack by
guessing.
Message digest functions are also used for checking virus infection. Each file in a
system is mapped to its message digest; if a virus has infected a file then the file no
longer matches its digest value. A virus checker would periodically scan the files to
check this.
Message digests are also used in digital signatures (see next lecture).
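The virus-checking use can be sketched as a small file integrity checker in Python. SHA-256 and the function names are assumptions of this sketch; a temporary file stands in for a program on disk.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_of(path: Path) -> str:
    """Message digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_digests(paths) -> dict:
    """Map each file name to its digest while the system is known to be clean."""
    return {str(p): digest_of(p) for p in paths}


def changed_files(baseline: dict) -> list:
    """Names of files whose current digest no longer matches the recorded one."""
    return [name for name, d in baseline.items() if digest_of(Path(name)) != d]


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "program.bin"
    target.write_bytes(b"original contents")
    baseline = record_digests([target])
    before = changed_files(baseline)              # nothing has changed yet
    target.write_bytes(b"original contents + virus")
    after = changed_files(baseline)               # the altered file is reported
```

The recorded digests themselves have to be stored where a virus cannot rewrite them, otherwise the check is easily defeated.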
Digital certificates
Already discussed lightly
Issued by a third party organisation such as a
national PTT
Contains name of the user, unique serial number,
the user's public key, digital signature of the
certificate issuer.
In order to function the recipient of data needs
access to the issuer's public key, often embedded
in packaged software such as a browser
I have already discussed digital certificates. To complete the story I shall look at
some practical examples of certificates and how they are used.
The X.509 standard is the most frequently used standard. It uses digital signatures.
These are detailed in the next lecture. An important feature is that
extra authentication information can be contained in the
certificate via name/value pairs.
When you want to read some data which is provided by a user who is associated
with a public key and who is described by a digital certificate the process is as
follows:
Obtain the certificate. This might come bundled in some software provided by
the user or might be found on the user's Web site.
Check the digital signature of the certificate issuer using the public key
associated with the certificate issuer; this can often be found in heavily used
software such as a browser.
If the previous step is successful employ the user's public key found in the
certificate to decrypt data.
Types of certificate
There are a number of different types of certificate. These basically contain the
same data. The only difference lies in the fact that extra data associated with the
particular use is included, for example the IP address of a server might be
included in a server certificate.
Lecture 16
Security (ii)
Aims
To look at security on the Web
Examine the various types of viruses that can be
encountered.
Examine anti-virus tools
Look at the concept of a digital signature
Examine some architectural issues
Look at the effect that distributed applications
have on the development process
Digital signatures
Uniquely identify a person or organisation
that published a public key
Relies on message digest functions and
public key cryptography
Provides authentication
Needs knowledge of the message digest that
is being used
The next set of slides details how the security technologies that have been described
in previous lectures and slides are employed in email. The major product I shall
examine is PGP since it is commonly available. However, the other technology,
S/MIME, will be standardised first.
PGP
PGP services
Digital signature: DSS/SHA or RSA/SHA
Message encryption: CAST or IDEA or
Triple DES with Diffie-Hellman
Compression: Zip
Email compatibility: radix-64 conversion
Segmentation: implemented to
accommodate message size limitations
The slide shows all the facilities of PGP. The bulk sending of data employs a
standard symmetric encryption algorithm, for example IDEA, with public key
cryptography creating a single session, one-time key for this bulk transfer.
Authentication in PGP
Sender creates an email message
SHA-1 used to generate a hash code
Code is encrypted using RSA with the
sender's private key; this is placed before
the message
Receiver uses the sender's RSA public key to decrypt
the code
Receiver compares the code with a hash code it computes from the received message
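The steps on this slide can be sketched in Python with a toy RSA key. The key is built from two well-known Mersenne primes purely so that the arithmetic is visible; it offers no security, and real PGP also pads and formats the hash code before applying RSA, which is omitted here. The function names are hypothetical.

```python
import hashlib

# Toy RSA key: two Mersenne primes, so the modulus is larger than any
# 160 bit SHA-1 hash code. Never use a key like this in practice.
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (Python 3.8+)


def hash_code(message: bytes) -> int:
    """SHA-1 hash code of the message, as an integer."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big")


def sign(message: bytes) -> int:
    """Sender: encrypt the hash code with the private key."""
    return pow(hash_code(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: decrypt with the public key and compare with a fresh hash code."""
    return pow(signature, E, N) == hash_code(message)


email = b"Meet me at noon."
signature = sign(email)
genuine = verify(email, signature)
forged = verify(b"Meet me at midnight.", signature)
```

Only the holder of the private exponent can produce a signature that verifies, which is what gives the receiver confidence in who sent the message.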
Confidentiality in PGP
Based on one-time keys used for encrypting
and decrypting a single message
Algorithms used include CAST, IDEA or
triple DES
One-time keys are protected using RSA
(Diffie-Hellman is another option)
The bulk transfer of email messages is carried out using one of a number of
algorithms. Each message that is encoded has a unique key generated for it
known as a session key. This key is randomly generated and is itself sent encrypted using RSA.
Web security
A number of approaches depending on what
level the technology is used
IP/IPSec
SSL
Kerberos, PGP, SET, S/MIME
Internet is bi-directional
Highly visible output for corporate activity
Web server software is complex
Web server can be used as a launching pad
into a business's network
Casual and untrained users employ the Web
There are a number of factors which make the Web insecure. First, traffic goes
into a Web server as well as leaving it. Second, Web software is complex and can
hide serious security weaknesses.
A Web server, if infiltrated, can be used as a launching pad from which an attacker
penetrates a network. Finally, many Web users are inexperienced and are
not aware of security risks.
There are a number of threats which a web site can face. Two of them are
shown above with the measures to counteract them in italics. In general these
measures work well.
The only type of threat that is very difficult to counteract is the denial of service
attack where, for example, a machine resource such as connection threads is
used up to the point where genuine users cannot access the resource.
IPSec
RFC 1636
Security features issued for IPv6
Usable within current IP
Many vendors have some IPSec capability
in their products
It encrypts and authenticates traffic at the IP
level. Thus all distributed applications can
be made secure
This is a security standard that originated from the Internet Architecture Board. It
has the major advantage that it is based on the IP level in the layered Internet
architecture. Because all applications use this level they can all be secured in a
uniform way.
Benefits of IPSec
If implemented in a firewall or a router it
provides strong security for all traffic
passing through.
Makes a firewall stronger and makes it
virtually impossible to bypass.
Transparent to applications
Transparent to end users
Can be tailored to individual groups or users
There are major benefits to using IPSec. First, it is transparent to both applications and users.
Second, it strengthens firewalls in that all outside traffic must
use IP (for example by using HTTP) and the firewall is the only entrance to a corporate
network. Within a corporate environment the security provisions can hence be
relaxed.
You can also implement security in levels above the TCP/IP level.
SSL protocols
Record protocol
Change cipher spec protocol
Alert protocol
Handshake protocol
The client sends the server a number of items of data including the
client's SSL version number, the cipher settings for the client and some
randomly generated data.
The server responds with a burst of similar data and also sends its digital
certificate; if the interchange of data requires the client to provide a digital
certificate then it will ask for this item.
The client authenticates the server; if this fails the user of the client is
informed.
Using the data that has been generated in the handshake the client creates
an item of data known as the premaster secret. This is used later in the
handshake.
The server authenticates the client. This only happens if the transaction
requires both parties to be authenticated. SSL is capable of being used
when only the server is authenticated and so this step could be omitted,
and most of the time it is.
If the client and the server have been successfully authenticated then
both sides carry out the process of generating another item of data known
as the master secret; this item is partly generated from the premaster
secret. The master secret is a one-time 48 byte quantity that is used to
create the keys used in the bulk transfer of data between the client and the
server after the handshake has been completed.
At this point the client and the server generate a pair of keys from the
master secret. One key is used for encrypting and decrypting data from
the client to the server; the other key is used for encrypting and decrypting
data from the server to the client.
The handshake is complete and the client and the server can start
exchanging encrypted data employing one of the algorithms which are
built into the version of SSL that is used. Part of the handshake involves
the parties to the transfer of data deciding on which algorithm to use.
Once a session has been completed the connection is severed. If the two
parties wish to communicate again then they have to carry out the
handshake; each time that the handshake takes place a different pair of
encryption keys are generated and a different master secret generated.
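The relationship between the premaster secret, the master secret and the pair of keys can be sketched in Python. The derive function below is a simplified stand-in built on HMAC-SHA256; it is not the key derivation function that SSL actually specifies. The labels, key lengths and the 48 byte master secret length are assumptions of this sketch.

```python
import hashlib
import hmac
import secrets


def derive(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """Stretch a secret into `length` bytes using HMAC-SHA256 with a counter."""
    output = b""
    counter = 1
    while len(output) < length:
        block = hmac.new(secret, label + seed + bytes([counter]), hashlib.sha256)
        output += block.digest()
        counter += 1
    return output[:length]


# Random data exchanged in the opening messages of the handshake.
client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)

# Created by the client and sent to the server during the handshake.
premaster = secrets.token_bytes(48)

# Both sides compute the same master secret from the shared values.
master = derive(premaster, b"master secret", client_random + server_random, 48)

# One key for each direction of the bulk transfer.
key_block = derive(master, b"key expansion", server_random + client_random, 32)
client_to_server_key = key_block[:16]
server_to_client_key = key_block[16:]
```

Because the random data and the premaster secret are fresh for every handshake, every session ends up with a different master secret and a different pair of keys.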
SET
Secure Electronic Transaction standard
Developed by Visa, MasterCard, RSA,
Microsoft etc.
Set of security protocols and formats
1998 first wave of SET compliant products
became available
Aims of SET
Confidentiality of payment and ordering information
Integrity of transmitted data
Authentication that a user is a legitimate holder of a credit
card
Authentication that a merchant can accept credit card
orders
Best security practices
A protocol that does not depend on transport security
mechanisms or interfere with them
Independence from hardware and software
The bullet points above describe the various criteria that were specified when
developing SET. In general all of them have been satisfied.
Confidentiality of information
Integrity of data
Cardholder authentication
Merchant authentication
Firewalls
Extra layer of protection placed around a
network
Often employs a router which can filter off
certain types of message
A number of configurations exist; I shall detail
only two
One major way of guarding against a number of forms of attack is to design the
topology of your network in such a way that it is difficult for intrusion to occur.
For example, it can be virtually impossible for a sniffer to be placed in a network
if it is highly compartmentalised. One of the most effective ways of using
network topology is by implementing a firewall.
A firewall is an extra layer of protection placed around a network or around a
particular application. A firewall placed around a network will usually employ a
router which can be programmed to deny access to a network, for example it can
be programmed to deny access to any packets of data which have been sent to a
particular dedicated port.
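The filtering decision made by such a router can be sketched in Python. The Packet fields and the blocked port (23, commonly used for telnet) are hypothetical choices for this sketch and are not the configuration of any real router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    source: str        # address of the sender
    dest_port: int     # port the packet is addressed to


# Hypothetical policy: packets sent to these ports are denied access.
BLOCKED_PORTS = {23}


def allow(packet: Packet) -> bool:
    """Return True if the router should pass the packet into the network."""
    return packet.dest_port not in BLOCKED_PORTS


incoming = [Packet("198.51.100.7", 25), Packet("198.51.100.7", 23)]
passed = [p for p in incoming if allow(p)]
```

A real router would usually work the other way round, denying everything except a short list of permitted ports.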
The configuration above is intended to protect a Web server which dispenses pages to
the public from being compromised and perhaps acting as a starting point for a more
serious intrusion which affects other computers in the internal network. The
configuration involves a programmable router which is able to monitor, re-route and
reject packets of data and a Web server known as a bastion host or a proxy server. The
bastion host acts as a temporary store or cache of pages which have been dispensed by
a real Web server which resides within a closed network.
When a packet of data is processed by the firewall router it will determine what to allow
through to the internal network that it protects. Often the data allowed through will be a
very small subset of the data which could be sent to it: for example, it might only allow
through data which represents e-mails. If the router detects data which is intended for the
Web server it will forward the data to the bastion host. Any other data is rejected.
When the bastion host receives data which accesses Web services it will satisfy that
service. It will first check that the pages required by the request are contained in its cache
of pages; if so, then it will send the pages to the computer that requested them. If the
pages are not contained in the cache then it will request the real Web server, which
resides within the firewall, to send it the pages so that it can satisfy the request.
The use of a bastion host secures Web services because any intruder has to compromise
this computer before they can enter the network in which the real server resides. For
example, a malicious attack on the bastion host which attempted to delete Web pages
would only delete the temporary cached pages.
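The behaviour of the bastion host can be sketched in Python as a cache sitting in front of the real server. The class and method names are hypothetical, and the network is replaced by ordinary method calls.

```python
class RealServer:
    """The real Web server, which resides inside the closed network."""

    def __init__(self, pages: dict):
        self.pages = dict(pages)
        self.requests = 0

    def fetch(self, url: str) -> str:
        self.requests += 1
        return self.pages[url]


class BastionHost:
    """Serves pages from a temporary cache, asking the real server only on a miss."""

    def __init__(self, real_server: RealServer):
        self.real_server = real_server
        self.cache = {}

    def get(self, url: str) -> str:
        if url not in self.cache:
            self.cache[url] = self.real_server.fetch(url)
        return self.cache[url]

    def wipe(self) -> None:
        """What a malicious deletion on the bastion host would achieve."""
        self.cache.clear()


real = RealServer({"/index.html": "<h1>Welcome</h1>"})
bastion = BastionHost(real)
first = bastion.get("/index.html")     # cache miss: fetched from the real server
second = bastion.get("/index.html")    # cache hit: the real server is not contacted
bastion.wipe()                         # only the cached copies are lost
```

After the wipe the pages held by the real server are untouched, and the next request simply refills the cache.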
A screened subnet
Viruses
Executable virus
Data virus
Polymorphic virus
Startup file virus
Device driver virus
Stealth virus
There are three main types of virus: executable viruses, data viruses and device
driver viruses. An executable virus is a virus which is attached to an executable
file which, when executed, will result in the virus code being run. This code will
then carry out some malicious act such as deleting important files. A data virus is
a virus which infects a file containing data, rather than executable code. Often
this data is associated with some program which requires it in
order to carry out its functions. For example, many programs require a startup file
which initialises the program and sets up basic parameters for its operation. A
data virus could infect such a file and set the data in it to values such that the
program will crash or its functions
will be compromised; another type of data virus could add an entry to a password
file that allows access to an intruder. Another example is that of a data virus for a
word processor that can be easily written and which would corrupt every
document opened by the word processor or, even worse, delete every document.
A third class of virus is the device driver virus. This infects the device drivers of
an operating system which are then used to piggy-back into other parts of a
computer such as its file store. Happily this type of virus is usually associated
with older operating systems such as MSDOS.
There is also a further classification of viruses which categorises the ways that
they hide their presence on a computer. There are two types of virus which
are categorised in this way: the stealth virus and the polymorphic virus.
Anti-virus software works by scanning the file store of a computer looking for
known viruses or for changes in files, for example an operating system file
suddenly becoming larger, even though no update to the operating system has
taken place.
Anti-virus tools look for unusual changes in the files stored in a
computer and also look for file characteristics which are associated with known
viruses. Many of the tools allow the user to download a database of current virus
signatures; often these databases are only a matter of hours out of date so they
will catch most viruses.
Case study
Large financial institution
Screened host firewalls
Virus checkers both for network incoming
and physical incoming
Huge amount of non-computer security, for
example physical security, visitor security,
document security level specification
Lecture 17
Integration (i)
Requirements analysis
Requirements specification
Design
Coding
Validation
Implementation
The change
Requirements analysis
Requirements specification
Design (radically changed)
Coding (radically changed)
Implementation
Aims
To describe four approaches to integration
To describe information oriented integration
To describe business process-oriented
integration
To describe portal-based integration.
To describe service-oriented integration
To introduce the role of some technologies
in integration
Integration
The process of bringing together chunks of
prewritten software.
Technologies that are involved include
XML, Web services, SOAP, Application
servers and protocols such as HTTP
Many of the activities associated with the conventional development cycle are
unchanged, for example the developer will always need to know what the system
requirements are going to be. However, other activities are modified or virtually
eliminated. Design just involves looking at performance and reliability since the
prewritten chunks have already been designed. Coding is reduced or even
eliminated because the chunks will have been prewritten.
The main technique used to join chunks together is messaging where an entity in
a distributed application communicates its need for a service or the provision of a
service using a message. System software known as message oriented
middleware (MOM) is employed for this.
Sometimes distributed objects are employed for joining the chunks with, say, a
CORBA object being used to front-end a package.
Application servers are servers into which can be loaded reusable objects, for
example a warehouse object which could be used in a stocktaking application.
Such servers obviate the need to do any detailed programming to cope with
problems such as inconsistent updates or lost updates.
They are important for integration because the objects that are embedded in the
server are reusable and can be easily modified and moved from one application to
another. The modification usually requires very little change to the core code of
the object.
A framework is very much like the empty shell of a building. It contains all the
structural elements necessary to implement an application such as a purchasing
system. However, like the shell of an empty building it does not contain detailed
items that distinguish a specific application from another one.
Its importance with regard to integration is that a system can be developed in
terms of a framework and then instantiated for specific applications. It also
enables maintenance to be carried out efficiently.
Messaging
The main mechanism for communication
between chunks of an integrated system is
messaging.
Achieved via low level means such as
HTTP or via message-oriented middleware.
Design then becomes a way of tweaking the
messaging system so that response time is
minimised.
Integration servers
Servers which mediate between various
components of an integrated system.
Carry out processes such as merging data,
removing data and transforming data to other
formats.
More sophisticated servers are driven by business
process definitions.
BizTalk is the best example of an integration
server.
An integration server is a server that acts as a sort of hub between the individual
components of a system. These components will often be written in different
programming languages and will employ different technologies that use different
variants of languages. For example, there may be a number of different database
products used each employing a different variant of SQL. An integration server
sits in the middle of an integrated system and is programmed to coordinate the
various commands and messages that are passed from one component to
another. For example, an integration server will be able to intercept an SQL
command from one database system, decode it and change it into an SQL
command for another database system.
Very sophisticated integration servers can be programmed using some Business
Process Definition language. This effectively puts a business-oriented notation on
top of the code found in the integration server.
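One very small example of the kind of translation an integration server might carry out is sketched below in Python. It rewrites the SELECT TOP n form used by SQL Server into the LIMIT n form used by several other database products. The function name is hypothetical and the regular expression only handles this one simple pattern; a real integration server would need a proper SQL parser.

```python
import re

# Matches "SELECT TOP <n> <rest of statement>" with an optional final semicolon.
_TOP = re.compile(r"^\s*SELECT\s+TOP\s+(\d+)\s+(.*?);?\s*$",
                  re.IGNORECASE | re.DOTALL)


def top_to_limit(sql: str) -> str:
    """Translate a SELECT TOP n statement into the equivalent LIMIT n form."""
    match = _TOP.match(sql)
    if match is None:
        return sql                      # not a TOP query: pass through unchanged
    count, rest = match.groups()
    return f"SELECT {rest} LIMIT {count};"


translated = top_to_limit("SELECT TOP 5 name FROM customers;")
untouched = top_to_limit("DELETE FROM customers WHERE id = 3;")
```

The server sits between the two database systems, so neither needs to know which variant of SQL the other one speaks.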
Scripting languages
Often need for glue code to join together
components of an integrated system. Usually this
is a relatively small amount of code.
Scripting languages such as Perl, Python and Ruby
are often used for this.
Ruby is becoming an increasingly popular
language due to its connection with Ruby on
Rails.
Sometimes integrated systems will require some new code to be written, for
example when a new function is to be implemented or some transformation
cannot be carried out by an integration server. Scripting
languages are interpreted languages, often heavily oriented towards string
processing, and are easy to debug. Perl has often been the language of choice for
this. However, newer languages such as Python and Ruby are coming to the fore,
particularly Ruby, which has a rapidly increasing following due to a package
known as Ruby on Rails used for web site development.
Scripting languages are used to provide the glue which brings together
components of an integrated system and also develop code required for any extra
functionality. JSP and ASP are similar in use. These are technologies that allow
program code to be interspersed with HTML with the program code providing
the functionality that the HTML displays. JSP (Java Server Pages) allows Java
programming code to be interspersed with HTML and allows virtually any
processing that can be used with Java as a standalone programming language.
Message-oriented middleware
Used as buffer between integrated
components
Very simple API
A number of mature products
Supports interrupted running
No reliance on any programming
technology
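The buffering role and the simple API can be sketched in Python with the standard queue module. The MessageBroker class, its two methods and the destination name are hypothetical; commercial message-oriented middleware adds persistence, transactions and network transport.

```python
import queue


class MessageBroker:
    """Holds messages for each destination until the receiver asks for them."""

    def __init__(self):
        self._queues = {}

    def send(self, destination: str, message: str) -> None:
        """Queue a message; the receiver does not have to be running."""
        self._queues.setdefault(destination, queue.Queue()).put(message)

    def receive(self, destination: str):
        """Return the oldest waiting message, or None if there is none."""
        waiting = self._queues.get(destination)
        if waiting is None or waiting.empty():
            return None
        return waiting.get()


broker = MessageBroker()

# The ordering component sends while the stock component is not running.
broker.send("stock", "reserve 10 widgets")
broker.send("stock", "reserve 3 gadgets")

# Later the stock component starts up and collects its messages in order.
first = broker.receive("stock")
second = broker.receive("stock")
third = broker.receive("stock")         # nothing left to collect
```

The sender and the receiver share nothing except the name of the destination, which is why components written with different technologies can be joined in this way.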
The main driver towards integration has been the Internet. Not only has it
spawned standards such as HTTP which every piece of Internet software has to
implement, but it has also given rise to XML, a technology that can be used for
defining industry specific standards. If it were not for the standards features of the
Internet I would not be lecturing on integration here today.
Another important part of the drive to integration that the Internet has enabled is
the fact that, by using message passing, large systems which are developed using
a variety of technologies can be connected even though these systems might be
many thousands of miles apart.
Types of integration
Four types
Information-oriented
Business process-oriented
Service oriented
Portal-oriented
There are a number of ways of judging any of the approaches mentioned on the
previous slide. The first is how mature the technologies are: business process type
integration relies on state of the art technology and there have been some
recent disasters. Another criterion is the degree to which each of the methods is
able to work with a wide variety of technologies rather than those bound to some
proprietary set of standards. A further criterion is how much human intervention
is required: for example portal-oriented integration gives rise to systems quickly
but relies on more human operator intervention when they are operating than
other forms of integration. A final criterion is the important one of judging how
much processing and communication overhead is generated.
Portal-oriented integration
Views a number of systems via a single, usually
web-based interface.
Other forms of integration use technologies that
are real-time and user-driven. Portal-oriented
integration involves a human operator carrying out
the coordination.
Can be a very rough and ready approach involving
the development of quite a bit of glue software.
An example
[Figure: a browser connected to a portal server, which fronts an SQL Server based system, a COBOL-based flat file system and an Internet-based system]
The figure shows how a portal server brings together all the potentially disparate
components of an integrated system. These components can be as disparate as the
three shown in the figure.
An example
Company wishes to implement a bulk
buying site.
Needs web access to both customers and
staff.
Needs access to the purchasing systems of
wholesalers.
Categories of portals
A single system portal is a portal that is situated within a single enterprise such
as a commercial company or a hospital. It just integrates all the disparate systems
which are found in the enterprise. It effectively takes all the interfaces
that are in existence and unites them into a web interface.
A multiple-enterprise system portal is the most common portal. Here a number
of enterprise systems are connected via some server technology which allows
access from one enterprise system to the resources of another enterprise system.
The different systems that are integrated could be diverse and include SAP-based
systems, legacy systems based on technologies such as COBOL, systems based
on packages such as inventory packages and advanced web-based systems.
A trading community portal is one in which many companies are involved
in the integration, with the portal bringing together all the various systems that are
maintained by the companies.
Mashing
A recent form of amateur portal development.
Here simple tools are used to combine a number
of web sites which are accessed through a single
web site.
An example of this might be a web site which
used mapping technology to provide a search
facility for someone wanting to buy a house.
Information-oriented integration
Used for high data usage applications
Involves combining databases
The main processing is that of moving data
between components of an integrated system.
Wide variety of technologies can be employed:
integration servers, custom converters, database
replication software and special purpose code
developed using conventional or scripting
languages.
Architecture
[Figure: two sets of databases connected by a transfer medium]
This figure shows the essential architecture of a system which has been
developed using an information-oriented approach to integration. Here a central
medium employs one or more technologies to transfer data from one database to
another. These databases will usually be different in scope or different in terms of
manufacturer.
One of the most important tools used for information-oriented integration is the
data replication utility. This takes a tabular description of a set of databases and
carries out replication and writing according to some script embedded in the
table. This will determine:
What databases are to be replicated from
Which databases are going to receive the data
What parts of a database are going to be copied.
Which modifications are to be carried out on the data.
When is the replication going to occur.
What conditions must hold for a replication to occur.
What monitoring information is produced.
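The kind of table that a replication utility reads can be sketched in Python as one entry per replication job. This is only an illustration: the class name and the field names (source, targets, tables, schedule, transform, condition, log) are assumptions chosen to mirror the list above, not the format used by any real product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ReplicationRule:
    """One row of a hypothetical replication table."""
    source: str            # database to replicate from
    targets: List[str]     # databases that receive the data
    tables: List[str]      # parts of the database to copy
    schedule: str          # when replication occurs, e.g. "daily 02:00"
    transform: Callable[[Dict], Dict] = lambda row: row    # modification applied to each row
    condition: Callable[[Dict], bool] = lambda row: True   # must hold for a row to be copied
    log: List[str] = field(default_factory=list)           # monitoring information

def replicate(rule, rows):
    """Apply one rule to rows read from the source; return the rows to write to each target."""
    out = [rule.transform(r) for r in rows if rule.condition(r)]
    rule.log.append(f"{rule.source} -> {rule.targets}: {len(out)} of {len(rows)} rows")
    return out

rule = ReplicationRule(
    source="orders_db", targets=["warehouse_db"], tables=["orders"],
    schedule="daily 02:00",
    transform=lambda row: {**row, "customer": "UK-" + row["customer"]},
    condition=lambda row: row["status"] == "paid",
)
rows = [{"customer": "1001", "status": "paid"}, {"customer": "1002", "status": "open"}]
copied = replicate(rule, rows)
```

Only the paid order is copied, with its customer designator extended, and the log records how many rows were moved.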
This slide shows the steps that are needed in order to carry out this form of
integration:
First, you will need to understand the structure of the data that is stored in the
databases that will make up the eventual system. Database schemas
are the normal place to look for this.
Once the functionality of the new system has been determined it is necessary to
select the data that is to be transferred.
How often data is to be transferred needs to be determined; it could be every few
seconds or it could be a daily update; this depends on the application.
If any reformatting of the data is required, for example a customer designator
needs some extra characters, then this needs to be determined.
The technologies used for the transfer then need to be chosen. Normal criteria
such as overhead, communication time, cost etc. would be used here.
Finally the technologies that are employed to carry out the transfer are deployed;
this might involve some programming in something like a scripting language or
the creation of tables such as data replication tables.
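The last steps can be sketched as a small script of the kind a scripting language would be used for. Two in-memory SQLite databases stand in for the source and target systems; the table names, column names and the "C-" designator format are all assumptions made for the illustration.

```python
import sqlite3

def transfer_customers(source, target):
    """Copy customers from source to target, reformatting the designator on the way."""
    rows = source.execute("SELECT id, name FROM customer").fetchall()
    # Reformatting step: the (hypothetical) target expects a 'C-' prefix and six digits.
    reformatted = [(f"C-{int(cid):06d}", name) for cid, name in rows]
    target.executemany("INSERT INTO client (designator, name) VALUES (?, ?)", reformatted)
    target.commit()
    return len(reformatted)

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
source.executemany("INSERT INTO customer VALUES (?, ?)", [(42, "Smith"), (7, "Jones")])

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE client (designator TEXT, name TEXT)")

copied = transfer_customers(source, target)
```

In a real deployment the same function would be run at the frequency chosen in the earlier step, for example from a scheduler.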
An important point
Service-oriented integration
Based on application services
Connects together individual software
nodes that offer a number of services.
Has come to the fore with the rise of web
service technology.
Some application services are public, for
example those associated with Amazon,
eBay and Google.
As well as services being developed for proprietary systems there are a number of
application services that are associated with public web sites such as eBay and
Google. Such sites offer an API, a set of method or subroutine calls which invoke
the functions of the site.
There are three major reasons for developing a system using service-oriented
integration.
Where some functionality is to be accessed and that functionality might
change over time.
Where there is a need to share the development costs of a project and where each of
the companies involved wants a clean interface to the functionality of some of the
components.
Where the problem area is small and a common application is to be developed
which companies want to share.
Architecture
What the programmer sees
Interface layer
Internal systems
A business process
Some action expressed in business terms
rather than technical terms.
A variety of languages have been defined
Languages include BPEL, XLANG, WSFL
and BPML
Often supported by both a textual notation
and a graphical notation
Business processes are a series of steps which give rise to some business result
such as a credit card being validated. They are expressed in business terms and
hence would use a vocabulary consisting of words such as invoice, bill, picking
list and account.
A language for expressing business processes is usually very simple and consists
of a set of control structures together with some modularisation facility such as a
subroutine. There are a number of languages in existence; however, there are
strong signs of a shake-out, with BPEL emerging as the winner.
This is a simple example of a business process definition, one which is the front end of a purchasing process. As you can see it is expressed in programming terms
but lacks the detail associated with a programming language.
Architecture
Components of the integrated system
BP middleware
Business processes
The schematic shows the role of the Business process middleware. It consults the
business process definitions and then coordinates the action of the program code
of the components of the integrated system. The next lecture will describe the
technologies used to implement the BP middleware.
Lecture 18
Integration (ii)
Aims
To outline some middleware models
To describe some technologies that are used
to implement the integration systems
software layer.
To examine in a little detail the role of
integration servers.
To look at BizTalk Server as an example of
an integration server.
Integration models
There are three main models used for integration architectures. The point-to-point
model connects every node with all other nodes. The central hub model employs
some technology as a central marshalling point. This technology is usually some
form of integration server. The integration hub is where a bus is used to send
messages and transactions and where individual components of the integrated
system subscribe to certain types of message and certain types of transactions.
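One practical difference between the models is the number of connections that have to be built and maintained. The arithmetic can be sketched as follows; the function names are just illustrative.

```python
def point_to_point_links(n):
    """Links needed when every node is connected directly to every other node."""
    return n * (n - 1) // 2

def hub_links(n):
    """Links needed when every node connects only to a central hub or bus."""
    return n

links_p2p = point_to_point_links(10)   # 10 components wired to each other
links_hub = hub_links(10)              # 10 components wired to one hub
```

For ten components the point-to-point model needs 45 links while a hub or bus needs 10, and the gap widens quickly as components are added.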
Point-to-point architecture
Point-to-point
All connections
Integration hub
Central bus with components subscribing
Here a central bus is used for communication between the individual components
of the system. The components subscribe to messages that are of interest to them;
for example, a component carrying out database processing might subscribe to all
those transactions that affect the databases that it holds. Other components
publish messages and transactions to the bus.
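The publish and subscribe behaviour of such a bus can be sketched in a few lines of Python. This is a minimal in-memory illustration; the class name Bus and the message type names are assumptions, and a real bus would also deal with transport, persistence and failures.

```python
from collections import defaultdict

class Bus:
    """Toy integration bus: components subscribe to message types, others publish."""
    def __init__(self):
        self._subscribers = defaultdict(list)   # message type -> list of callbacks

    def subscribe(self, message_type, callback):
        self._subscribers[message_type].append(callback)

    def publish(self, message_type, payload):
        # Deliver the payload to every component that subscribed to this type.
        for callback in self._subscribers[message_type]:
            callback(payload)

bus = Bus()
stock_updates = []                                   # stands in for a database component
bus.subscribe("stock.update", stock_updates.append)

bus.publish("stock.update", {"item": "A17", "quantity": 40})
bus.publish("invoice.created", {"number": 1})        # nobody subscribed, so nothing is delivered
```

The publisher never names the receiving component; it only names the type of message, which is what keeps the components decoupled.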
One of the most venerable technologies used for integration is that of Remote
Procedure Call (RPC). Here a program on one computer in a distributed system
calls part of another program on another computer. This is one of the first
technologies used for inter-computer communication and is over twenty years
old. It has the major advantage that it is relatively easy to program: all the
programmer has to do is call some code as if it were resident on their local
machine.
In order for RPC to function there is a need to define the interface offered by one
computer. This definition should specify the individual subroutines or methods
available, the arguments that need to be employed and the results that are
returned. In the latest version of RPC (XML-RPC) this is done via an XML-defined language.
XML-RPC
Most modern version of RPC
Based on standard Internet protocols
Uses XML to define the interface between
entities using RPC.
Becoming a competitor to web services
XML-RPC is a technology that is quite young, around five years old. It enables
programs written in a variety of languages to communicate by sending messages
which invoke program code on a remote computer. It employs standard Internet
protocols such as HTTP.
An example
POST /rpchandler HTTP/1.1
User-agent: DarrelXMLRPC/1.1
Host: XMServer.book.com
Content-Type: text/xml
Content-Length: 238

<?xml version="1.0"?>
<methodCall>
<methodName>
getStaffNames
</methodName>
<params>
<param>
<value><string>Part-time</string></value>
</param>
<param>
<value><string>WeeklyPaid</string></value>
</param>
</params>
</methodCall>
A client program makes a software call which calls the XML-RPC library code. The program
specifies the name of the code to be executed, the arguments and the address of the server,
which is XML-RPC compliant.
The XML-RPC software on the client packages up the request and converts it into the
HTTP/XML form detailed above and issues a POST request to the server specified in the first
step. The client stops at this point waiting for a response from the server to which the POST
has been sent.
The server receives the POST command, extracts the XML payload and passes the
content to the XML-RPC software.
The XML-RPC software on the server parses the payload and determines what code is to be
executed and what the arguments are.
The code that has been identified in the previous step is then executed with the arguments
that have been identified.
The XML-RPC software on the server monitors the result of the execution of the code
detailed in the previous step and constructs an HTTP response which contains the XML that
represents the result of the execution.
The client uses the XML-RPC software to unpack the response and extract the result of
the remote execution at the server.
Finally the client will take the data returned from the server and restart its execution using
this data.
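The packaging and unpacking in the steps above can be tried out with Python's standard xmlrpc.client module. The sketch below leaves out the HTTP POST and simply passes the XML text from the "client" half to the "server" half; the handler function get_staff_names and the data it returns are hypothetical.

```python
import xmlrpc.client

# Client side: package the method name and arguments as an XML payload.
request_xml = xmlrpc.client.dumps(("Part-time", "WeeklyPaid"), methodname="getStaffNames")

# Server side: parse the payload to recover the method name and the arguments.
params, method = xmlrpc.client.loads(request_xml)

# Hypothetical server code, looked up by name and executed with the arguments.
def get_staff_names(employment, pay_cycle):
    return [f"{employment}/{pay_cycle} staff member"]

handlers = {"getStaffNames": get_staff_names}
result = handlers[method](*params)

# Server side: package the result. Client side: unpack it again.
response_xml = xmlrpc.client.dumps((result,), methodresponse=True)
(returned,), _ = xmlrpc.client.loads(response_xml)
```

Printing request_xml shows a methodCall document of the same shape as the one on the earlier slide.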
A response
<?xml version="1.0"?>
<methodResponse>
<params>
<param>
<value>Out-of-Stock</value>
</param>
</params>
</methodResponse>
Here a response is shown. It is expressed in an XML form and shows that a single
data item having the value Out-of-Stock is returned. It is this value that will be
processed by the client.
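As a rough check of how a client library reads such a response, the XML can be fed to Python's standard xmlrpc.client.loads function. The value element carries no type tag, and XML-RPC treats an untyped value as a string.

```python
import xmlrpc.client

response_xml = """<?xml version="1.0"?>
<methodResponse>
  <params>
    <param>
      <value>Out-of-Stock</value>
    </param>
  </params>
</methodResponse>"""

# loads() returns the tuple of returned values and, for a response, no method name.
values, method_name = xmlrpc.client.loads(response_xml)
status = values[0]
```

Here status holds the string Out-of-Stock, which is the value the client program would go on to process.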
This is some example code showing how a client might make an XML-RPC call.
Here the computer identified by its IP address 101.125.100.45:4566 is sent a
command to execute the method mess in the class FindRouter with the
arguments found in the Vector object named arguments.
BizTalk Server
Microsoft product
True integration server
Recent incarnation uses business rules and
business process language
Intended for a central hub role
Built on top of the .NET architecture
Features
Sets of adapters to manage the interaction between various
components employing a wide variety of protocols.
Contains a receive pipeline connected via an adapter to a
receive port
Contains a send pipeline which connects with the outside
world via an adapter and a send port.
At the heart of the server there is a message box containing
messages in transit.
A business rule repository carries out orchestration.
The BizTalk MessageBox is a repository for messages. Each message is associated with details
such as where the message originates from.
Incoming messages arrive at a location known as a receive location. A listener is a component
that monitors the URL of a receive location and introduces the message into the BizTalk server.
Before the message is deposited in the BizTalk MessageBox it may traverse a receive pipeline.
This pipeline might contain processing elements that would transform a message in some way
before depositing it in the MessageBox. For example, the message might be encrypted and a
component of the pipeline would then be responsible for its decryption.
There are a number of send ports associated with a BizTalk server. These ports will have
subscribed to a particular message and will receive those messages that they have subscribed to.
A send pipeline is used to process messages before they are delivered to some component of the
integrated system. As with the receive pipeline the send pipeline will contain components that
could transform the message being sent out; for example adding digital certificate information.
An orchestration component will manage the process of messages being sent, transformed and
delivered. For example this component might be used when a message is received from a
supplier that a particular quantity of a product has been delivered to a warehouse. It would
orchestrate the process of receiving the message, delivering the basic data of the delivery to a
database used by the accounting part of a company and notifying any companies that
have ordered the item being delivered that it is now available.
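The flow described above (receive pipeline, message box, subscribing send ports, send pipeline) can be mimicked in plain Python. This is not the BizTalk API: every class and function name below is an assumption, and the "decryption" stage is a toy that simply reverses the text.

```python
class MessageBox:
    """Stores messages and hands them to send ports that subscribed to their type."""
    def __init__(self):
        self.stored = []
        self.subscriptions = {}   # message type -> list of send ports

    def subscribe(self, message_type, send_port):
        self.subscriptions.setdefault(message_type, []).append(send_port)

    def deposit(self, message):
        self.stored.append(message)
        for port in self.subscriptions.get(message["type"], []):
            port(message)

def run_pipeline(message, stages):
    """Pass a message through each transformation stage in turn."""
    for stage in stages:
        message = stage(message)
    return message

# Hypothetical stages: decrypt on receipt, add certificate details on sending.
receive_pipeline = [lambda m: {**m, "body": m["body"][::-1]}]
send_pipeline = [lambda m: {**m, "certificate": "demo-cert"}]

delivered = []   # stands in for a component of the integrated system
box = MessageBox()
box.subscribe("delivery", lambda m: delivered.append(run_pipeline(m, send_pipeline)))

incoming = {"type": "delivery", "origin": "supplier-7", "body": "stinu 04"}
box.deposit(run_pipeline(incoming, receive_pipeline))
```

The message is transformed once on the way in, stored, and transformed again on the way out to each subscriber.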
Architecture schematic
Receive port
Adapter
Send port
Business
rules
Receive pipeline
Adapter
Send pipeline
Configuration database
Tracking database
Message box
Here you see the architecture of BizTalk Server. Messages come in via receive
ports and are initially processed by an adapter. They can then be transformed by a
pipeline known as the Receive pipeline. They are then stored in a message box
implemented using SQL Server before being transformed and sent to other
components of the integrated system. This is via a send pipeline and an
associated adapter. The processes of transformation and routing are determined
by business rules which are stored centrally.
BizTalk tools
Editor
Mapper
Pipeline designer
Orchestration designer (uses XLANG)
BizTalk explorer
There are a large number of tools associated with BizTalk Server. The slide shows
five of the most important.
The editor defines the format of messages that will be processed by BizTalk and
allows the developer to check whether specific messages meet the specification.
The mapper is a tool which describes how one message can be mapped into
another message. The mapper employs code snippets known as functoids to do
this.
The pipeline designer allows the developer to specify the connections between
the individual transformation elements in a pipeline.
The orchestration designer allows the developer to define orchestrations, that is, business processes. It employs the process language XLANG to do this.
BizTalk explorer is a tool which enables the developer to view entities such as
orchestrations, message formats and transformations.
Message-oriented middleware
Another major technology used for
connecting components in an integrated
system.
Simple APIs
Number of products on the market
Product with the most penetration is
WebSphere MQ
MOM schematic
Components
Components
Message queues
WebSphere MQ
Hugely popular technology
Variety of platforms: z/OS, UNIX, Linux,
Windows
Important APIs supported include: Java, .NET, C,
COBOL, PL/I, JMS for Java, CMS for C/C++
Also a number of unsupported APIs including one
for the scripting language Perl.
Winner of two major product prizes in 2004.
There are a number of features of the product which make it robust and scalable.
First, it guarantees that messages will be delivered once, and once only.
It will deliver messages even though a receiving application might not be
running. The message will be stored until the application restarts.
It provides a primitive means of transforming data as it passes from one
component of an integrated system to another. This is achieved through the use of
message data "Exits". These are compiled applications which run on the queue
manager host; they are executed by the WebSphere MQ software at the time data
transformation is needed.
Finally WebSphere MQ has the capability to trigger applications when
predefined messages arrive.
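Two of these features, holding messages until the receiver is ready and triggering an application on arrival, can be illustrated with a toy queue manager. This is not the WebSphere MQ API; the class and its methods are assumptions, and because the queues live in memory nothing here survives a restart as the real product's queues would.

```python
from collections import deque

class QueueManager:
    """Toy queue manager: holds messages until the receiving application collects them."""
    def __init__(self):
        self.queues = {}     # queue name -> deque of waiting messages
        self.triggers = {}   # queue name -> function started when a message arrives

    def put(self, queue_name, message):
        self.queues.setdefault(queue_name, deque()).append(message)
        trigger = self.triggers.get(queue_name)
        if trigger:
            trigger(queue_name)

    def get(self, queue_name):
        """Remove and return the oldest message, or None if the queue is empty."""
        queue = self.queues.get(queue_name)
        return queue.popleft() if queue else None

manager = QueueManager()
manager.put("orders", "order 1")   # the receiver is not running; the message waits
manager.put("orders", "order 2")

# Later the receiver starts and drains the queue; each message is handed over once.
received = [manager.get("orders"), manager.get("orders"), manager.get("orders")]

started = []
manager.triggers["alerts"] = started.append   # start an application on arrival
manager.put("alerts", "stock low")
```

The third get returns None because both orders have already been delivered once.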
This slide details the various components of the key part of WebSphere MQ: the queue
manager. This is the central coordinating part of the system and carries out all the
important functions apart from those involved in the transport of data. It
maintains two types of connection: Bindings connections and Client connections.
The former are faster than the latter; however, the latter allow for a much more
robust design which can be maintained more easily.
Assured delivery
Excellent development facilities
End to end security
Interface with web services
Ability to cluster
Time independent processing
Large number of systems that can be integrated
Support for growth
These are the main commercial claims for this messaging technology:
The fact that it assures delivery using an exactly-once model (at least once and at most once).
That it has an excellent set of tools for producing systems using WebSphere
MQ, both tools for code development and tools for administering the queues.
That security is provided by SSL.
That it can now interface with Web services defined by SOAP.
That MQ processing can be distributed across a number of processors.
That it can be used to integrate a wide variety of systems written using a large
number of technologies.
That it can handle messages where the recipient is not executing, for example a
program resident on a portable computer which has gone out of WiFi contact.
That the simple messaging structure allows new applications to be easily added
to an existing system that uses WebSphere MQ.
One of the main enablers for integration is the availability of standards. If the
components of an integrated system can communicate with a shared semantics
then much of the work involved in integration will be substantially reduced.
Many of the standards that are available have been fuelled by XML. Standards
are either industry-specific, such as 1SYNC, or system-level, such as SOAP or
HTTP.
Often, certainly for the simpler standards, a hub and spoke model is used for
communication, as for example with 1SYNC.
1SYNC
American standard
Based on the codification of retail products.
Employs a hub and spoke model with the hub
containing a central database of product
information such as the name of the product, its
unique key and its dimensions.
Used by retailers when they are ordering products
from a supplier, for example to estimate
warehouse space.
1SYNC is a simple but highly effective standard for describing retail goods. A
central database keeps about 70 product attributes live and allows subscribers to
the 1SYNC system to add new products, modify products and be notified when
there is a change to a product.
This standard is unusual in that it does not just encompass a written description of
what product information should look like but also contains a technological
infrastructure which enables subscribers to access a database.
Such a database removes many of the transformation and customisation problems
that are associated with supply chain applications.
OpenDocument
A standard for memos, reports, spreadsheets etc,
any document that can be generated in an office.
Based on XML
Supported by OpenOffice and KOffice
In competition with Microsoft's Office Open
XML.
Recently Microsoft have announced that they will
develop a plug-in to save to ODF
Expected to become an ISO standard
FIXML
Standard for exchanging messages about
financial trading.
Mainly used for securities trading
Implements the FIX protocol which was
based on a comma-separated format
Some criticism of FIXML in that it is seen
as lengthening the life of a not too useful
protocol.
FIXML
WikiPing
Used to broadcast messages about changes
to a Wiki
Open standard
Idea based on the ping utility used by
Internet users
Provides information about the nature of a
change
SMIL
Defined yet again using XML.
Synchronised Multimedia Integration
Language
Defines the properties of a multimedia
entity such as a video clip
Open standard defined by the W3C
MMS based on SMIL for handheld devices
RSS a reprise
Yahoo Pipes
Mashing technology
Relies on a simple graphic programming
language
Combines RSS feeds
Major debate about its power
First steps towards more powerful facilities
An example
This is an example of a very simple pipe. All it does is to fetch an RSS feed and
replace items within the feed with other items.
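Yahoo Pipes expresses this graphically, but the same logic can be written as a short script. The sketch below uses an inline RSS document rather than fetching one over the network, and the feed contents and the function name are made up for the illustration.

```python
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0"><channel><title>Demo</title>
<item><title>Old product launched</title></item>
<item><title>Old price list</title></item>
</channel></rss>"""

def replace_in_titles(rss_text, old, new):
    """Return the item titles of an RSS feed with `old` replaced by `new`."""
    root = ET.fromstring(rss_text)
    titles = []
    for title in root.findall("./channel/item/title"):
        title.text = title.text.replace(old, new)
        titles.append(title.text)
    return titles

titles = replace_in_titles(FEED, "Old", "New")
```

Only the item titles are rewritten; the channel title is left alone because the path selects titles inside item elements.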
Scratch
Lecture 19
Integration (iii)
Aims
To outline some of the roles of an
integration server
To examine BizTalk in more detail
To introduce the concept of business
process analysis
To introduce ebXML
Integration services
Message-based
Employs standard-based messages, pseudo
standards and application specific standards
However, these standards are usually
hidden by an extra system layer.
Employed in a hub and spoke architecture
Integration services are almost invariably message based; they employ a host of
protocols ranging from standards-based protocols such as HTTP to pseudo
standards such as MQSeries and application standards such as BAPI used for
SAP. A good integration server should hide all the details of these protocols and
enable the developer to employ tools which do not require knowledge of the
protocols. Integration servers are employed in a hub and spoke architecture where
they marshal, coordinate and orchestrate the various components of the system.
Transformation
Schema conversion
Data conversion
Routing
The slide above details the first four functions of an integration server.
Transformation involves the transformation of data that passes through the hub
that is implemented by the server, for example removing a first name and
replacing it with an initial. Schema conversion involves the transformation of a
database schema into a form that can be used by another component of the
integrated system. Data conversion is the process of carrying out some
transformation so that data which is in one form can be processed in another form
by another component of the integrated system, for example replacing a dollar
amount by a pound or Euro amount. Finally routing involves the determination of
the destinations of some message or business transaction and sending it.
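Three of these functions can be sketched with the examples from the notes. The exchange rate, the routing table and the message fields are all assumptions made for the illustration; an integration server would normally hold such rules in configuration rather than code.

```python
USD_TO_GBP = 0.5   # assumed fixed rate, for illustration only

def transform(message):
    """Transformation: replace the first name with an initial."""
    first, last = message["name"].split(" ", 1)
    return {**message, "name": f"{first[0]}. {last}"}

def convert(message):
    """Data conversion: express a dollar amount in pounds."""
    return {**message, "amount": round(message["amount"] * USD_TO_GBP, 2), "currency": "GBP"}

# Hypothetical routing table: message type -> destination components.
ROUTES = {"invoice": ["accounts"], "delivery": ["warehouse", "accounts"]}

def route(message):
    """Routing: determine the destinations for a message."""
    return ROUTES.get(message["type"], [])

message = {"type": "invoice", "name": "Darrel Ince", "amount": 120.0, "currency": "USD"}
processed = convert(transform(message))
destinations = route(processed)
```

Each function returns a new message rather than changing the original, which makes it easy to chain the steps in whatever order the hub requires.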
Rules processing
Message warehousing
Repository maintenance
Directory services
Architecture
Business rules
Business processes
State management
Transaction management
Correlation
Security
There are a number of problems that a good integration server needs to solve.
The first is state management. State is usually shared between the various parts of
a business process. This is analogous to the problem of maintaining state within a
web server. This is usually solved by having a centralised state container.
Transaction management is also a problem, where a transaction which may
involve a number of concurrent updates needs to complete without problems
such as the lost update occurring. Normally transaction management is solved by
employing techniques that are used by application servers and which were
discussed earlier in this lecture course.
A third problem is correlation. This problem arises when a number of instances of a
business process are in execution and a message arrives for one of them. Which
instance should the message be delivered to? There are two solutions: the first is
to use unique identifiers on messages; a second approach associates a message
with a unique business process instance.
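The first solution, a unique identifier carried on each message, can be sketched as a lookup table from identifier to running process instance. The names ProcessInstance, start_process and deliver, and the use of an order number as the identifier, are assumptions for the illustration.

```python
class ProcessInstance:
    """One running instance of a business process, identified by its order number."""
    def __init__(self, order_id):
        self.order_id = order_id
        self.inbox = []

instances = {}   # correlation identifier -> running process instance

def start_process(order_id):
    instances[order_id] = ProcessInstance(order_id)
    return instances[order_id]

def deliver(message):
    """Deliver a message to the instance whose identifier it carries."""
    instance = instances.get(message["order_id"])
    if instance is None:
        raise LookupError(f"no process instance for order {message['order_id']}")
    instance.inbox.append(message)
    return instance

first = start_process("ORD-1")
second = start_process("ORD-2")
deliver({"order_id": "ORD-2", "event": "payment received"})
```

Both instances run the same process definition, but only the one started for ORD-2 sees the payment message.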
Security is the final problem and is handled via standard security techniques such
as employing SSL.
BizTalk (revision)
Main Microsoft product
Based on messaging
Central message store known as the BizTalk
MessageBox
The heart of this server is an SQL Server
database
Heavy use of XML
Probably the best known integration server is BizTalk Server. This is quite a
mature Microsoft product which is based on messaging where messages are
stored in an SQL Server database known as the MessageBox.
XML plays a very important part in this technology in that all messages are
converted to an XML format. This server is normally employed in a hub and
spoke architecture.
Incoming message
Receive locations
Receive port
Receive pipeline
MessageBox
Send pipeline
Send port
Outgoing message
Direction of
travel
Some vocabulary
Process definition: the definition of the whole of a
business process
Process instance: one instantiation of a process for
some specified data
Activity: a step in a process, for example check
credit card
Automated activity: an activity carried out by
some computer
Manual activity: an activity carried out by some
human operator
Havey in his excellent book Essential Business Process Modelling has described
a number of reasons for carrying out business process modelling:
It enables an enterprise to formalise its processes; this means, for example, that
process descriptions can be given to new staff who can execute these processes
with little training.
It can lead to the automation of a number of the activities; this is the rationale
detailed in this lecture.
There have been a number of studies which show major savings in both staff
numbers and response time when an effective process modelling exercise has
been carried out.
By delegating easy processes to computers it means that those processes which
require deeper skills can be delegated to human operators.
It enables a company to easily discover whether they are meeting external
regulations and, moreover, it enables that company to demonstrate to auditing
staff that it is compliant.
This quote represents the rationale for including business process modelling
(BPM) within this part of the course. Many applications that have been integrated
have suffered from problems that have occurred because the integration was done in a
piecemeal way with little if any consideration of the overall enterprise
architecture.
BPEL
BPML
Web services choreography
BPM
BPSS
There are a number of standards for business process modelling; the most
important are shown above.
BPEL
Introduction to BPEL
BPEL is far and away the most popular standard as it has the backing of four very
powerful organisations: BEA, Microsoft, IBM and Oracle. It also has the
advantage that it is closely allied to web service technology; for example, it is
relatively easy to turn a process specification expressed in BPEL into XML code
which describes a web service that reflects the process. You will sometimes see a
reference to BPEL4WS and BPELJ. The former is an alternative acronym while
the latter is an extension of BPEL that provides a smooth progression to the Java
code that implements a process.
BPML
Business Process Modelling Language
XML based definitions moderately similar
to BPEL
Allied with a graphical modelling notation
BPMN
Can be mapped to BPEL
This is a notation that emerged from the Business Process Modelling Initiative
organisation. It, like BPEL, is based on XML and includes an interface with a
very sophisticated modelling notation known as the Business Process Modelling
Notation (BPMN). In standards terms it lags well behind BPEL.
Choreography
Web-service oriented
Describes how web services should work
with each other for multiple participants
Can be used for large-scale specification
Language WS-CDL
This is a reference model that has been developed by the Workflow Management
Coalition. It is based on a central enactment service which interfaces with other
components: administration and monitoring tools, workflow client applications,
normal applications, process definition tools and other workflow enactment
services.
The list above, taken from Havey, shows the main components of a good BPM
architecture. There should be some infrastructure which choreographs the various
components of the process, you should be able to export and import a number of
notations, there should be some tool which creates human work lists for tasks that
are not automatable; there should be the internal code which is executed when a
process is executed, a runtime engine which carries out the coordination of tasks,
a console which allows staff to monitor the business processes that are being
executed, a graphical editor which creates business process descriptions and code
generators which convert business process definitions into working program
code.
ebXML purpose
Describe business process and specific interfaces?
Sharing of business process with other
enterprises?
Discovering which business processes a co-company supports?
Description of the business messages for a
particular transaction?
Description of the security policy and technical
configuration employed to implement business
processes?
ebXML is a popular notation used for business process description. The slide
above describes its main functions.
The core of any ebXML implementation is the ebXML registry. The registry
information model that is used is hierarchic and is organised on a sector basis
(see next slide). When co-companies interact with each other it is this registry
that is accessed: it forms the main interface between business entities.
Retail
European retail
IT
US retail
ASDA
This shows the hierarchical arrangement of the registry. The first three levels
represent what are known as classifications and contain common entities that are
associated with the classification, for example generic business processes
associated with the retail sector. At the bottom of the hierarchy there are actual
entities; only one is shown in the slide, that of ASDA.
Implementation phase
Discovery and retrieval phase
Run time phase
When two companies want to develop some trading arrangement using ebXML
there are three phases that they have to carry out. The first is the implementation
phase. In this the partners will analyse their business processes and publish them
to the registry. After this an ebXML implementation must be carried out with the
business processes between the co-companies being linked together via some
ebXML framework.
The next phase occurs when the co-companies start working together. Each
company will discover business process information from the other company as a
prelude to the actual transfer of business data and the execution of the code that
implements the business processes.
The final phase is where business transactions and messages are exchanged
between the two companies in order to carry out some composite business process.
Lecture 20
Integration (iv)
Aims
Introduces BPEL
Looks at BPEL as a business process
modelling language
Describes the elements of BPEL
Examines the interface between BPEL and
web services
Examines two final standards-based case
studies.
BPEL History
BPEL is now the most popular business process language. It was developed by
BEA, IBM and Microsoft before being handed over to the OASIS organisation
for standardisation. Standards are either XML-based or notational. When a
standard is notational it is based on a notation, usually graphical, which is non-XML, for example a notation based on the software engineering notation UML.
BPEL is firmly based on XML; however, there are a number of convertors
available which can transform BPEL into other notations.
WSDL
Process
engine
Purchaser
Receive order
Invoke purchase
Invoke sending
Invoke confirmation
The diagram above shows the architecture of a system which receives orders
from a customer for some product which is then shipped to the customer, for
example an online store. The BPEL part of the architecture contains details of the
business processes, while the Web Services Description Language (WSDL) part
details how the process is to interact with a web service implementation. The
diagram shows one interaction, that with the purchaser; other interactions, for example
that with a shipping company, are not shown.
Process
Variable
Partner link
Compensation handler
BPEL contains a number of different objects; this slide and the following slide
detail some of them.
A process is the specification of a business process.
A variable is akin to a programming language variable in that it holds data that
is used by a particular business process, for example a variable might hold the
identity of a product that is ordered by a customer from an online store.
A partner link is a specification of the role that a process has, for example a
process might be a buyer of goods and also a reseller of those goods to another
entity.
A compensation handler is a specification of cancellation logic which is invoked
if a particular transaction needs to be returned to a previous state, for example by
a customer cancelling part of a shopping cart.
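In BPEL a compensation handler is written in XML as part of the process definition. The underlying idea, that each completed step registers the logic needed to undo it, can be sketched in Python using the shopping cart example; the Cart class and its methods are purely illustrative.

```python
class Cart:
    """Shopping cart in which every completed step registers its own undo action."""
    def __init__(self):
        self.items = []
        self._compensations = []   # undo actions, most recent last

    def add(self, item):
        self.items.append(item)
        # Register the cancellation logic for this step.
        self._compensations.append(lambda: self.items.remove(item))

    def compensate(self, steps=1):
        """Undo the most recent steps, in reverse order of completion."""
        for _ in range(min(steps, len(self._compensations))):
            self._compensations.pop()()

cart = Cart()
cart.add("book")
cart.add("pen")
cart.add("lamp")
cart.compensate(2)   # the customer cancels the last two items
```

After the call the cart is back in the state it was in when only the book had been added.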
Correlation set
Receive
Invoke
Sequence
While
Here is the first example of some BPEL. Notice that it is expressed in an XML
format. Here it defines what the role of a partner company is: that of a customer,
and the role that the receiver has: that of the provider of some goods that are
required by the customer.
<correlationSets>
  <correlationSet name="AccSet"
                  properties="idprop amountprop" />
</correlationSets>
This piece of BPEL defines a correlation set named AccSet. Any number of
correlation sets can be specified within such a nested element. The correlation set
has properties (variables) idprop and amountprop. The correlation set can then be
used as the data interface between a receive operation and an invoke operation;
the former receives a request for a service while the latter invokes a service from
another provider.
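To make the role of the correlation set concrete, here is a minimal Python sketch. It assumes a well-formed version of the fragment (attribute values quoted) and treats a received message as a plain dictionary, which is a hypothetical stand-in and not how a BPEL engine represents messages. The function names are my own. It reads the property names from the XML and then copies those properties out of a received message, which is the data a later invoke would carry.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the fragment on the slide (attribute values quoted).
BPEL_FRAGMENT = """
<correlationSets>
  <correlationSet name="AccSet" properties="idprop amountprop" />
</correlationSets>
"""


def read_correlation_sets(xml_text: str) -> dict:
    """Return a mapping from correlation set name to its list of properties."""
    root = ET.fromstring(xml_text)
    return {
        cs.get("name"): cs.get("properties", "").split()
        for cs in root.findall("correlationSet")
    }


def fill_from_message(properties: list, message: dict) -> dict:
    """Copy the correlation properties out of a received message
    (a plain dict here, a hypothetical stand-in for a BPEL message)."""
    return {name: message[name] for name in properties}


sets = read_correlation_sets(BPEL_FRAGMENT)
# The extra "note" field is not part of the correlation set and is ignored.
values = fill_from_message(
    sets["AccSet"], {"idprop": "A17", "amountprop": 250, "note": "x"}
)
```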
Here the correlation set is assigned the values of the variables associated with the
request that has been received from some partner. The initiate attribute is set to
yes to indicate that this step initialises the correlation set; here it is the first step in the business process.
Here the process invokes another service using the data held in the correlation
set. This sends data back to the entity that started the receive detailed on the
previous slide.
Here a partner is defined. This is some enterprise that a user of BPEL interacts
with. The example shows that this partner has two roles: that of a consultancy
company and also a builder.
This BPEL defines the fact that there is a partner link type known as a Reseller
and that there are two roles: that of a receiver and that of someone who receives
goods and then sells them on. The code also maps a role to the web services port
that is involved in the reseller interactions.
BPEL also contains a facility for error processing. The code above shows the
code that catches a number of errors, for example the fault that is generated when
an invalid product code is encountered. The ellipsis shown indicates the position
of the code that is executed when the fault is found. Faults are often generated by
means of the throw statement; an example of this is
<throw faultName="invalidCustomer"/>
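For readers more used to conventional languages, the catch and throw pairing behaves much like exception handling. The following Python sketch is an analogy only, not BPEL; the exception classes and the function process_order are hypothetical names. Raising an exception plays the role of throw, and each except clause plays the role of a catch for one named fault.

```python
class InvalidCustomer(Exception):
    """Plays the role of the invalidCustomer fault."""


class InvalidProductCode(Exception):
    """Plays the role of the fault for an invalid product code."""


def process_order(customer_id, product_code, known_customers, catalogue):
    try:
        if customer_id not in known_customers:
            raise InvalidCustomer(customer_id)  # analogous to <throw>
        if product_code not in catalogue:
            raise InvalidProductCode(product_code)
        return "accepted"
    except InvalidCustomer:
        # Analogous to the catch for the invalidCustomer fault.
        return "rejected: unknown customer"
    except InvalidProductCode:
        # Analogous to the catch for the invalid product code fault.
        return "rejected: invalid product code"
```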
This is an example of the type of programming that can be carried out in BPEL.
Here the variable loopCounter is set to 22.
This just shows the skeleton for a while loop. This is the only looping structure
currently found in BPEL. This makes it quite a difficult language to program
in, as for loops have to be simulated and this can be awkward.
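The usual way to simulate a for loop with only a while loop is to keep an explicit counter. The Python sketch below shows the pattern; it is not BPEL, and the function name repeat_with_while is hypothetical. In BPEL the initialisation and the increment would each be an assignment to a variable such as loopCounter, and the test would be the condition of the while.

```python
def repeat_with_while(times: int) -> list:
    """Simulate a counted for loop using only a counter and a while loop."""
    visited = []
    loop_counter = 0  # initialise the counter (an assignment in BPEL)
    while loop_counter < times:  # the while condition tests the counter
        visited.append(loop_counter)  # loop body
        loop_counter = loop_counter + 1  # explicit increment; omitting it loops forever
    return visited
```

The three separate pieces (initialise, test, increment) are what a for loop would normally bundle together, which is why their absence makes the language more laborious to use.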
BPEL summary
XML-based business process language
In standardisation
Whole implementation involves WSDL specification together with the BPEL specification
Java version of BPEL known as BPELJ
BPELJ can either be Java with XML embedded or XML with Java embedded
BPEL is the popular business process definition language that is currently
undergoing the standardisation process via the OASIS group. A full specification
involves both BPEL and WSDL, the Web Services Description Language. This
means that business processes can be directly hooked up to a set of web services
which implement the business processes. As well as there being an XML version
of BPEL there is also a specified version of BPEL which is Java-oriented. This is
in its early days, with only two rudimentary proprietary implementations
available.
The final part of this lecture is devoted to two standards which have links to
business process specification. Standards are vitally important in systems
integration because they enable components to be integrated without any special
processing, for example glue code written in Perl or Python. The two standards that I
examine in this part of the course are eTOM, which is a telecoms industry standard,
and RosettaNet. The former is a standard for the telecoms sector while the latter
is a standard for supply chain integration within the high-tech sector.
eTOM
Standard gaining major acceptance in telecoms industry
Split into three areas:
Strategy, infrastructure and product
Operations
Enterprise management
724
This part of the standard (strategy, infrastructure and product) deals with the management of selling and marketing, for
example it would deal with the development of product brochures. It deals with
service development and management, for example the development of a helpdesk service. It deals with resource development and management, for example
the deployment of teams to produce some telecoms product. It deals with supply
chain management, for example the sourcing of components for some telecoms
hardware.
eTOM Operations
This slide shows the part of the eTOM standard which deals with operations. It
deals with customer relationship management, for example the process of
keeping product technical information up to date. It deals with service
management and operations, for example running a hardware service department.
It deals with resource management and operations, for example the allocation of
staff to one-off projects. It deals with supplier/partner management, for example
the processing of defect data and the ascribing of defects to bought-in items.
This details the items which come under the eTOM enterprise management
banner. Strategic and enterprise planning would, for example, cover
the process of determining new markets or new products. Knowledge and
research management would cover tasks such as the development of horizon
documents which detail important blue sky areas to be investigated. Human
resource management would cover activities such as employee customer
professional updating programmes.
eTOM strengths
Covers both technical and non-technical areas
Is the only player in the sector
Enables business requirements analysis and specification to be closely linked with technical processes
It handles the problem of different processes operating on different times and life-cycles
It integrates technical areas: application, computing and network
These are some of the strengths of eTOM. The major strength of this standard is
the fact that it covers all the activities that a telecoms business needs to carry out.
For example it integrates business processes such as marketing with product
development, and it integrates technical areas which have in the past existed as
isolated islands in telecoms providers, for example it integrates the process of
software development with network development and hardware development.
RosettaNet
Developed to standardise e-business in the high-tech industries
Was developed surprisingly easily
Was developed as a response to the failure of Electronic Data Interchange (EDI) in the high-tech sector
XML based
Relies on PIPs (Partner Interface Processes)
RosettaNet Processes
Business process modelling
PIP
Dictionaries
Implementation framework
The process of delivering a PIP is shown above. The first process is that of
understanding the business model used and the various processes that make up
the model. This is shown as the top two boxes.
What is also needed is the development of a set of dictionaries that contain both
technical and business properties. The former contain the technical details of a
product, for example the details of a computer processor. The latter describe
business data relevant to partner companies.
The implementation framework consists of documents, for example XML DTDs,
which define the formats used to exchange data and documents.
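The point of an agreed exchange format is that a received document can be checked mechanically before it is processed. The Python sketch below is a much simplified stand-in for DTD validation: it only checks that certain child elements are present. The element names (order, buyer, productCode, quantity) and the function name are hypothetical and are not taken from any RosettaNet DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical required child elements; real formats are defined by the
# RosettaNet DTDs, which are not reproduced here.
REQUIRED_CHILDREN = ("buyer", "productCode", "quantity")


def missing_elements(xml_text: str) -> list:
    """Return the required child elements absent from the document root."""
    root = ET.fromstring(xml_text)
    return [tag for tag in REQUIRED_CHILDREN if root.find(tag) is None]


ok_doc = (
    "<order><buyer>B1</buyer><productCode>P9</productCode>"
    "<quantity>2</quantity></order>"
)
bad_doc = "<order><buyer>B1</buyer></order>"
```

A real DTD also constrains element order, nesting and attributes, so a full check would need a validating parser rather than this presence test.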
A PIP
XML documents
Class and sequence diagrams
A validation tool
An implementation guide
PIPs are partitioned into a number of different clusters, for example there are
clusters for partner, product and service review; for product introduction; for
order management; for inventory management; for marketing information
management; and for service and support. The slide shows the various clusters
associated with inventory management, probably the simplest of the clusters.