Introduction
Course Overview
Main focus: Internet technologies, protocols, and applications
Secondary focus: Performance issues
Textbook: William Stallings, High Speed Networks and Internet, Pearson Education, 4th ed., 2011 (plus research papers)
Goal: Understand current trends in the high-speed networking research field
Course Objectives:
- To introduce the principles and current technologies of high-speed networks
- To develop an in-depth understanding of the protocols and applications of major high-speed networking technologies
- To perform network design using these technologies to meet a given set of requirements
We know
What is a network?
Network Devices
Introduction
Protocol ??
CERN Project
Higgs boson
Large computer centre containing very powerful data-processing facilities primarily for experimental data analysis
Physicists were excited about the near discovery of the elusive God particle or Higgs boson. Dozens of Indian scientists have been involved over the years searching for this missing cornerstone of particle physics in laboratories in Europe, India and the US. Researchers from Hyderabad played a crucial role in this near discovery by patiently searching a wide range of giga-electron volts (GeV) to find the Higgs boson, named after British scientist Peter Higgs and Indian physicist Satyendra Nath Bose.
Dr Bose, who taught at Dhaka and Calcutta universities, did pioneering research in mathematical physics and quantum mechanics. Although he did not win the Nobel, at least two scientists who carried forward his work won the prize. Even the Large Hadron Collider (LHC) at the European Organisation for Nuclear Research (CERN) has a huge contribution from India. The 8,000-tonne magnet at LHC was made in India. Indian teams also contributed LHC hardware in the form of circuits, as well as software for analysing computer-generated data. Incidentally, Indians have been associated with CERN for more than half a century, long before the LHC was fired up.
CERN Cont..
Roughly 15 petabytes (15 million gigabytes) of data annually, enough to fill more than 1.7 million dual-layer DVDs a year
- Scientists around the world want to access and analyse this data
- Major wide area networking hub
- CERN is collaborating with institutions in 34 different countries to operate a distributed computing and data storage infrastructure: the Worldwide LHC Computing Grid
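The 15-petabyte figure can be checked with a short calculation, which also shows why the data volume translates into a networking problem. This is a rough sketch: the decimal units, the 8.5 GB dual-layer DVD capacity, and the 365-day year are assumptions, not values from the slides.

```python
# Back-of-the-envelope check of the annual LHC data volume.
# Assumptions (not from the slides): decimal units (1 PB = 10**15 bytes),
# a dual-layer DVD capacity of 8.5 GB, and a 365-day year.

PETABYTE = 10**15
GIGABYTE = 10**9
SECONDS_PER_YEAR = 365 * 24 * 3600

annual_bytes = 15 * PETABYTE
dvd_bytes = 8.5 * GIGABYTE

# Number of DVDs needed to hold one year of data.
dvds_per_year = annual_bytes / dvd_bytes

# Average rate needed just to move one year of data once, spread evenly.
avg_rate_gbps = annual_bytes * 8 / SECONDS_PER_YEAR / 1e9

print(f"Dual-layer DVDs per year: {dvds_per_year / 1e6:.2f} million")
print(f"Average sustained rate:   {avg_rate_gbps:.2f} Gbit/s")
```

The sustained rate is only a lower bound on what the network must carry, since the same data is copied to many sites and traffic is bursty rather than evenly spread.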
CERN Cont..
Outcome?
- Birthplace of the World Wide Web (WWW)
- Higher network: Grid computing
- Higgs boson
- CERN has become a centre for the development of grid computing, hosting, among others, the Enabling Grids for E-sciencE (EGEE) and LHC Computing Grid projects.
CERN Cont..
Video
Need to carry large volumes of traffic with different quality of service requirements over networks operating at very high data rates
Video, Web, FTP
TCP/IP, UDP
Unix
VAX, Alpha
Adapters, NICs
FDDI, GigE, ATM
Fiber, SONET, WiFi
- Faster media does not necessarily imply faster network applications
- Interdependence between layers
- Interactions between protocols
- Need to consider trends of all layers
Networking
- Growth in the number and power of computers is driving the need for interconnection
- Rapid integration of voice, data, image, and video technologies
- Two broad categories of communications networks:
[Figure: Internet structure, with labels: router, server, local ISP, communication links, company network]
Internet Evolution
A network that uses both digital transmission and digital switching.
- Need to provide economical voice communication
- Early 60s: answer to growth of digital, computer-controlled, circuit-switched networking
- Western Electric 4ESS, introduced in 1976: first large-scale commercial time-division switch
IDN
Cont
- Integrated voice and data on the same digital transmission links/exchanges
- Designed to allow digital transmission of voice and data over ordinary telephone copper wires, resulting in potentially better voice quality than an analog phone can provide
- Offers circuit-switched connections (for either voice or data) and packet-switched connections (for data), in increments of 64 kbit/s
ISDN
Frame Relay
- Popularized standard (c. 1988) for packet switching over ISDN
- Streamlined form of packet switching suitable for use in high-speed networks (2 Mbps)
- Most widely deployed WAN technology in use today
- c. 1988: emerging demand for broadband services
- New high-speed technologies available ranged from about 1.5 to 2 Mbit/s
- Early 90s: outgrowth of the emerging need for high-speed switching over B-ISDN WAN
- Rapidly evolved into a high-speed packet switching technology in its own right
- ATM is a broadband multiplexing scheme that allows widely differing types of digital signals to be multiplexed into a common digital stream
- Primary deployment today is:
Scale
Application
- Demand for large to huge file transfers
- Increasingly critical nature of Internet use
- Demand for real-time performance characteristics
- Demand for guarantees of service levels
- Growing number of hosts, with growing demands on bandwidth
- New technologies result in new paradigms for device and connection types
e.g. ??
High-speed LANs
- Driven by explosive growth in the speed and computing power of PCs in the 1990s
- Emergence of client-server computing architecture in the business environment
- Use of centralized server farms
- Emergence of power workgroups and workgroup applications
- Need for local high-speed LAN backbones
Traffic type
Elastic traffic: adjusts its throughput and delay between end hosts in response to network conditions.
- Generally TCP-based applications (HTTP, SMTP, FTP)
- Principal form of feedback: packet loss caused by network load/congestion, which causes TCP to apply its congestion avoidance algorithm and reduce the rate at which packets are sent over the network
- TCP traffic is considered to be "network friendly"
Inelastic traffic:
- Does not easily adapt/adjust its throughput and delay in response to network conditions
- Generally real-time multimedia (audio streaming, video, VoIP)
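The feedback loop that makes TCP traffic elastic can be illustrated with additive-increase/multiplicative-decrease (AIMD), the rule behind TCP congestion avoidance. The sketch below is a deliberately simplified model: it ignores slow start, timeouts, and fast recovery, and the function name, parameters, and loss pattern are illustrative assumptions.

```python
def aimd_window_trace(loss_rounds, rounds, start=1.0, increase=1.0, decrease=0.5):
    """Return the congestion window (in segments) at the start of each round.

    loss_rounds: set of round indices in which a packet loss is detected.
    This is a toy model of congestion avoidance only, not a full TCP.
    """
    window = start
    trace = []
    for r in range(rounds):
        trace.append(window)
        if r in loss_rounds:
            # Loss is taken as a congestion signal: cut the sending rate.
            window = max(1.0, window * decrease)
        else:
            # No loss: probe for more bandwidth by one segment per round.
            window += increase
    return trace


# Hypothetical run: losses are detected in rounds 4 and 8.
trace = aimd_window_trace(loss_rounds={4, 8}, rounds=10)
print(trace)
```

The window grows linearly while the network delivers packets and is halved on each loss, which is the sense in which elastic traffic adapts its throughput to network conditions. An inelastic source would keep sending at its own fixed rate regardless of loss.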
Traffic type
Requirement
- State requirements in advance, using a service request function
- State requirements on the fly, using IP packet header fields
QoS on Internet
Requirements for inelastic traffic include:
- Throughput: the average rate of successful message delivery over a communication channel.
- Delay: how long it takes for a bit of data to travel across the network from one node or endpoint to another. It is typically measured in multiples or fractions of a second.
- Delay variation (jitter): how much the delay may vary from packet to packet; the application states the variation it can tolerate.
- Packet loss: the failure of one or more transmitted packets to arrive at their destination.
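The four requirements can be made concrete by computing them from a packet trace. This is a minimal sketch: the trace format, the function name, and the choice of jitter as the mean absolute difference between consecutive delays are assumptions made for illustration, and other jitter definitions are in common use.

```python
from statistics import mean


def qos_summary(packets_sent, deliveries):
    """Summarise QoS metrics for one flow.

    packets_sent: number of packets the sender transmitted.
    deliveries:   list of (send_time_s, recv_time_s, size_bytes) for the
                  packets that arrived, in order (at least two entries).
    """
    delays = [recv - send for send, recv, _ in deliveries]
    first_send = min(send for send, _, _ in deliveries)
    last_recv = max(recv for _, recv, _ in deliveries)
    delivered_bits = sum(size for _, _, size in deliveries) * 8

    return {
        "throughput_bps": delivered_bits / (last_recv - first_send),
        "mean_delay_s": mean(delays),
        # Jitter taken here as mean |difference| between consecutive delays.
        "jitter_s": mean(abs(b - a) for a, b in zip(delays, delays[1:])),
        "loss_ratio": 1 - len(deliveries) / packets_sent,
    }


# Hypothetical trace: 5 packets of 1000 bytes sent, 4 delivered.
sample = [(0.00, 0.05, 1000), (0.02, 0.08, 1000),
          (0.04, 0.09, 1000), (0.08, 0.15, 1000)]
summary = qos_summary(packets_sent=5, deliveries=sample)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```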
End-to-end delay over a path with Q links, where:
- Processing delay (dproc): time routers take to process the packet header
- Queuing delay (dqueue): time the packet spends in routing queues
- Transmission delay (dtrans): time it takes to push the packet's bits onto the link
- Propagation delay (dprop): time for a signal to reach its destination
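A worked example may help tie the four components together. The sketch below assumes the usual model in which the end-to-end delay is the sum of the per-link delays over the Q links. The function names, the signal speed of 2e8 m/s, and the example link parameters are assumptions, not values from the slides.

```python
def link_delay(packet_bits, rate_bps, length_m,
               d_proc=0.0, d_queue=0.0, signal_speed_mps=2e8):
    """Delay contributed by one link, in seconds."""
    d_trans = packet_bits / rate_bps        # push the bits onto the link
    d_prop = length_m / signal_speed_mps    # signal travels along the link
    return d_proc + d_queue + d_trans + d_prop


def end_to_end_delay(links):
    """Sum the per-link delays over all Q links of the path."""
    return sum(link_delay(**link) for link in links)


# Hypothetical path: Q = 3 identical links, each 100 Mbit/s and 100 km long,
# carrying a 1500-byte packet, with 20 microseconds of processing per hop
# and empty queues.
link = dict(packet_bits=1500 * 8, rate_bps=100e6, length_m=100e3, d_proc=20e-6)
total = end_to_end_delay([link] * 3)
print(f"End-to-end delay: {total * 1e3:.2f} ms")
```

In this example propagation dominates (0.5 ms per link against 0.12 ms of transmission), so raising the link rate alone would shorten the delay only slightly.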
Packet switching: TCP/IP
- TCP: reliable end-to-end transport
- IP: internet routing and delivery
- Dynamic routing, load balancing
- High-speed Ethernet LANs
- Recent advancements driven by the need to support multimedia and real-time traffic
- Emergence of the Internet's Integrated Services Architecture (ISA, or IntServ) and Differentiated Services (DS, or DiffServ)
- New QoS architecture/framework is driving protocol changes:
- IPv6 introduces new features for QoS
- RSVP: Resource ReSerVation Protocol
- RTP: Real-time Transport Protocol for video, audio, and other real-time traffic
- Multicast routing (IGMP, MOSPF, PIM)
- Multi-Protocol Label Switching (MPLS)
Applications
- All packets treated equally
- Designed for elastic traffic
- No guarantees of bandwidth or throughput
- No guarantees of delay
- No guarantee of jitter (delay variation)
- Often create inelastic traffic
- Often sensitive to delay
- Often sensitive to jitter
- Often critical in nature
- Generate elastic traffic as well
User Requirements!
Research Overview
Research area?
Key challenges?
Introduction
Identifying and categorizing network traffic by application type is challenging because of the continued evolution of applications, especially of those with a desire to be undetectable. The diminished effectiveness of port-based identification and the overheads of deep packet inspection approaches motivate us to propose a traffic classification methodology that relies on using only flow statistics to classify traffic.
[Figure: campus router, with traffic labelled Web, Streaming, P2P]
Semi-Supervised Results
Labelling of training feature vectors is one of the most time consuming steps of the classification process.
Retraining Detection
Although we found that our classifiers remained robust for extended periods of time, a mechanism for determining when the classifier needs updating is still required.
Our proposed technique is a flexible mathematical framework that leverages both labelled and unlabelled flows. This semi-supervised approach to learning a network traffic classifier is a key contribution of this work.
Classification Framework
[Figure: classification framework, with components: Unlabelled Training Data, Labelled Training Data, Clustering Algorithm, Labelled Clusters, Classifier, Unclassified Flows, Classified Flows]
In Figure 1 we test the hypothesis that if a few flows are labelled in each cluster then we have a reasonable basis for creating the cluster to application mapping. With as few as two labels per cluster, we attain 94% flow accuracy.
Real-Time Classification
The results in Figure 2 show the effect on the classifier's precision when we used a fixed number of labelled flows and a varying number of unlabelled flows in the training data set. Our results show that for a fixed number of labelled training flows, increasing the number of unlabelled flows increases the classifier's precision.
We propose using the average distance of new flows to the centroid of the nearest cluster; a significant increase in the average distance indicates the need for an update.
Conclusions
- Fast and accurate classifiers can be obtained by training with a small number of labelled flows mixed with a large number of unlabelled flows.
- High flow and byte accuracy can be achieved for offline and real-time classification.
- Robust classifiers can be built that are immune to transient changes in network conditions.
- Our approach can be integrated with solutions that collect flow statistics.
- We developed a prototype real-time classifier using Bro [4].
References
[1] O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.
[2] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson. Offline/Online Traffic Classification Using Semi-Supervised Learning. To appear in Proc. of IFIP Performance 2007.
[3] J. Erman, A. Mahanti, M. Arlitt, and C. Williamson. Identifying and Discriminating Between Web and Peer-to-Peer Traffic in the Network Core. In WWW'07, Banff, Canada, May 2007.
[4] V. Paxson. Bro: A System for Detecting Network Intruders in Real-time. Computer Networks, 31(23-24):2435-2463, 1999.
A fundamental challenge in the design of the real-time classification system is the need to classify a flow as soon as possible. Unlike offline classification, where all discriminating flow statistics are available a priori, in the real-time context we only have partial information on the flow statistics. Our solution uses a layered classification system based on the idea of packet milestones. A packet milestone is reached when the count of the total number of packets a flow has sent or received reaches a specific value. Each layer has an independent classifier. Flow statistics are monitored in real time. As a flow reaches a packet milestone it is classified/reclassified by the appropriate layer. This layered approach allows us to revise and potentially improve the classification of flows. Figures 3 & 4 present example results using the April 13, 9 am trace we collected from the UofC. We see that the classifier performs well, with byte accuracies typically in the 70% to 90% range.
Figure 3: Performance of Real-time Classifier
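The packet-milestone idea can be sketched as a small flow tracker that invokes a different classifier each time a flow's packet count hits a milestone. This is an illustration of the control flow only: the milestone values, the class and function names, and the placeholder rule based on average packet size are all hypothetical, whereas the poster's layers use cluster-based classifiers.

```python
MILESTONES = (8, 16, 32)  # hypothetical packet counts, one per layer


def make_layer_classifier(threshold_bytes):
    """Build a placeholder per-layer classifier (not the poster's method)."""
    def classify(stats):
        avg_size = stats["bytes"] / stats["packets"]
        return "bulk" if avg_size >= threshold_bytes else "interactive"
    return classify


# Each layer has its own independent classifier, keyed by its milestone.
LAYERS = {m: make_layer_classifier(600) for m in MILESTONES}


class FlowTracker:
    """Monitor flow statistics and (re)classify flows at packet milestones."""

    def __init__(self, layers):
        self.layers = layers
        self.stats = {}   # flow_id -> running statistics
        self.labels = {}  # flow_id -> most recent classification

    def on_packet(self, flow_id, size_bytes):
        s = self.stats.setdefault(flow_id, {"packets": 0, "bytes": 0})
        s["packets"] += 1
        s["bytes"] += size_bytes
        classifier = self.layers.get(s["packets"])
        if classifier is not None:
            # Milestone reached: classify, or revise an earlier decision.
            self.labels[flow_id] = classifier(s)
        return self.labels.get(flow_id)


# Hypothetical flow: 8 small packets followed by 8 large ones.
tracker = FlowTracker(LAYERS)
for size in [100] * 8 + [1500] * 8:
    label = tracker.on_packet("flow-1", size)
print(label)
```

The flow has no label until the first milestone, and the label assigned there can be overturned at a later milestone once more of the flow has been seen.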
Step 2: Classification
Classifier assigns each new unclassified flow to the nearest cluster using Euclidean distance. This is the maximum likelihood cluster assignment. Label of the assigned cluster becomes the classification of the flow. A cluster label is obtained using the labelled flows available in each cluster. These can be obtained through a variety of means: (automated) payload analysis, port numbers, expert knowledge.
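The classification step can be sketched in a few lines, given cluster centroids from the training step. In this minimal sketch the centroids, feature values, and application labels are invented for illustration, and labelling each cluster by majority vote among its labelled flows is an assumption; the poster says only that the label is obtained from the labelled flows in the cluster.

```python
import math
from collections import Counter


def nearest_cluster(flow, centroids):
    """Return (index, distance) of the centroid closest to the flow."""
    distances = [math.dist(flow, c) for c in centroids]
    idx = min(range(len(centroids)), key=distances.__getitem__)
    return idx, distances[idx]


def label_clusters(centroids, labelled_flows):
    """Map each cluster to the most common label among its labelled flows."""
    votes = {}
    for features, label in labelled_flows:
        idx, _ = nearest_cluster(features, centroids)
        votes.setdefault(idx, Counter())[label] += 1
    return {idx: counts.most_common(1)[0][0] for idx, counts in votes.items()}


def classify(flow, centroids, cluster_labels):
    """Classify a flow with the label of its nearest cluster."""
    idx, _ = nearest_cluster(flow, centroids)
    return cluster_labels.get(idx, "unknown")


# Hypothetical features: (average packet size in bytes, number of packets).
centroids = [(1200.0, 40.0), (150.0, 10.0)]
labelled = [((1100.0, 35.0), "P2P"), ((1300.0, 50.0), "P2P"),
            ((140.0, 8.0), "Web"), ((180.0, 12.0), "Web")]
cluster_labels = label_clusters(centroids, labelled)

print(classify((1250.0, 45.0), centroids, cluster_labels))
print(classify((160.0, 9.0), centroids, cluster_labels))
```

The distance returned by `nearest_cluster` is the same quantity that the retraining check averages over new flows. In practice the features would normally be scaled before computing Euclidean distances, so that a feature with a large numeric range does not dominate.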
Acknowledgements
This work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada and Informatics Circle of Research Excellence (iCORE) of the province of Alberta, Canada.
Training Data: Training data can be a mix of labelled and unlabelled flows. Features include: Average Packet Size, Number of Packets, Payload Bytes, Header Bytes, etc.
More Topics
Exercise
Take the Worldwide LHC Computing Grid as a case study of the need for high-speed networks, and discuss.