
Network-Coding-Based Signal Recovery For Efficient Scheduling In Wireless Networks

Network coding: Definition


Network coding is a technique where, instead of simply relaying the packets of information they receive, the nodes of a network will take several packets and combine them together for transmission. This can be used to attain the maximum possible information flow in a network. Network coding is a field of information theory and coding theory.

Description
In a linear network coding problem, a group of nodes P is involved in moving data from S source nodes to K sink nodes. Each node generates a new packet as a linear combination of the packets received on its incoming links, with coefficients chosen from a finite field. A generated message X_k is related to the received messages M_i by the relation

X_k = g_k^1·M_1 + g_k^2·M_2 + ... + g_k^S·M_S,

where the coefficients g_k^i are drawn from the Galois field GF(2^s). Each node forwards the computed value X_k along with all the coefficients g_k^i used at the k-th level. Since the operations are computed in the finite field, the result has the same length as each message vector M_i. Every node produces a similar output, which yields a linear system of the form X = GM, in which X and G are known and M must be computed. Each of the K receivers tries to solve this system of linear equations, and to do so it must receive at least S linearly independent packets. The received packets are continually fed into a Gaussian elimination procedure that reduces the matrix G to row-echelon form; finally, the resulting system X = G_echelon·M is solved to obtain M.
Web reference1: http://en.wikipedia.org/wiki/Network_coding
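To make the decoding step concrete, the following is a minimal illustrative sketch (not taken from the reference above) of solving X = GM by Gaussian elimination, restricted to GF(2) for simplicity rather than the general GF(2^s); the packet contents and sizes are invented for the example.

    # Sketch: decoding random linear network coding over GF(2).
    # Each packet carries its coefficient vector g and the coded payload
    # X = XOR of the chosen source packets; the receiver solves X = G·M.

    def decode_gf2(packets, num_sources):
        """packets: list of (coeffs, payload); coeffs is a 0/1 list of length
        num_sources, payload a bytes object (all payloads equal length)."""
        rows = [(list(c), bytearray(p)) for c, p in packets]
        rank = 0
        for col in range(num_sources):
            # find a row with a 1 in this column at or below `rank`
            pivot = next((r for r in range(rank, len(rows)) if rows[r][0][col]), None)
            if pivot is None:
                return None                  # not enough innovative packets yet
            rows[rank], rows[pivot] = rows[pivot], rows[rank]
            pc, pp = rows[rank]
            # eliminate this column from every other row (XOR = GF(2) subtraction)
            for r, (c_, p_) in enumerate(rows):
                if r != rank and c_[col]:
                    c_[:] = [a ^ b for a, b in zip(c_, pc)]
                    p_[:] = bytes(a ^ b for a, b in zip(p_, pp))
            rank += 1
        # after full reduction, row i holds source packet M_i
        return [bytes(rows[i][1]) for i in range(num_sources)]

    # Toy usage: two 1-byte source packets, one coded and one plain packet received.
    M = [b"\x0a", b"\x0b"]
    pkts = [([1, 1], bytes([M[0][0] ^ M[1][0]])), ([0, 1], M[1])]
    assert decode_gf2(pkts, 2) == M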

Wireless network: Definition


Wireless network refers to any type of computer network that is not connected by cables of any kind. It is a method by which homes, telecommunications networks and enterprise (business) installations avoid the costly process of introducing cables into a building, or as a connection between various equipment locations.[1] Wireless telecommunications networks are generally implemented and administered using a transmission system called radio waves. This implementation takes place at the physical level (layer) of the network structure.

Description
In a general sense, wireless networks offer a vast variety of uses by both business and home users. Today, the industry accepts a handful of different wireless technologies. Each wireless technology is defined by a standard that describes unique functions at both the Physical and the Data Link layers of the OSI Model. These standards differ in their specified signaling methods, geographic ranges, and frequency usages, among other things. Such differences can make certain technologies better suited to home networks and others better suited to the networks of larger organizations. Web reference 2: http://en.wikipedia.org/wiki/Wireless_network

Comparison Between Wired & Wireless Networks: Definition


The main difference between wired and wireless networks is the absence of wires (the air link); mobility is conferred by the lack of a wired tether. This leads both to the tremendous benefits of wireless networks and to their perceived drawbacks. Some of the key technical challenges in wireless communications come from (i) the hostile wireless propagation medium and (ii) user mobility. Most of the issues covered in this chapter arise in any data network, wired or wireless alike. What makes the wireless case different is the degree of importance of problems that often appear in both types of networks. Text Reference 1: Wireless Networks by Ivan Marsic

Signal Recovery: Definition


A Signal recovery system is a circuit used to estimate and compensate for frequency and phase differences between a received signal's carrier wave and the receiver's local oscillator for the purpose of coherent demodulation.

Description
In the transmitter of a communications carrier system, a carrier wave is modulated by a baseband signal. At the receiver the baseband information is extracted from the incoming modulated waveform. In an ideal communications system the carrier frequency oscillators of the transmitter and receiver would be perfectly matched in frequency and phase thereby permitting perfect coherent demodulation of the modulated baseband signal. However, transmitters and receivers rarely share the same carrier frequency oscillator. Communications receiver systems are usually independent of transmitting systems and contain their own oscillators with frequency and phase offsets and instabilities. Doppler shift may also contribute to frequency differences in mobile radio frequency communications systems. All these frequency and phase variations must be estimated using information in the received signal to reproduce or recover the carrier signal at the receiver and permit coherent demodulation.

Web reference 3: http://en.wikipedia.org/wiki/Carrier_recovery
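One classical way to estimate such a carrier frequency offset (offered here only as an illustrative sketch; the BPSK modulation, sample rate, and offset values below are invented) is to square the received signal to strip the data modulation, locate the resulting tone at twice the offset with an FFT, and then derotate the signal:

    import numpy as np

    def estimate_freq_offset(samples, sample_rate):
        squared = samples ** 2                        # removes the +/-1 BPSK modulation
        spectrum = np.fft.fftshift(np.fft.fft(squared))
        freqs = np.fft.fftshift(np.fft.fftfreq(len(samples), d=1.0 / sample_rate))
        return freqs[np.argmax(np.abs(spectrum))] / 2.0   # tone sits at twice the offset

    # Toy usage: 1 kHz offset on random BPSK symbols sampled at 48 kHz.
    fs, f_off = 48_000, 1_000.0
    n = np.arange(4096)
    bits = np.random.choice([-1.0, 1.0], size=n.size)
    rx = bits * np.exp(2j * np.pi * f_off * n / fs + 1j * 0.7)   # offset plus phase
    f_hat = estimate_freq_offset(rx, fs)
    corrected = rx * np.exp(-2j * np.pi * f_hat * n / fs)        # derotate for demodulation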

Signal Recovery: Definition


A lost signal can be recovered using special algorithms.

Reference 2: Signal Recovery Techniques for Image and Video Compression and Transmission, by Aggelos Konstantinos Katsaggelos and Nick Galatsanos

Network scheduling:

Definition
A network schedule is a graphical display of the logical order of activities that defines the sequence of work in a project. The activities are represented by boxes, and networks are usually drawn from left to right, with lines drawn between the boxes to show the "precedence" relationships between them. Arrowheads are sometimes placed on the lines to indicate the direction of flow through time.

Web Reference 5: http://www.maxwideman.com/issacons3/iac1303/tsld002.htm

Existing Systems For Network Coding

The First Existing System
In the case of unicast (when only one receiver at a time uses the network), the maximum transmission rate equals what is known as the min-cut value between the source and the receiver, which, for a graph with unit-capacity edges, equals the minimum number of edges whose removal would disconnect the receiver from the source. This result has been known for the past 50 years as the Ford-Fulkerson min-cut max-flow theorem. The proof of the theorem, as well as the method to achieve the maximum transmission rate, follows from an older result known as Menger's theorem, which states that the min-cut value from the source to the receiver is equal to h if and only if there exist exactly h edge-disjoint paths that connect the receiver to the source. Therefore, the maximum transmission rate of h from the source to the receiver is achieved by routing h independent unit-rate information streams through the h edge-disjoint paths. Reference 1: Network Coding for Efficient Network Multicast, Emina Soljanin, Piyush Gupta, and Gerhard Kramer.
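As a hedged illustration of the min-cut computation described above (not part of Reference 1), the sketch below counts edge-disjoint paths on a unit-capacity directed graph using a BFS-based max-flow in the Edmonds-Karp style; the edge list approximates the butterfly topology of the next example, with the receivers R1, R2, R3 treated as attached at nodes D, E, F.

    from collections import defaultdict, deque

    def min_cut_value(edges, source, sink):
        cap = defaultdict(int)
        adj = defaultdict(set)
        for u, v in edges:                      # unit-capacity edges
            cap[(u, v)] += 1
            adj[u].add(v); adj[v].add(u)        # residual arcs go both ways
        flow = 0
        while True:
            parent = {source: None}
            q = deque([source])
            while q and sink not in parent:     # BFS for an augmenting path
                u = q.popleft()
                for v in adj[u]:
                    if v not in parent and cap[(u, v)] > 0:
                        parent[v] = u
                        q.append(v)
            if sink not in parent:
                return flow                     # no more edge-disjoint paths
            v = sink                            # augment by one unit along the path
            while parent[v] is not None:
                u = parent[v]
                cap[(u, v)] -= 1
                cap[(v, u)] += 1
                v = u
            flow += 1

    # Butterfly-style topology (receivers attached at D, E, F): min-cut is 2.
    butterfly = [("S", "A"), ("S", "B"), ("S", "C"), ("A", "D"), ("B", "D"),
                 ("A", "E"), ("C", "E"), ("B", "F"), ("C", "F")]
    print(min_cut_value(butterfly, "S", "D"))   # 2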

The Second Existing System

A multicast example with source S and three receivers R1, R2, and R3 is shown in Figure 1. Note that each receiver can be disconnected from the source by removing two edges. Therefore, the min-cut from the source to each of the receivers is 2, and each receiver is connected to the source by a pair of edge-disjoint paths. The pair of paths for R1 is S-A-D and S-B-D, for R2 it is S-A-E and S-C-E, and for R3 it is S-B-F and S-C-F. Note that the path S-B-D to R1 and the path S-B-F to R3 share the edge SB. When each of the receivers is the only one using the network, the source can transmit two independent unit-rate bit streams simultaneously to that receiver through its two paths. Let us now look at what happens when all three receivers are using the network. If two unit-rate information streams are to be delivered to receiver R1, then one of them, say x1, should be routed through the edge SA, and the other, x2, through the edge SB. Now, either x1 or x2 can be routed through the remaining edge SC. If x1 is chosen, then receiver R2 will receive only x1, and if x2 is chosen, then R3 will receive only x2, and thus only half of what the source transmits. On the other hand, if network nodes are allowed to process and combine their input bit streams, then the condition that the min-cut between the source and each receiver be at least h is also sufficient for a multicast at rate h. The area of network coding started with the proof of this claim, upon the realization that bit streams in communications networks can be duplicated, merged, or, in general, processed in a way that physical commodities (such as fluids in networks of pipes) cannot. Reference 2: R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, "Network Information Flow," IEEE Trans. Inform. Theory, 46:4 (2000), 1204-1216.
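The standard resolution of this bottleneck (the classical butterfly argument, sketched here as an illustration rather than quoted from Reference 2) is to let the shared node forward the XOR of the two streams, so that each receiver recovers the missing stream from the one it already gets directly:

    x1, x2 = 0b1011, 0b0110          # the two unit-rate streams from S

    coded = x1 ^ x2                  # combination sent over the shared edge

    # A receiver that gets x1 directly plus the coded packet:
    assert coded ^ x1 == x2
    # A receiver that gets x2 directly plus the coded packet:
    assert coded ^ x2 == x1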

First Existing System For Wireless Interference

Different JNCC (joint network-channel coding) schemes have been proposed for wireless networks. A simple multiple-access relay channel with 2 users was considered, and cooperative diversity was obtained through the joint design of network-channel coding. The concept of soft network coding was introduced in [10], where a similar multiple-access channel with 2 users was considered and a low-complexity joint design of network and channel coding was proposed. A joint channel and network coding scheme based on nested codes [12] was proposed in [11], where an explicit code design was provided for a multicast scenario with 2 sources and 2 destinations. However, the network topologies considered in the works above are simple networks with at most 2 hops. For a general multi-hop network with multiple sources and multiple destinations (MS-MD), although it is not optimal in general to decompose the network into canonical elements, it is difficult in practice to perform end-to-end coding with limited channel state information (CSI) and unknown network topology. Thus, instead of end-to-end coding, information-theoretic results for canonical network elements, namely the broadcast channel (BC) and the multiple-access channel (MAC), do provide hints on JNCC strategies for an MS-MD network that outperform simplistic separate coding strategies. Reference: Joint Network and Channel Coding for Wireless Networks, Qiang Li, See Ho Ting, and Chin Keong Ho, Nanyang Technological University. By B. Sudheer Kumar M.Tech (SE)

Ranking and Suggesting popular items


Definition
A ranking is a relationship between a set of items such that, for any two items, the first is either 'ranked higher than', 'ranked lower than' or 'ranked equal to' the second. In mathematics, this is known as a weak order or total preorder of objects. It is not necessarily a total order of objects because two different objects can have the same ranking. The rankings themselves are totally ordered. For example, materials are totally preordered by hardness, while degrees of hardness are totally ordered. By reducing detailed measures to a sequence of ordinal numbers, rankings make it possible to evaluate complex information according to certain criteria. Thus, for example, an Internet search engine may rank the pages it finds according to an estimation of their relevance, making it possible for the user quickly to select the pages they are likely to want to see. Webref: http://en.wikipedia.org/wiki/Ranking

Description
Ranking Items in a Catalog
You can use ranking to control how items are displayed on your Web site. For example, you can have your more popular products appear first or you can have slower-moving products displayed more prominently. You can rank child products and categories in a category, variants in a product family, root products and categories in base catalogs, and related products and categories. You can also rank the enumeration values for an enumeration property. For example, if you have a product in several sizes, you can rank these property values to control the order in which the sizes are displayed.

Ranking applies to both base catalogs and virtual catalogs. Items and relationships in a virtual catalog inherit their ranking from the base catalog. You can override the ranks in the virtual catalog. It is possible to have the same ranking for two or more items. Items are first sorted by rank, then alphabetically. If you do not specify a rank for an item, relationship, or property, the sorting is alphabetical. Web reference: http://msdn.microsoft.com/en-us/library/aa545816%28v=CS.70%29.aspx

A shopping cart (trolley, carriage) is a cart supplied by a shop, especially supermarkets, for use by customers inside the shop to transport merchandise to the check-out counter during shopping. Customers can then also use the cart to transport their purchased goods to their cars. In some places, customers are allowed to leave the carts in the parking lot, and store personnel will return the carts to the storage area. In most European premises, however, coin- (or token-) operated locking mechanisms are provided to encourage shoppers to return the carts to the correct location after use.
Reference http://en.wikipedia.org/wiki/Shopping_cart

How to Rank Catalog Items


You can use ranking to control how items are displayed on your Web site. You can rank child products and categories in a category, variants in a product family, and root products and categories in base catalogs. Ranking applies to both base catalogs and virtual catalogs. Items in a virtual catalog inherit their ranking from the base catalog. You can override the ranks in the virtual catalog. It is possible to have the same ranking for two or more items. Items are first sorted by rank, then alphabetically. If you do not specify a rank for an item, relationship, or property, the sorting is alphabetical.

The rank of a catalog item is set with respect to its parent. To set the rank for a product, access the category and set the rank on the dataset of child items. To set the rank for the variants of a product, access the product and set the rank on the dataset of variants. To set the rank for a category, access the root or parent category and set the rank on the dataset of child categories.
http://msdn.microsoft.com/en-US/library/aa545283%28v=CS.70%29.aspx
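As a small illustration of the sorting rule quoted above (rank first, then alphabetical), here is a hypothetical sketch; the item names are invented, and placing unranked items after the ranked ones is an assumption rather than documented behaviour.

    import math

    catalog = [
        {"name": "Kettle",  "rank": 2},
        {"name": "Teapot",  "rank": 1},
        {"name": "Mug",     "rank": 2},
        {"name": "Saucer"},                 # no rank specified
        {"name": "Coaster"},                # no rank specified
    ]

    def display_order(items):
        # sort by rank, then alphabetically; missing ranks sort last
        return sorted(items, key=lambda i: (i.get("rank", math.inf), i["name"]))

    for item in display_order(catalog):
        print(item["name"])   # Teapot, Kettle, Mug, Coaster, Saucer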

How to Rank Related Items


You can use ranking to control how items are displayed on your Web site. Ranking applies to both base catalogs and virtual catalogs. Items and relationships in a virtual catalog inherit their ranking from the base catalog. You can override the ranks in the virtual catalog. It is possible to have the same ranking for two or more items. Items are first sorted by rank, then alphabetically. If you do not specify a rank for a relationship, the sorting is alphabetical. You can set the rank for related items when you create the relationship.

http://msdn.microsoft.com/en-US/library/aa545796%28v=CS.70%29.aspx

By B. SharathBabu M.Tech

Credit Card Fraud Detection Using Hidden Markov Model

1. We propose a way of effective fraud detection to improve detection efficiency. We focus on the bias of the training dataset, which is typically caused by the skewed distribution and highly overlapped classes of credit card transaction data and leads to many mis-detections. To reduce mis-detections, we take the fraud density of real transaction data as a confidence value and generate a weighted fraud score in the proposed scheme. REF: Min-Jung Kim and Taek-Soo Kim, 2002.

2. Detection of fraud undertaken as part of the European Commission-funded ACTS ASPeCT (Advanced Security for Personal Communications Technologies) project. A first task has been the identification of possible fraud scenarios and of typical fraud indicators which can be mapped to data in Toll Tickets. REF: P. Burge, J. Shawe-Taylor, C. Cooke, Y. Moreau, B. Preneel, C. Stoermann.

3. Fraud is increasing dramatically with the expansion of modern technology and the global superhighways of communication, resulting in the loss of billions of dollars worldwide each year. Although prevention technologies are the best way of reducing fraud, fraudsters are adaptive and, given time, will usually find ways to circumvent such measures. Methodologies for the detection of fraud are essential if we are to catch fraudsters once fraud prevention has failed. Statistics and machine learning provide effective technologies for fraud detection and have been applied successfully to detect activities such as money laundering and e-commerce credit card fraud. REF: Richard J. Bolton and David J. Hand, January 2002.

4. CARDWATCH, a database mining system used for credit card fraud detection, is presented. The system is based on a neural network learning module, provides an interface to a variety of commercial databases and has a comfortable graphical user interface. Test results obtained for synthetically generated credit card data and an auto-associative neural network model show very successful fraud detection rates. REF: Aleskerov, E.; Freisleben, B.; Rao, B.; Dept. of Electr. Eng. & Comput. Sci., Siegen Univ., Mar 1997.
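As an illustrative sketch only (the model parameters, observation symbols, and threshold below are invented, not taken from the cited papers), a hidden Markov model can score an incoming transaction by how much it lowers the likelihood of the cardholder's recent observation sequence; a large drop is flagged as suspicious:

    import numpy as np

    # 2 hidden spending states, 3 observation symbols (low / medium / high amount)
    start = np.array([0.7, 0.3])
    trans = np.array([[0.8, 0.2],
                      [0.3, 0.7]])
    emit  = np.array([[0.6, 0.3, 0.1],     # state 0 mostly low amounts
                      [0.1, 0.3, 0.6]])    # state 1 mostly high amounts

    def log_likelihood(obs):
        """Forward algorithm (with scaling) for a sequence of observation symbols."""
        alpha = start * emit[:, obs[0]]
        loglik = np.log(alpha.sum())
        alpha /= alpha.sum()
        for o in obs[1:]:
            alpha = (alpha @ trans) * emit[:, o]
            loglik += np.log(alpha.sum())
            alpha /= alpha.sum()
        return loglik

    history = [0, 0, 1, 0, 0, 1, 0]          # cardholder's recent behaviour
    candidate = history[1:] + [2]            # window with the new transaction appended

    # Flag the new transaction if it makes the sequence much less likely.
    drop = log_likelihood(history) - log_likelihood(candidate)
    print("suspicious" if drop > 1.0 else "ok", round(drop, 3))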

By Deepthi M.Tech (CSE)

1) In computer networks, bandwidth is often used as a synonym for data transfer rate: the amount of data that can be carried from one point to another in a given time period (usually a second). This kind of bandwidth is usually expressed in bits (of data) per second (bps). Occasionally, it is expressed as bytes per second (Bps). A modem that works at 57,600 bps has twice the bandwidth of a modem that works at 28,800 bps. In general, a link with a high bandwidth is one that may be able to carry enough information to sustain the succession of images in a video presentation. It should be remembered that a real communications path usually consists of a succession of links, each with its own bandwidth. If one of these is much slower than the rest, it is said to be a bandwidth bottleneck.

2) In electronic communication, bandwidth is the width of the range (or band) of frequencies that an electronic signal uses on a given transmission medium. In this usage, bandwidth is expressed in terms of the difference between the highest-frequency signal component and the lowest-frequency signal component. Since the frequency of a signal is measured in hertz (the number of cycles of change per second), a given bandwidth is the difference in hertz between the highest frequency the signal uses and the lowest frequency it uses. A typical voice signal has a bandwidth of approximately three kilohertz (3 kHz); an analog television (TV) broadcast video signal has a bandwidth of six megahertz (6 MHz), some 2,000 times as wide as the voice signal.


Bandwidth is defined as a band containing all frequencies between the upper cut-off and lower cut-off frequencies.
http://nptel.iitm.ac.in/courses/IITMADRAS/Principles_Of_Communication/pdf/Lecture18_Bandwidth.pdf

bandwidth
1) A range within a band of frequencies or wavelengths. 2) The amount of data that can be transmitted in a fixed amount of time. For digital devices, the bandwidth is usually expressed in bits per second (bps) or bytes per second. For analog devices, the bandwidth is expressed in cycles per second, or Hertz (Hz). The bandwidth is particularly important for I/O devices. For example, a fast disk drive can be hampered by a bus with a low bandwidth. This is the main reason that new buses, such as AGP, have been developed for the PC.

http://www.webopedia.com/TERM/B/bandwidth.html

What is Bandwidth? Module by Christopher Chikalimba-Gama. Summary: This module defines bandwidth and gives examples with respect to the definition.

Bandwidth

Bandwidth is a central concept in many fields, including information theory, radio communications, signal processing, and spectroscopy.

Definition 1: Bandwidth is a measure of frequency range, measured in hertz. Example: the range of frequencies within which the performance of the antenna, with respect to some characteristics, conforms to a specified standard (a 2.4-2.5 GHz antenna has 100 MHz of bandwidth).

Definition 2: Bandwidth is the amount of data that can be transmitted in a fixed amount of time, expressed in bits per second (bps) or bytes per second.

(In the frequency sense, it is the difference between the upper and lower frequencies.)

References: Webopedia, http://www.webopedia.com/TERM/B/bandwidth.htm (last accessed 13 February 2006); Computing and Networking, http://compnetworking.about.com/od/speedtests/g/bldef_bandwidth.htm (last accessed 13 February 2006); Wikipedia 2006 (last accessed 13 February 2006).

Bandwidth is the difference between the upper and lower frequencies in a contiguous set of frequencies. It is typically measured in hertz, and may sometimes refer to passband bandwidth, sometimes to baseband bandwidth, depending on context. Passband bandwidth is the difference between the upper and lower cutoff frequencies of, for example, an electronic filter, a communication channel, or a signal spectrum. In case of a low-pass filter or baseband signal, the bandwidth is equal to its upper cutoff frequency. The term baseband bandwidth always refers to the upper cutoff frequency, regardless of whether the filter is bandpass or low-pass. Bandwidth in hertz is a central concept in many fields, including electronics, information theory, radio communications, signal processing, and spectroscopy. A key characteristic of bandwidth is that a band of a given width can carry the same amount of information, regardless of where that band is located in the frequency spectrum (assuming equivalent noise level). For example, a 5 kHz band can carry a telephone conversation whether that band is at baseband (as in your POTS telephone line) or modulated to some higher (passband) frequency. In computer networking and other digital fields, the term bandwidth often refers to a data rate measured in bits per second, for example network throughput, sometimes denoted network bandwidth, data bandwidth or digital bandwidth. The reason is that according to Hartley's law, the digital data rate limit (or channel capacity) of a physical communication link is proportional to its bandwidth in hertz, sometimes denoted radio frequency (RF) bandwidth, signal bandwidth, frequency bandwidth, spectral bandwidth or analog bandwidth. For bandwidth as a computing term, less ambiguous terms are bit rate, throughput, maximum throughput, goodput or channel capacity.

X-dB bandwidth

[Figure: a graph of a bandpass filter's gain magnitude, illustrating the concept of 3 dB bandwidth at a gain of 0.707. The frequency axis of this symbolic diagram can be linear or logarithmically scaled.]

In some contexts, the signal bandwidth in hertz refers to the frequency range in which the signal's spectral density is nonzero or above a small threshold value. That definition is used in calculations of the lowest sampling rate that will satisfy the sampling theorem. Because this range of non-zero amplitude may be very broad or infinite, this definition is typically relaxed so that the bandwidth is defined as the range of frequencies in which the signal's spectral density is above a certain threshold relative to its maximum. Most commonly, bandwidth refers to the 3 dB bandwidth, that is, the frequency range within which the spectral density (in W/Hz or V^2/Hz) is above half its maximum value (or the spectral amplitude, in V or V/Hz, is more than 70.7% of its maximum); that is, within 3 dB of the peak.[1] The word bandwidth applies to signals as described above, but it could also apply to systems, for example filters or communication channels. To say that a system has a certain bandwidth means that the system can process signals of that bandwidth, or that the system reduces the bandwidth of a white noise input to that bandwidth. The 3 dB bandwidth of an electronic filter or communication channel is the part of the system's frequency response that lies within 3 dB of the response at its peak, which in the passband filter case is typically at or near its center frequency, and in the lowpass filter case is near 0 hertz. If the maximum gain is 0 dB, the 3 dB bandwidth is the range where the gain is more than -3 dB, or the attenuation is less than 3 dB. This is also the range of frequencies where the amplitude gain is above 70.7% of the maximum amplitude gain, and above half the maximum power gain. This same "half power gain" convention is also used in spectral width, and more generally for the extent of functions, as full width at half maximum (FWHM). In electronic filter design, a filter specification may require that within the filter passband the gain is nominally 0 dB +/- a small number of dB, for example within the +/- 1 dB interval. In the stopband(s), the required attenuation in dB is above a certain level, for example >100 dB. In a transition band the gain is not specified. In this case, the filter bandwidth corresponds to the

passband width, which in this example is the 1 dB bandwidth. If the filter shows amplitude ripple within the passband, the x dB point refers to the point where the gain is x dB below the nominal passband gain rather than x dB below the maximum gain. A commonly used quantity is fractional bandwidth. This is the bandwidth of a device divided by its center frequency. For example, a passband filter that has a bandwidth of 2 MHz with center frequency 10 MHz will have a fractional bandwidth of 2/10, or 20%. In communication systems, in calculations of the Shannon-Hartley channel capacity, bandwidth refers to the 3 dB bandwidth. In calculations of the maximum symbol rate, the Nyquist sampling rate, and maximum bit rate according to the Hartley formula, the bandwidth refers to the frequency range within which the gain is non-zero, or the attenuation in dB is below a very large value. The fact that in equivalent baseband models of communication systems the signal spectrum consists of both negative and positive frequencies can lead to confusion about bandwidth, since the spectrum is sometimes referred to only by its positive half, and one will occasionally see expressions such as B = 2W, where B is the total bandwidth (i.e., the maximum passband bandwidth of the carrier-modulated RF signal and the minimum passband bandwidth of the physical passband channel), and W is the positive bandwidth (the baseband bandwidth of the equivalent channel model). For instance, the baseband model of the signal would require a lowpass filter with cutoff frequency of at least W to stay intact, and the physical passband channel would require a passband filter of at least B to stay intact. In signal processing and control theory the bandwidth is the frequency at which the closed-loop system gain drops 3 dB below peak. In basic electric circuit theory, when studying band-pass and band-reject filters, the bandwidth represents the distance between the two points in the frequency domain where the signal is 1/sqrt(2) (about 70.7%) of the maximum signal amplitude (half power).
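A small worked check of the figures quoted above (the passband edges are invented so as to reproduce the 2 MHz / 10 MHz example):

    import math

    f_low, f_high = 9.0e6, 11.0e6       # assumed passband edges in Hz
    bandwidth = f_high - f_low          # 2 MHz
    center = (f_high + f_low) / 2       # 10 MHz
    print(bandwidth / center)           # 0.2 -> fractional bandwidth of 20%

    print(10 * math.log10(0.5))         # about -3.01 dB: the half-power point
    print(1 / math.sqrt(2))             # about 0.707: the 70.7% amplitude point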

References
1. Van Valkenburg, M. E., Network Analysis (3rd ed.), pp. 383-384, ISBN 0-13-611095-9. Retrieved 2008-06-22.
2. Stutzman, Warren L., and Gary A. Thiele, Antenna Theory and Design, 2nd ed., New York: 1998, ISBN 0-471-02590-9.
http://en.wikipedia.org/wiki/Bandwidth_%28signal_processing%29

By Varun Kumar M.Tech (CSE)

Data integrity proofs in cloud storage

Cloud storage is a model of networked online storage where data is stored on virtualized pools of storage which are generally hosted by third parties. Hosting companies operate large data centers; and people who require their data to be hosted buy or lease storage capacity from them and use it for their storage needs. The data center operators, in the background, virtualize the resources according to the requirements of the customer and expose them as storage pools, which the customers can themselves use to store files or data objects. Physically, the resource may span across multiple servers. Cloud storage services may be accessed through a web service application programming interface (API), or through a Web-based user interface.

source:http://en.wikipedia.org/wiki/Cloud_storage

Data integrity proofs in cloud storage


Data integrity, in its broadest meaning, refers to the trustworthiness of system resources over their entire life cycle. In more analytic terms, it is "the representational faithfulness of information to the true state of the object that the information represents, where representational faithfulness is composed of four essential qualities or core attributes: completeness, currency/timeliness, accuracy/correctness and validity/authorization." The concept of business rules is already widely used nowadays and is subdivided into six categories, which include data rules. Data rules are further subdivided into data integrity rules, data sourcing rules, data extraction rules, data transformation rules and data deployment rules. Data integrity is very important in database operations in particular and in data warehousing and business intelligence in general. Because data integrity ensures that data is of high quality, correct, consistent and accessible, it is important to follow the rules governing data integrity. A data value rule or conditional data value rule specifies data domains. The difference between the two is that the former specifies the domain of allowable values for a data attribute in all situations, while the latter applies only when certain exceptions or conditions hold. A data structure rule defines the cardinality of data for a data relation in cases where no conditions or exceptions apply. This rule makes the data structure very easy to understand. A conditional data structure rule is slightly different in that it governs data cardinality for a data relation when conditions or exceptions apply. A data derivation rule specifies how a data value is derived based on an algorithm, contributors and conditions. It also specifies the conditions under which the data value could be re-derived. A data retention rule specifies the length of time for which data values can be retained in a particular database, and what can be done with data values when their use for a database expires. A data occurrence retention rule specifies the length of time a data occurrence is retained and what can be done with the data when it is no longer useful. A data attribute retention rule is similar to a data retention rule, but it applies only to specific data values rather than the entire data occurrence. These data integrity rules, like any other rules, are totally without meaning when they are not implemented and enforced. In order to achieve data integrity, these rules should be consistently and routinely applied to all data entering the data warehouse or any other data resource. There should be no waivers or exceptions in the enforcement of these rules, because any slight relaxation of enforcement could mean a tremendous error in the results. As much as possible, these data integrity rules should be applied as close as possible to the initial capture of data, so that potential breaches of integrity can be detected and corrected early. This can greatly prevent errors and inconsistencies from entering the database. With strict implementation and enforcement of these data integrity rules, data error rates can be much lower, so less time is spent trying to troubleshoot and trace faulty computing results. This translates to savings in manpower expense. With a low error rate, there can only be high-quality data to provide better support in the statistical analysis, trend and pattern spotting, and decision-making tasks of a company. In today's digital age, information is one major key to success, and having the right information means having a better edge over the competitors.

source:http://en.wikipedia.org/wiki/Data_integrity
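A hypothetical sketch of the rule categories described above (the field names, allowed domain, and retention period are invented for illustration): a data value rule, a conditional data value rule, and a data retention rule applied before a record enters the warehouse.

    from datetime import date, timedelta

    RETENTION = timedelta(days=365 * 7)          # assumed retention rule: keep 7 years

    def check_record(record, today=None):
        today = today or date.today()
        errors = []
        # Data value rule: status must come from a fixed domain.
        if record["status"] not in {"active", "closed", "pending"}:
            errors.append("status outside allowed domain")
        # Conditional data value rule: a closed record must carry a closing date.
        if record["status"] == "closed" and record.get("closed_on") is None:
            errors.append("closed record missing closed_on")
        # Data retention rule: expired records are not loaded.
        if today - record["created_on"] > RETENTION:
            errors.append("record past retention period")
        return errors

    rec = {"status": "closed", "created_on": date(2011, 1, 5), "closed_on": None}
    print(check_record(rec, today=date(2011, 12, 20)))
    # ['closed record missing closed_on']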

By Chitra M.Tech (CSE)

A Metric for Code Readability and Maintaining Software Quality

DEFINITIONS:
METRICS:
Metrics have become a standard accoutrement of all application life-cycle management tools. REF: SD Times. BZ Media. Retrieved 19 October 2010.

Parameters or measures of quantitative assessment used for measurement, comparison or to track performance or production. Analysts use metrics to compare the performance of different companies, despite the many variations between firms. Ref: http://www.investopedia.com/terms/m/metrics.asp#ixzz1hKjYZso8 Metric is a measure of some property of a piece of software or its specifications.
REF: http://en.wikipedia.org/wiki/Metric DR.SENGAI podhuvan

In software development, a metric (noun) is the measurement of a particular characteristic of a program's performance or efficiency. A similar notion is used in network routing.
REF : http://whatis.techtarget.com/definition/0,,sid9_gci212560,00.html 5 Apr 2005

Metrics are a set of measurements that quantify results. Performance metrics quantify a unit's performance. Project metrics tell you whether the project is meeting its goals. Business metrics define the business's progress in measurable terms.

REF: F. John Reh

CODE :
Code is described as many things: it is a cultural logic, a machinic operation or a process that is unfolding. It is becoming today's hegemonic metaphor, inspiring quasi-semiotic investigations within cultural and artistic practice (e.g. The Matrix). REF: David Berry, 2005-11-18. In code, the role of the partial coder is to perceive and to experience, although these perceptions and affections might not be those of the coder, in the currently accepted sense, but belong to the code. REF: Guattari
A code is a rule for converting a piece of information.

REF: http://en.wikipedia.org/wiki/Code. Akshaya Iyenger. A set of symbols for representing something. For example, most computers use ASCII codes to represent characters. REF: http://www.webopedia.com/TERM/C/code.html

Code can appear in a variety of forms. The code that a programmer writes is called source code. After it has been compiled, it is called object code. Code that is ready to run is called executable code or machine code. REF: http://www.webopedia.com/TERM/C/code.html

READABILITY:
Readability is the ease in which text can be read and understood. Various factors to measure readability have been used, such as speed of perception. REF : http://en.wikipedia.org/wiki/Readability

Text readability is a measure of how well and how easily a text conveys its intended meaning to a reader of that text. REF : http://www.readability.biz

Readability is what makes some texts easier to read than others. It is often confused with legibility, which concerns typeface and layout. REF: William H. DuBay, 25 August 2004; Alexander, R. E. 2000, "Readability of published dental educational materials," Journal of the American Dental Association 7:937-943; Armbruster, B. B. 1984, "The problem of inconsiderate text," in Comprehension Instruction, ed. G. Duffey, New York: Longmann, pp. 202-217. Readability is also the name of a free web and mobile app with a simple purpose: to deliver a great reading experience. It turns virtually any web page into a clean, comfortable reading view. You can also sync articles for reading later as you surf around the web. Readability works on your web browser, iPhone, iPad and Android smartphone, giving you the flexibility to read anytime, anywhere. REF: http://www.readability.com/faq

Readability describes the ease with which a document can be read. Readability tests, which are mathematical formulas, were designed to assess the suitability of books for students at particular grade levels or ages. REF: http://plainlanguage.com/newreadability.html

Software Quality:
Software quality measurement is about quantifying the extent to which a software system possesses desirable characteristics. This can be performed through qualitative or quantitative means. REF: http://en.wikipedia.org/wiki/Software_quality The quality of software is assessed by a number of variables. These variables can be divided into external and internal quality criteria. REF: http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality A definition of software quality should emphasize three important points: 1) Software requirements are the foundation from which software quality is measured. 2) Specified standards define a set of development criteria that guide the manner in which software is engineered. 3) There is a set of implicit requirements that often goes unmentioned. REF: http://www.thedacs.com/databases/url/key/3494

The Software Quality Journal promotes awareness of the crucial role of quality management in the effective construction of the software systems developed, used and maintained by organizations REF: http://www.springer.com/computer/swe/journal/11219

DESCRIPTION:
METRICS :
A software metric is a measure of some property of a piece of software or its specifications. Since quantitative measurements are essential in all sciences, there is a continuous effort by computer science practitioners and theoreticians to bring similar approaches to software development. The goal is obtaining objective, reproducible and quantifiable measurements, which may have numerous valuable applications in schedule and budget planning, cost estimation, quality assurance testing, software debugging, software performance optimization, and optimal personnel task assignments. Limitations: As software development is a complex process, with high variance on both methodologies and objectives, it is difficult to define or measure software qualities and quantities and to determine a valid and concurrent measurement metric, especially when making such a prediction prior to the detail design. Another source of difficulty and debate is in determining which metrics matter, and what they mean.[2][3] The practical utility of software measurements has thus been limited to narrow domains where they include:

Schedule, Size/Complexity, Cost, and Quality.

Goal of metrics: to improve product quality and development-team productivity. Metrics are concerned with productivity and quality measures: measures of software development output as a function of effort and time, and measures of usability.

Metrics apply to: the process used to develop the software, the specific software development project, and the product (the software produced). Many of the same metrics apply to both the process and project domains.

References:
1. Land Software Engineering Centre. Retrieved 19 October 2010.
2. Binstock, Andrew. "Integration Watch: Using metrics effectively". SD Times. BZ Media. Retrieved 19 October 2010.
3. Kolawa, Adam. "When, Why, and How: Code Analysis". The Code Project. Retrieved 19 October 2010.
4. Kaner, Dr. Cem. Software Engineer Metrics: What do they measure and how do we know?
5. ProjectCodeMeter (2010). "ProjectCodeMeter Users Manual", page 65.
6. Lincke, Rüdiger; Lundberg, Jonas; Löwe, Welf (2008). "Comparing software metrics tools", International Symposium on Software Testing and Analysis 2008, pp. 131-142.
7. DeMarco, Tom. Controlling Software Projects: Management, Measurement and Estimation. ISBN 0-13-171711-1.
8. NASA Metrics Planning and Reporting Working Group (MPARWG)
9. USC Center for Systems and Software Engineering
10. Infsy 570, Dr. R. Ocker

CODE:
A code is usually considered as an algorithm which uniquely represents symbols from some source alphabet by encoded strings, which may be in some other target alphabet. An extension of the code for representing sequences of symbols over the source alphabet is obtained by concatenating the encoded strings. Before giving a mathematically precise definition, we give a brief example. The mapping

a -> 0, b -> 01, c -> 011

is a code whose source alphabet is the set {a, b, c} and whose target alphabet is the set {0, 1}. Using the extension of the code, the encoded string 0011001011 can be grouped into codewords as 0 011 0 01 011, and these in turn can be decoded to the sequence of source symbols acabc.
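A tiny sketch of this example (illustrative only), encoding and decoding with the mapping above; decoding relies on the fact that every codeword starts with '0':

    CODE = {"a": "0", "b": "01", "c": "011"}
    DECODE = {v: k for k, v in CODE.items()}

    def encode(text):
        return "".join(CODE[ch] for ch in text)

    def decode(bits):
        out, word = [], ""
        for bit in bits:
            if bit == "0" and word:          # a '0' starts the next codeword
                out.append(DECODE[word])
                word = ""
            word += bit
        out.append(DECODE[word])             # flush the last codeword
        return "".join(out)

    assert encode("acabc") == "0011001011"
    assert decode("0011001011") == "acabc"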

Variable-length codes: In this section we consider codes, which encode each source (clear text) character by a code word from some dictionary, and concatenation of such code words give us an encoded string. Variable-length codes are especially useful when clear text characters have different probabilities; see also entropy encoding. Error correcting codes : Codes may also be used to represent data in a way more resistant to errors in transmission or storage. Such a "code" is called an error-correcting code, and works by including carefully crafted redundancy with the stored (or transmitted) data.

Examples:
Codes can be used for brevity. When telegraph messages were the state of the art in rapid long-distance communication, elaborate systems of commercial codes that encoded complete phrases into single words (commonly five-letter groups) were developed for use by telegraphers. Character encodings: Probably the most widely known data communications code (aka character representation) in use today is ASCII. In one or another (somewhat compatible) version, it is used by nearly all personal computers, terminals, printers, and other communication equipment. It represents 128 characters with seven-bit binary numbers, that is, as a string of seven 1s and 0s. In ASCII, a lowercase "a" is always 1100001, an uppercase "A" always 1000001, and so on. There are many other encodings, which represent each character by a byte. Genetic code:

Biological organisms contain genetic material that is used to control their function and development. This is DNA, which contains units named genes that can produce proteins through a code (the genetic code) in which a series of triplets (codons) of four possible nucleotides is translated into one of twenty possible amino acids. A sequence of codons results in a corresponding sequence of amino acids that form a protein. Gödel code: In mathematics, a Gödel code was the basis for the proof of Gödel's incompleteness theorem. Here, the idea was to map mathematical notation to a natural number (using a Gödel numbering). Cryptography: In the history of cryptography, codes were once common for ensuring the confidentiality of communications, although ciphers are now used instead. See code (cryptography).

Other examples: Other examples of encoding include:

Encoding (in cognition) is a basic perceptual process of interpreting incoming stimuli; technically speaking, it is a complex, multi-stage process of converting relatively objective sensory input (e.g., light, sound) into subjectively meaningful experience. A content format is a specific encoding format for converting a specific type of data to information. Text encoding uses a markup language to tag the structure and other features of a text to facilitate processing by computers. (See also Text Encoding Initiative.) Semantics encoding of formal language A in formal language B is a method of representing all terms (e.g. programs or descriptions) of language A using language B.

References: http://en.wikipedia.org/wiki/Code#References

Readability :
Readability" as a human judgment of how easy atext is to understand. The readability of a program is relatedto its maintainability, and is thus a critical factor in over-all software quality. Typically, maintenance will consumeover 70% of the total lifecycle cost of a software product. Readability also correlates with software quality, code change and defect reporting activities. READABILITY MODEL: We have shown that there is significant agreement between our group of annotators on the relative readability of snippets. How ever, the processes that underlie this correlationare unclear. In this section, weexplore the extent to whichwe can mechanically predict human readability judgments.Weendeavor to determine which code features are predictive of readability, and construct a model (i.e., an automated software readability metric) to analyze other code. Model Generation: First, we form a set of features that can be detected statically from a snippet or other block of code. We have chosen features that are relatively simple and that intuitively seem likely to have some effect on readability. They are factors related to structure, density, logical complexity, documentation ,and so on. Importantly, to be consistent with our notion of readability as discussed in Section 2.1, each feature is independent of the size of a code block. Fig. 6enumerates the set of code features that our metric considers when judging code readability. Each feature can be applied to an arbitrary sized block of Java source code,and each represents either an average value per line, or a maximum value for all lines. For example, we have a feature that represents the average number of identifiers in each line and another that represents the maximum number Model Performance : We now test the hypothesis that local textual surface features of code are sufficient to capture human notions of readability. Two relevant success metrics in an experiment of this type are recall and precision. Here, recall is the percentage of those snippets judged as more readable by the annotators that are classified as more readable by the model. Precision is the fraction of the snippets classified as more readable by the model that were also judged as more readable by the annotators. When considered independently,each of these metrics can be made perfect trivially(e.g., a degenerate model that always returns more readablehas perfect recall).We thus weight them together using the f-measure statistic, the harmonic mean of precision and recall [7]. This, in a sense, reflects the accuracy of the classifier with respect to the more readable snippets. We also consider the overall accuracy of the classifier by finding the percentage of correctly classified snippets.

CORRELATING READABILITY: In the previous section, we constructed an automated model of readability that mimics human judgments. We implemented our model in a tool that assesses the readability of programs. In this section, we use that tool to test the hypothesis that readability (as captured by our model) correlates with external conventional metrics of software quality. Specifically, we first test for a correlation between readability and FindBugs, a popular static bug-finding tool [19]. Second, we test for a similar correlation with changes to code between versions of several large open-source projects. Third, we do the same for version control log messages indicating that a bug has been discovered and fixed.

Readability Correlations: Our first experiment tests for a correlation between defects detected by FindBugs and our readability metric at the function level. We first ran FindBugs on the benchmark, noting defect reports. Second, we extracted all of the functions and partitioned them into two sets: those containing at least one reported defect and those containing none. To avoid bias between programs with varying numbers of reported defects, we normalized the function set sizes. We then ran the already-trained classifier on the set of functions, recording an f-measure for "contains a bug" with respect to the classifier judgment of "less readable". The purpose of this experiment is to investigate the extent to which our model correlates with an external notion of code quality in aggregate.

Reference: J. Lionel E. Deimel 1985; D. R. Raymond 1991; S. Rugaber 2000; B. Boehm and V. R. Basili 2001. [1] K. Aggarwal, Y. Singh, and J.K. Chhabra, "An Integrated Measure of Software Maintainability," Proc. Reliability and Maintainability Symp., pp. 235-241, Sept. 2002. [2] S. Ambler, "Java Coding Standards," Software Development, vol. 5, no. 8, pp. 67-71, 1997. [3] B.B. Bederson, B. Shneiderman, and M. Wattenberg, "Ordered and Quantum Treemaps: Making Effective Use of 2D Space to Display Hierarchies," ACM Trans. Graphics, vol. 21, no. 4, pp. 833-854, 2002.

SOFTWARE QUALITY : software quality refers to two related but distinct notions that exist wherever quality is defined in a business context: Software functional quality: It reflects how well it complies with or conforms to a given design, based on functional requirements or specifications. That attribute can also be described as the fitness for purpose of a piece of software or how it compares to competitors in the marketplace as a worthwhile product. Software structural quality : It refers to how it meets non-functional requirements that support the delivery of the functional requirements, such as robustness or maintainability, the degree to which the software was produced correctly. Structural quality is evaluated through the analysis of the software inner structure, its source code, in effect how its architecture adheres to sound principles of software architecture. In contrast, functional quality is typically enforced and measured through software testing.

Motivation for Defining Software Quality: "A science is as mature as its measurement tools" (Louis Pasteur, quoted in Ebert et al., p. 91), and software engineering has evolved to a level of maturity that makes it not only possible but also necessary to measure software quality, for at least two reasons.

Risk Management: Software failure has caused more than inconvenience. Software errors have caused human fatalities. The causes have ranged from poorly designed user interfaces to direct programming errors. An example of a programming error that led to multiple deaths is discussed in Dr. Leveson's paper.[2] This resulted in requirements for the development of some types of software, particularly and historically for software embedded in medical and other devices that regulate critical infrastructures: "[Engineers who write embedded software] see Java programs stalling for one third of a second to perform garbage collection and update the user interface, and they envision airplanes falling out of the sky."[3] In the United States, within the Federal Aviation Administration (FAA), the Aircraft Certification Service provides software programs, policy, guidance and training, with a focus on software and Complex Electronic Hardware that has an effect on the airborne product (a product is an aircraft, an engine, or a propeller).

Cost Management: As in any other field of engineering, an application with good structural software quality costs less to maintain and is easier to understand and change in response to pressing business needs. Industry data demonstrate that poor application structural quality in core business applications (such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM) or large transaction processing systems in financial services) results in cost and schedule overruns and creates waste in the form of rework (up to 45% of development time in some organizations [4]). Moreover, poor structural quality is strongly correlated with high-impact business disruptions due to corrupted data, application outages, security breaches, and performance problems. References: Pressman, Roger S. (2005), Software Engineering: A Practitioner's Approach (Sixth, International ed.), McGraw-Hill Education; Pressman, p. 388; "Medical Devices: The Therac-25", Nancy Leveson, University of Washington; "Embedded Software", Edward A. Lee, to appear in Advances in Computers (M. Zelkowitz, editor), Vol. 56, Academic Press, London, 2002, revised from UCB ERL Memorandum M01/26, University of California, Berkeley, CA 94720, USA, November 1, 2001; "Improving Quality Through Better Requirements" (slideshow), Dr. Ralph R. Young, 24/01/2004, Northrop Grumman Information Technology; DeMarco, T., "Management Can Make Quality (Im)possible", Cutter IT Summit, Boston, April 1999; Crosby, P., Quality is Free, McGraw-Hill, 1979; McConnell, Steve (1993), Code Complete (First ed.), Microsoft

By Kranthi.P M.Tech (SE)

MABS: Multicast Authentication Based on Batch Signature


Multicast:
In computer networking, multicast is the delivery of a message or information to a group of destination computers simultaneously in a single transmission from the source, creating copies automatically in other network elements, such as routers, only when the topology of the network requires it. Multicast is communication between a single sender and multiple receivers on a network.

1) IP multicast is a method of sending Internet Protocol (IP) datagrams to a group of interested receivers in a single transmission. Description: IP multicast is a technique for one-to-many and many-to-many real-time communication over an IP infrastructure in a network. It scales to a larger receiver population by not requiring prior knowledge of who or how many receivers there are. Multicast uses network infrastructure efficiently by requiring the source to send a packet only once, even if it needs to be delivered to a large number of receivers. The nodes in the network (typically network switches and routers) take care of replicating the packet to reach multiple receivers such that messages are sent over each link of the network only once. The most common low-level protocol to use multicast addressing is the User Datagram Protocol (UDP). By its nature, UDP is not reliable: messages may be lost or delivered out of order. Reliable multicast protocols such as Pragmatic General Multicast (PGM) have been developed to add loss detection and retransmission on top of IP multicast. To multicast is to transmit a single message to a select group of recipients. Ref1: http://en.wikipedia.org/wiki/Multicast

2) Multicasting is sending data from a sender to multiple receivers, where each receiver signals that it wants to receive the data. This is different from unicasting, and different from broadcasting (where everyone gets the data whether they want it or not). Notice that you normally have a one-way connection, thereby ruling out TCP: multicast is done via UDP. There are methods in use where a receiver can signal a sender that it has received a bad packet using a different 'reverse channel'. This is called "reliable multicast" and has little value for live video transmission. Ref2: http://wiki.yak.net/916/multicast-rtp-etc.html

3) Multicast is a true broadcast. The multicast source relies on multicast-enabled routers to forward the packets to all client subnets that have clients listening. There is no direct relationship between the clients and the Windows Media server. The Windows Media server generates an .nsc (NetShow channel) file when the multicast station is first created. Typically, the .nsc file is delivered to the client from a Web server. This file contains information that the Windows Media Player needs to listen for the multicast. This is similar to tuning into a station on a radio. Each client that listens to the multicast adds no additional overhead on the server. In fact, the server sends out only one stream per multicast station. The same load is experienced on the server whether only one client or 1,000 clients are listening.

Ref3: http://support.microsoft.com/kb/291786
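A minimal sketch of the UDP multicast mechanism described above (the group address, port, and payload are arbitrary examples; the receiver joins the group before the sender transmits):

    import socket

    GROUP, PORT = "239.1.1.1", 5004     # example administratively scoped group

    # Receiver: bind to the port and join the multicast group.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    rx.bind(("", PORT))
    membership = socket.inet_aton(GROUP) + socket.inet_aton("0.0.0.0")
    rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)

    # Sender: a single sendto() reaches every member of the group.
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the LAN
    tx.sendto(b"one send, many receivers", (GROUP, PORT))

    data, sender = rx.recvfrom(1024)
    print(data, sender)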

Authentication:
1)Authentication is the process of determining whether someone or something is, in fact, who or what it is declared to be. Description: In private and public computer networks (including the Internet), authentication is commonly done through the use of logon passwords. Knowledge of the password is assumed to guarantee that the user is authentic. Each user registers initially (or is registered by someone else), using an assigned or self-declared password. On each subsequent use, the user must know and use the previously declared password. The weakness in this system for transactions that are significant (such as the exchange of money) is that passwords can often be stolen, accidentally revealed, or forgotten. Ref1: http://searchsecurity.techtarget.com/definition/authentication

2) Authentication is any process by which you verify that someone is who they claim they are. This usually involves a username and a password, but can include any other method of demonstrating identity, such as a smart card, retina scan, voice recognition, or fingerprints. It is the process of identifying an individual, usually based on a username and password. In security systems, authentication is distinct from authorization, which is the process of giving individuals access to system objects based on their identity. Authentication merely ensures that the individual is who he or she claims to be, but says nothing about the access rights of the individual. Ref2: http://httpd.apache.org/docs/1.3/howto/auth.html 3) Authentication is a process where a person or a computer program proves their identity in order to access information. The person's identity is a simple assertion, the login ID for a particular computer application, for example. Proof is the most important part of the concept, and that proof is generally something known, like a password; something possessed, like your ATM card; or something unique about your appearance or person, like a fingerprint. Strong authentication requires at least two of these proofs. State-of-the-art authentication processes are tightly linked with encryption or crypto systems. In a world where the application that wants to authenticate you is on the other side of an open network like the Internet, the password that is your proof must be sent encrypted, or it is no longer a secret.

Ref3: http://www.rsa.com/glossary/default.asp?id=1006
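As a rough illustration of the password-based authentication described above, the following Python sketch registers users with salted password hashes and then verifies a claimed identity; the in-memory user store, iteration count, and function names are assumptions made only for the example.

```python
# Illustrative sketch of password-based authentication with salted hashing.
import hashlib
import hmac
import os

ITERATIONS = 100_000
_users = {}  # username -> (salt, password_hash); a stand-in for a real user database

def register(username: str, password: str):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    _users[username] = (salt, digest)

def authenticate(username: str, password: str) -> bool:
    """Return True only if the claimed identity can reproduce the stored hash."""
    if username not in _users:
        return False
    salt, stored = _users[username]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison

register("alice", "correct horse battery staple")
print(authenticate("alice", "correct horse battery staple"))  # True
print(authenticate("alice", "guess"))                         # False
```

Storing only a salted hash rather than the password itself limits the damage when the stored credentials are stolen, which is exactly the weakness the description above points out.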

Batch Signature:
1) An efficient batch signature generation scheme signs multiple messages simultaneously. The scheme can be based on any signature scheme with appendix, and the resultant signatures preserve the property of independent verification by different recipients. The scheme is shown to be of almost constant complexity for both generation and verification as the number of messages increases, while the size of the signature increases only logarithmically with the number of messages in the batch. It is demonstrated that the security of the batch signature is equivalent to the security of the underlying signature mechanisms.

Ref: http://www.sqnbankingsystems.com/products-services/signature-verification

2) Signature verification may be performed by any party using the signatory's public key. A signatory may wish to verify that the computed signature is correct, perhaps before sending the signed message to the intended recipient. The intended recipient of a signed message verifies the signature to determine its authenticity. Prior to verifying the signature of a signed message, the domain parameters and the claimed signatory's public key and identity shall be made available to the verifier in an authenticated manner.

Ref: http://www.scribd.com/doc/52164288/2MABS-Multicast-Authentication-Based-on-Batch-Signature

3) A batch RSA digital signature scheme in which a signer can sign messages for multiple recipients simultaneously. The construction is quite efficient due to the batch signing method. This is useful to improve the performance of a high-loaded signing server, for example a secure electronic transaction (SET) gateway. Theoretical calculations and experimental results show that the proposed scheme can improve the performance of the signing server significantly.

Ref: http://www.cnki.com.cn/Article/CJFDTOTAL-TRAN200903008.htm
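The following Python sketch illustrates only the general batching idea: hash every message, sign the combined digest once, and let each recipient verify its own message against that single signature. It flattens the digest list instead of using the Merkle-tree organisation that gives the logarithmic signature growth mentioned above, and the HMAC-based demo primitive merely stands in for a real signature scheme with appendix such as RSA or DSA.

```python
# Conceptual sketch of a batch signature: one signing operation covers many messages.
import hashlib
import hmac

def batch_sign(messages, sign_fn):
    digests = [hashlib.sha256(m).digest() for m in messages]
    batch_digest = hashlib.sha256(b"".join(digests)).digest()
    signature = sign_fn(batch_digest)        # single signing operation for the whole batch
    # Each recipient i gets: its own message, the full digest list, and the shared signature.
    return [(messages[i], digests, signature) for i in range(len(messages))]

def verify_one(message, digests, signature, verify_fn):
    # Recompute this message's hash, check it belongs to the batch,
    # then check the single signature over the whole digest list.
    if hashlib.sha256(message).digest() not in digests:
        return False
    batch_digest = hashlib.sha256(b"".join(digests)).digest()
    return verify_fn(batch_digest, signature)

# Demo-only primitive (shared secret); NOT a public-key signature, just a placeholder.
_KEY = b"demo-key"
demo_sign = lambda d: hmac.new(_KEY, d, hashlib.sha256).digest()
demo_verify = lambda d, s: hmac.compare_digest(demo_sign(d), s)

tokens = batch_sign([b"msg-1", b"msg-2", b"msg-3"], demo_sign)
msg, digests, sig = tokens[1]
print(verify_one(msg, digests, sig, demo_verify))  # True
```

Because every recipient checks the same signature independently, the signer's cost stays close to one signature per batch no matter how many messages it covers.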

By Sindhu.T M.Tech (SE)

Effective Software Merging in the Presence of Object-Oriented Refactorings

Software:
Computer software, or just software, is a collection of computer programs and related data that provide the instructions for telling a computer what to do and how to do it.

Description:
Software is a conceptual entity which is a set of computer programs, procedures, and associated documentation concerned with the operation of a data processing system. We can also say software refers to one or more computer programs and data held in the storage of the computer for some purpose. In other words, software is a set of programs, procedures, algorithms and their documentation. Program software performs the function of the program it implements, either by directly providing instructions to the computer hardware or by serving as input to another piece of software. The term was coined in contrast with the older term hardware (meaning physical devices). In contrast to hardware, software "cannot be touched". Software is also sometimes used in a more narrow sense, meaning application software only. Sometimes the term includes data that has not traditionally been associated with computers, such as film, tapes, and records. Ref: http://en.wikipedia.org/wiki/Computer_software

Def2:
Computer instructions or data. Anything that can be stored electronically is software.

Description2:
The terms software and hardware are used as both nouns and adjectives. For example, you can say: "The problem lies in the software," meaning that there is a problem with the program or data, not with the computer itself. You can also say: "It's a software problem."

The distinction between software and hardware is sometimes confusing because they are so integrally linked. Clearly, when you purchase a program, you are buying software. But to buy the software, you need to buy the disk (hardware) on which the software is recorded.

Software is often divided into two categories:
Systems software: includes the operating system and all the utilities that enable the computer to function.
Applications software: includes programs that do real work for users. For example, word processors, spreadsheets, and database management systems fall under the category of applications software.
Ref: http://www.webopedia.com/TERM/S/software.html

Merging:
Merging means combining two articles into a single article.

Description:
A merger is a non-automated process by which the content of two pages is united on one page. Reasons to merge a page include the following: unnecessary duplication of content, significant overlap with the topic of another page, and minimal content that could be covered in or requires the context of a page on a broader topic. Discretion should be exercised: merging should be avoided if it would result in an article that is too long or drawn out, if the short articles could be expanded into longer standalone articles, or if the short articles cover topics discrete enough to warrant their own pages. Merging results in a redirect to the parent page(s), with some or all content cut-and-pasted into that page(s). A comment must be made in the edit summary of the pages being merged as to where they are being merged to, and it must be noted in the parent page(s)' edit summary where the content from other pages is being merged from; this is done to preserve attribution under the Creative Commons Share-alike 3.0 license. Ref: http://en.wikipedia.org/wiki/Wikipedia:Merging

Def2:
Where branches are used to maintain separate lines of development, at some stage you will want to merge the changes made on one branch back into the trunk, or vice versa.

Description:
It is important to understand how branching and merging works in Subversion before you start using it, as it can become quite complex. It is highly recommended that you read the chapter Branching and Merging in the Subversion book, which gives a full description and many examples of how it is used.

The next point to note is that merging always takes place within a working copy. If you want to merge changes into a branch, you have to have a working copy for that branch checked out, and invoke the merge wizard from that working copy using TortoiseSVN Merge....

In general it is a good idea to perform a merge into an unmodified working copy. If you have made other changes in your WC, commit those first. If the merge does not go as you expect, you may want to revert the changes, and the Revert command will discard all changes including any you made before the merge. Ref: http://tortoisesvn.net/docs/release/TortoiseSVN_en/tsvn-dug-merge.html

Object-Oriented Programing:
Object-oriented programming (OOP) is a programming paradigm using "objects" (data structures consisting of data fields and methods, together with their interactions) to design applications and computer programs.

Description:
Simple, non-OOP programs may be one "long" list of statements (or commands). More complex programs will often group smaller sections of these statements into functions or subroutines each of which might perform a particular task. With designs of this sort, it is common for some of the program's data to be 'global', i.e. accessible from any part of the program. As programs grow in size, allowing any function to modify any piece of data means that bugs can have wide-reaching effects.

In contrast, the object-oriented approach encourages the programmer to place data where it is not directly accessible by the rest of the program. Instead, the data is accessed by calling specially written functions, commonly called methods, which are either bundled in with the data or inherited from "class objects." These act as the intermediaries for retrieving or modifying the data they control. The programming construct that combines data with a set of methods for accessing and managing those data is called an object. The practice of using subroutines to examine or modify certain kinds of data, however, was also quite commonly used in non-OOP modular programming, well before the widespread use of object-oriented programming. Ref: http://en.wikipedia.org/wiki/Object-oriented_programming

Def2:
Object-oriented programming (OOP) is a programming language model organized around "objects" rather than "actions" and data rather than logic.

Description:
Historically, a program has been viewed as a logical procedure that takes input data, processes it, and produces output data.

The programming challenge was seen as how to write the logic, not how to define the data. Object-oriented programming takes the view that what we really care about are the objects we want to manipulate rather than the logic required to manipulate them. Examples of objects range from human beings (described by name, address, and so forth) to buildings and floors (whose properties can be described and managed) down to the little widgets on your computer desktop (such as buttons and scroll bars). Ref: http://searchsoa.techtarget.com/definition/object-oriented-programming
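A small, invented Python example of the contrast described above: the account balance is not a globally writable variable but is reached only through the methods bundled with it.

```python
# Data (owner, balance) and the methods that manage it are combined into one object.
class Account:
    def __init__(self, owner: str, balance: float = 0.0):
        self._owner = owner        # data fields bundled with the methods that use them
        self._balance = balance

    def deposit(self, amount: float):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount: float):
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount

    @property
    def balance(self) -> float:    # read access goes through a method, not a global variable
        return self._balance

acct = Account("Alice")
acct.deposit(100.0)
acct.withdraw(30.0)
print(acct.balance)  # 70.0
```

Because every change to the balance passes through deposit() or withdraw(), a bug elsewhere in the program cannot silently corrupt it the way a global variable could.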

Refactoring:
"Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure." MartinFowler.

Description:
Refactoring is a kind of reorganization. Technically, it comes from mathematics when you factor an expression into an equivalence - the factors are cleaner ways of expressing the same statement. Refactoring implies equivalence; the beginning and end products must be functionally identical. You can view refactoring as a special case of reworking (see WhatIsReworking).

Practically, refactoring means making code clearer and cleaner and simpler and elegant. Or, in other words, clean up after yourself when you code. Examples would run the range from renaming a variable to introducing a method into a third-party class that you don't have source for.

Refactoring is not rewriting, although many people think they are the same. There are many good reasons to distinguish them, such as regression test requirements and knowledge of system functionality. The technical difference between the two is that refactoring, as stated above, doesn't change the functionality (or information content) of the system whereas rewriting does. Rewriting is reworking. Ref: http://c2.com/cgi/wiki?WhatIsRefactoring

Def2:
A change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior. Ref: http://sourcemaking.com/refactoring/defining-refactoring
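An invented before/after Python example of a refactoring in the sense defined above: a variable is renamed and a duplicated expression is extracted into a method, while the observable behaviour stays identical.

```python
# Before: terse names and an inline calculation inside report().
class OrderBefore:
    def __init__(self, q, p):
        self.q = q
        self.p = p
    def report(self):
        return "total: " + str(self.q * self.p * 1.2)

# After: descriptive names and an extracted total() method; same observable behaviour.
class Order:
    TAX = 1.2
    def __init__(self, quantity, unit_price):
        self.quantity = quantity
        self.unit_price = unit_price
    def total(self):
        return self.quantity * self.unit_price * self.TAX
    def report(self):
        return "total: " + str(self.total())

# The refactoring changed structure, not behaviour.
assert OrderBefore(3, 10).report() == Order(3, 10).report()
```

The assertion at the end is the essence of refactoring as opposed to rewriting: before and after are functionally identical.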

By Manasa Bhavani M.Tech (SE)

GUI Interaction Testing: Incorporating Event Context

GUI Definition

A graphical user interface (GUI) is a human-computer interface (i.e., a way for humans to interact with computers) that uses windows, icons and menus and which can be manipulated by a mouse (and often to a limited extent by a keyboard as well).

Reference: www.linfo.org/gui definition

A program interface that takes advantage of the computer's graphics capabilities to make the program easier to use.

Reference: www.webopedia.com/TERM/G/gui definition

An interface for issuing commands to a computer utilizing a pointing device, such as a mouse, that manipulates and activates graphical images on a monitor.

Reference :www.thefreedictionary.com/gui

In computing, a graphical user interface is a type of user interface that allows users to interact with electronic devices with images rather than text commands. Reference : www.en.wikipedia.org/wiki/gui

Interaction Definition
Interaction is a mutual or reciprocal action or influence. Reference : www.thefreedictionary.com/interaction definition

Testing Definition
Testing is finding out how well something works.

Reference : www.search.windevelopment.techtarget.com

Testing is an investigation conducted to provide stakeholders with information about the quality of the product or service under test. Reference :www.en.wikipedia.org/wiki/testing definition

Incorporating Definition
The act of combining into an integral whole; a consolidation of two corporations. Reference : www.thefreedictionary.com

Event Definition
An occurrence happening at a determinable time and place, with or without the participation of a human agent. Reference : www.businessdictionary.com

Context Definition
The part of text or statement that surrounds a particular word or passage and determines its meaning. Reference : www.thefreedictionary.com

Gui Description
In computing, a graphical user interface (GUI, sometimes pronounced gooey[1]) is a type of user interface that allows users to interact with electronic devices with images rather than text commands. GUIs can be used in computers, hand-held devices such as MP3 players, portable media players or gaming devices, household appliances and office equipment. A GUI represents the information and actions available to a user through graphical icons and visual indicators such as secondary notation, as opposed to text-based interfaces, typed command labels or text navigation. The actions are usually performed through direct manipulation of the graphical elements.[2] The term GUI is historically restricted to the scope of two-dimensional display screens with display resolutions able to describe generic information, in the tradition of the computer science research at PARC (Palo Alto Research Center). The term GUI earlier might have been applicable to other high-resolution types of interfaces that are non-generic, such as video games, or not restricted to flat screens, like volumetric displays.[3] Following PARC, the first GUI-centric computer operating model was the Xerox 8010 Star Information System in 1981,[4] followed by the Apple Lisa (which presented the concept of menu bar as well as window controls) in 1983, the Apple Macintosh 128K in 1984, and the Atari ST and Commodore Amiga in 1985. The GUIs familiar to most people today are Microsoft Windows, Mac OS X, and X Window System interfaces for desktop and laptop computers, and Symbian, BlackBerry OS, Android and Apple's iOS for handheld ("smartphone") devices.
WebReference: http://en.wikipedia.org/wiki/Graphical_user_interface

Interaction Description
Interaction is a kind of action that occurs as two or more objects have an effect upon one another. The idea of a two-way effect is essential in the concept of interaction, as opposed to a one-way causal effect. A closely related term is interconnectivity, which deals with the interactions of interactions within systems: combinations of many simple interactions can lead to surprising emergent phenomena. Interaction has different tailored meanings in various sciences. Casual examples of interaction outside of science include:

communication of any sort, for example two or more people talking to each other, or communication among groups, organizations, nations or states: trade, migration, foreign relations, transportation,

feedback during the operation of machines such as a computer or tool, for example the interaction between a driver and the position of his or her car on the road: by steering the driver influences this position, by observation this information returns to the driver.

Web Reference: http://en.wikipedia.org/wiki/Interaction

Testing Description
Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation. Test techniques include, but are not limited to, the process of executing a program or application with the intent of finding software bugs (errors or other defects). Software testing can be stated as the process of validating and verifying that a software program/application/product:
1. meets the requirements that guided its design and development;
2. works as expected; and
3. can be implemented with the same characteristics.

Software testing, depending on the testing method employed, can be implemented at any time in the development process. However, most of the test effort occurs after the requirements have been defined and the coding process has been completed. As such, the methodology of the test is governed by the software development methodology adopted. Different software development models will focus the test effort at different points in the development process. Newer development models, such as Agile, often employ test-driven development and place an increased portion of the testing in the hands of the developer, before it reaches a formal team of testers. In a more traditional model, most of the test execution occurs after the requirements have been defined and the coding process has been completed. Web Reference : http://en.wikipedia.org/wiki/Software_testing
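As a small illustration of validating that a program works as expected, here is a unit test written with Python's standard unittest module; the divide() function is an invented example.

```python
# A minimal unit test: it checks both the expected result and the expected failure mode.
import unittest

def divide(a, b):
    if b == 0:
        raise ZeroDivisionError("b must be non-zero")
    return a / b

class DivideTest(unittest.TestCase):
    def test_normal_division(self):
        self.assertEqual(divide(10, 4), 2.5)

    def test_division_by_zero_is_rejected(self):
        with self.assertRaises(ZeroDivisionError):
            divide(1, 0)

if __name__ == "__main__":
    unittest.main()
```

In test-driven development these checks are written before or alongside the code, so most defects are found by the developer rather than a downstream test team.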

Event description
Event can refer to many things such as:

An observable occurrence, phenomenon or an extraordinary occurrence

A type of gathering:

A ceremony, for example, a marriage
A competition, for example, a sports competition
A convention (meeting)
A happening, a performance or situation meant to be considered as art
A festival, for example, a musical event
A media event, a happening that attracts coverage by mass media
A party
A sporting event

In science, technology, and mathematics:

Event (computing), a software message indicating that something has happened, such as a keystroke or mouse click

Event (particle accelerator), experiments which produce high-energy (MeV, GeV, and TeV) subatomic particle collisions

Event (probability theory), a set of outcomes to which a probability is assigned
Event (UML), in Unified Modeling Language, a notable occurrence at a particular point in time
Event chain methodology, in project management

Event (relativity), a point in space at an instant in time, i.e. a location in spacetime
Event horizon, a boundary in spacetime, typically surrounding a black hole, beyond which events cannot affect an exterior observer

Extinction event, a sharp decrease in the number of species in a short period of time
Celestial event, an astronomical phenomenon of interest

Web Reference :http://en.wikipedia.org/wiki/Event
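To connect the computing sense of an event with the GUI-testing theme of this section, the following Python/Tkinter sketch registers a handler for a mouse-click event; the event object passed to the handler carries the context (coordinates and source widget) that GUI interaction testing records and replays. The widget names and layout are illustrative only.

```python
# Sketch of a GUI event and its context using the standard Tkinter toolkit.
import tkinter as tk

def on_click(event):
    # event.x / event.y and event.widget describe the context of this click event
    print(f"button clicked at ({event.x}, {event.y}) in widget {event.widget}")

root = tk.Tk()
button = tk.Button(root, text="Press me")
button.bind("<Button-1>", on_click)   # register a handler for the left mouse-button event
button.pack()
# root.mainloop()  # uncomment to start the event loop and interact with the window
```

A GUI test case is essentially a sequence of such events fired in a particular order; incorporating event context means checking that each event behaves correctly given the events that preceded it.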

Context Description
The part of a text or statement that surrounds a particular word or passage and determines its meaning. Web Reference: www.thefreedictionary.com

Incorporating Description
The act of combining into an integral whole; a consolidation of two corporations. Web Reference : www.thefreedictionary.com
By G. Saritha Reddy Mtech(SE)

On the Cost of Network Inference Mechanism

Definitions:

Network:
A computer network means an interconnected collection of autonomous computers. Two computers are said to be interconnected if they are able to exchange information.
Ref: Book: Computer Networks, Author: Andrew S. Tanenbaum, Edition: Third

Inference:

Inference. The word inference is central in statistical analysis. A dictionary definition of inference [150] rephrases "to infer" as "to conclude by reasoning from something known or assumed". A broad definition of statistical inference could be the procedure that involves extracting information from data about the process underlying the observations.

Ref: Statistical Analysis in Climate Research
By Hans von Storch, Universität Hamburg, and Francis W. Zwiers, University of Victoria, British Columbia
Publisher: Cambridge University Press
Print Publication Year: 1984

Web:
Network: When you have two or more computers connected to each other, you have a network. The purpose of a network is to enable the sharing of files and information between multiple systems. The Internet could be described as a global network of networks. Computer networks can be connected through cables, such as Ethernet cables or phone lines, or wirelessly, using wireless networking cards that send and receive data through the air. In information technology, a network is a series of points or nodes interconnected by communication paths. Networks can interconnect with other networks and contain subnetworks. Ref: http://searchnetworking.techtarget.com/definition/network

Inference:
From Wikipedia, the free encyclopedia: Inference is the act or process of deriving logical conclusions from premises known or assumed to be true. The conclusion drawn is also called an inference. The laws of valid inference are studied in the field of logic. Human inference (i.e., how humans draw conclusions) is traditionally studied within the field of cognitive psychology; artificial intelligence researchers develop automated inference systems to emulate human inference. Statistical inference allows for inference from quantitative data. The process by which a conclusion is inferred from multiple observations is called inductive reasoning. The conclusion may be correct or incorrect, or correct to within a certain degree of accuracy, or correct in certain situations. Conclusions inferred from multiple observations may be tested by additional observations.

Note: This definition is disputable due to its lack of clarity (ref: Oxford English Dictionary: "induction ... 3. Logic: the inference of a general law from particular instances."); the definition given thus applies only when the "conclusion" is general.

DESCRIPTION:

NETWORK:
The most common topologies, or general configurations, of networks include the bus, star, token ring, and mesh topologies. Networks can also be characterized in terms of spatial distance as local area networks (LANs), metropolitan area networks (MANs), and wide area networks (WANs). A given network can also be characterized by the type of data transmission technology in use on it (for example, a TCP/IP or Systems Network Architecture network); by whether it carries voice, data, or both kinds of signals; by who can use the network (public or private); by the usual nature of its connections (dial-up or switched, dedicated or nonswitched, or virtual connections); and by the types of physical links (for example, optical fiber, coaxial cable, and unshielded twisted pair). Large telephone networks and networks using their infrastructure (such as the Internet) have sharing and exchange arrangements with other companies so that larger networks are created.

INFERENCE:

Inference mechanism
Since Post, it has been the accepted practice to define the class of formulas and the notion of proof inductively. Notice our definition of formula; also, for example, a Hilbert-style proof in a Hilbert-style system is a sequence of closed formulas such that Fi is an axiom or follows by a rule of inference from Fj, Fk for j < i, k < i. A typical inference rule is expressed in the form of hypotheses above a horizontal line with the conclusion below, as in modus ponens.

This definition of a proof includes a specific presentation of evidence that an element is in the class of all proofs. The above form of a rule can be used to present any inductive definition. For example, the natural numbers are often defined inductively by one rule with no premise and another rule with one.

This definition of the natural numbers is one of the most basic inductive definitions. It is a pattern for all others, and indeed, it is the clarity of this style of definition that recommends it for foundational work. Making an inference is also known as reading between the lines. The reader must put together the information the author provides and the information that the reader already knows to come up with the answer.

The text + previous knowledge = inference

EXISTING SYSTEM:
A number of network path delay, loss, or bandwidth inference mechanisms have been proposed over the past decade. Concurrently, several network measurement services have been deployed over the Internet and intranets. An important class of network inference mechanisms estimates the properties (e.g., delay or loss) of a large number of end-to-end network paths by measuring some subset thereof. This class of mechanisms is designed to reduce the amount of injected active measurement probe traffic and the effort required to collect a large set of measurements, usually at the expense of measurement accuracy. Measurement of network bandwidth is important for many intranet applications and protocols, especially those involving the transfer of small and large files and those involving the delivery of content with real-time QoS constraints.
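The following Python sketch illustrates the general idea behind this class of mechanisms rather than any specific proposal: a few end-to-end paths are probed, per-link delays are estimated by least squares, and the delay of an unprobed path is then predicted from those estimates. The four-link topology and the delay values are invented for the example.

```python
# Inferring unmeasured path delays from a measured subset of paths.
import numpy as np

# Routing matrix: one row per probed path, one column per link
# (entry 1 means the path traverses that link).
measured_paths = np.array([
    [1, 1, 0, 0],   # path A uses links 0 and 1
    [0, 1, 1, 0],   # path B uses links 1 and 2
    [0, 0, 1, 1],   # path C uses links 2 and 3
    [0, 1, 0, 1],   # path D uses links 1 and 3
], dtype=float)
measured_delays = np.array([30.0, 25.0, 35.0, 40.0])   # ms, from active probes

# Estimate per-link delays: least-squares solution of measured_paths @ x = measured_delays.
link_delays, *_ = np.linalg.lstsq(measured_paths, measured_delays, rcond=None)

# Predict the delay of a path that was never probed (it uses links 0, 1 and 2).
unprobed_path = np.array([1, 1, 1, 0], dtype=float)
print("estimated per-link delays:", np.round(link_delays, 2))                       # [15. 15. 10. 25.]
print("predicted end-to-end delay:", round(float(unprobed_path @ link_delays), 2))  # 40.0
```

Only four paths are probed, yet any path over these links can now be estimated, which is exactly the trade-off described above: less probe traffic and collection effort in exchange for some loss of accuracy.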

Examples of inference
Greek philosophers defined a number of syllogisms, correct three-part inferences that can be used as building blocks for more complex reasoning. We begin with the most famous of them all:
1. All men are mortal.
2. Socrates is a man.
3. Therefore, Socrates is mortal.
The reader can check that the premises and conclusion are true, but logic is concerned with inference: does the truth of the conclusion follow from that of the premises? The validity of an inference depends on the form of the inference. That is, the word "valid" does not refer to the truth of the premises or the conclusion, but rather to the form of the inference. An inference can be valid even if the parts are false, and can be invalid even if the parts are true. But a valid form with true premises will always have a true conclusion. For example, consider the form of the following argument:
1. All apples are blue.
2. A banana is an apple.
3. Therefore, a banana is blue.
For the conclusion to be necessarily true, the premises need to be true. Now we turn to an invalid form:
1. All A are B.
2. C is a B.
3. Therefore, C is an A.
To show that this form is invalid, we demonstrate how it can lead from true premises to a false conclusion:
1. All apples are fruit. (True)
2. Bananas are fruit. (True)
3. Therefore, bananas are apples. (False)
A valid argument with false premises may lead to a false conclusion:
1. All fat people are Greek.
2. John Lennon was fat.
3. Therefore, John Lennon was Greek.
When a valid argument is used to derive a false conclusion from false premises, the inference is valid because it follows the form of a correct inference. A valid argument can also be used to derive a true conclusion from false premises:
1. All fat people are musicians.
2. John Lennon was fat.
3. Therefore, John Lennon was a musician.
In this case we have two false premises that imply a true conclusion.

Incorrect inference
An incorrect inference is known as a fallacy. Philosophers who study informal logic have compiled large lists of them, and cognitive psychologists have documented many biases in human reasoning that favor incorrect reasoning.

Automatic logical inference


AI systems first provided automated logical inference and these were once extremely popular research topics, leading to industrial applications under the form of expert systems and later business rule engines. An inference system's job is to extend a knowledge base automatically. The knowledge base (KB) is a set of propositions that represent what the system knows about the world. Several techniques can be used by that system to extend KB by means of valid inferences. An additional requirement is that the conclusions the system arrives at are relevant to its task.
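A minimal sketch of such an inference system in Python: the knowledge base is a set of propositions, and repeated application of modus ponens extends it until nothing new can be derived. The facts and rules are toy examples invented for illustration.

```python
# Forward chaining with modus ponens: if all premises of a rule are known, add its conclusion.
def forward_chain(facts, rules):
    """facts: set of known propositions; rules: list of (premises, conclusion) pairs."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)   # one application of modus ponens
                changed = True
    return derived

facts = {"socrates_is_a_man"}
rules = [
    (("socrates_is_a_man",), "socrates_is_mortal"),            # "all men are mortal"
    (("socrates_is_mortal",), "socrates_will_not_live_forever"),
]
print(forward_chain(facts, rules))
```

Expert systems and business rule engines elaborate on this loop with conflict resolution and relevance filtering, so that the conclusions drawn stay useful for the system's task.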

Statistical inference
From Wikipedia, the free encyclopedia

In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation. More substantially, the terms statistical inference, statistical induction and inferential statistics are used to describe systems of procedures that can be used to draw conclusions from datasets arising from systems affected by random variation. Initial requirements of such a system of procedures for inference and induction are that the system should produce reasonable answers when applied to well-defined situations and that it should be general enough to be applied across a range of situations. The outcome of statistical inference may be an answer to the question "what should be done next?", where this might be a decision about making further experiments or surveys, or about drawing a conclusion before implementing some organizational or governmental policy.
By Manish Kumar MTech(S.E)

Automatic Discovery Of Personal Name Aliases From The Web


Literature Survey Definitions:
Personal Name: A personal name is the proper name identifying an individual person, and usually comprises a given name bestowed at birth or at a young age plus a surname.

Ref: http://en.wikipedia.org/wiki/personal name
Alias: An alias is a pseudonym. A pseudonym (literally, "false name") is a name that a person (or, sometimes, a group) assumes for a particular purpose and that differs from his or her original orthonym (or "true name").

Ref: http://en.wikipedia.org/wiki/Pseudonym
Snippet: A snippet is a short bit of descriptive text found in nearly every search engine result. Ref: The Truth About Search Engine Optimization, Edition 1, 2009, Rebecca Lieb

Lexical Patterns: A lexical pattern is defined as a particular combination of part-of-speech categories. For instance, (noun, preposition, noun) or (verb, preposition, noun) are lexical patterns. Ref: Lecture Notes in Computer Science, A Semantic Case-Based Reasoning Framework for Text Categorization, 2007, Valentina Ceausu and Sylvie Després, page 741, Volume 4825/2007, 736-749, DOI: 10.1007/978-3-540-76298

Description:
Snippet: Snippets are short summaries for each search result. A snippet is built from much larger texts found in web pages. The snippets are either complete sentences or fragments of sentences. A snippet allows us to quickly scan a result and understand its gist or menu lists. Snippets are portions of text clearly intended to be read by a human; they are generally complete sentences, coherent excerpts of sentences, or understandable titles. Results presented by early search engines provided a hint as to the content of each page in the form of a fixed snippet from the head of the text. Modern search engines usually present the users with a list of snippets instead of documents.

Once a user submits a query, the search engine returns a ranked list of snippets, each of which corresponds to a retrieved document. If the snippet is relevant, the user will click and examine the corresponding document. Good snippets can guide the users to find the relevant documents among the retrieved results, or may even contain the relevant information itself. On the contrary, users may miss relevant documents or waste time clicking and examining irrelevant documents due to bad snippets. Ref: 1) Effective Time Ratio: A Measure for Web Search Engines with Document Snippets. 2) Information Retrieval (Searching in the 21st Century), 2007, Ayse Goker, John Davies

Lexical Patterns: A lexical pattern is a set of lexical categories. We use lexical patterns to extract information. In the lexicon (vocabulary) of a language, lexical words or nouns refer to things. These words fall into three main classes:

proper nouns refer exclusively to the place, object or person named, i.e. nomenclature or a naming system;

concrete nouns refer to physical objects; and
abstract nouns refer to concepts and ideas.

Other than lexical words, the lexicon consists of functional or grammatical words which do not refer to objects in the world.

A lexical pattern is one common type of knowledge pattern, involving one or more specific lexical items. Some of the patterns for hyperonymy include "is a", "classified as", and "defined as"; for meronymy, "its", "is a part of", and "contains"; and other patterns include "are needed for", "serve * as", "designed for", etc. Pattern restrictions should ensure that the kind of lexical patterns extracted does not generate too much noise. A hyponym is a word or phrase whose semantic field is included within that of another word, its hypernym. In simpler terms, a hyponym shares a type-of relationship with its hypernym. For example, scarlet, vermilion, carmine, and crimson are all hyponyms of red (their hypernym), which is, in turn, a hyponym of color. A meronym denotes a constituent part of, or a member of, something. That is, X is a meronym of Y if Xs are parts of Y(s), or X is a meronym of Y if Xs are members of Y(s). For example, 'finger' is a meronym of 'hand' because a finger is part of a hand. Similarly, 'wheel' is a meronym of 'automobile'. Ref: 1) Recent Advances in Computational Terminology: Extracting Knowledge-Rich Contexts for Terminography, 2001, Didier Bourigault, Christian Jacquemin, Marie-Claude L'Homme, Chapter 14, page 290. 2) www.wikipedia.com
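In the spirit of the approach above, the following Python sketch applies hand-written lexical patterns to text snippets to collect alias candidates for a personal name; the regular expressions and the example snippets are assumptions made only for illustration, not the patterns used in the cited work.

```python
# Extracting alias candidates from snippets with simple lexical patterns.
import re

ALIAS_PATTERNS = [
    r"{name},? (?:also known as|aka|alias|nicknamed) (?P<alias>[A-Z][\w.\- ]+)",
    r"(?P<alias>[A-Z][\w.\- ]+),? (?:also known as|aka|alias) {name}",
]

def extract_aliases(name, snippets):
    aliases = set()
    for snippet in snippets:
        for template in ALIAS_PATTERNS:
            pattern = template.format(name=re.escape(name))
            for match in re.finditer(pattern, snippet):
                aliases.add(match.group("alias").strip())
    return aliases

snippets = [
    "Will Smith, also known as The Fresh Prince, starred in the show.",
    "The Fresh Prince, aka Will Smith, released a new album.",
]
print(extract_aliases("Will Smith", snippets))  # {'The Fresh Prince'}
```

Real systems rank the extracted candidates, for example by how many distinct snippets and patterns support each one, before accepting them as aliases.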

by Rama MTech(CSE)

Predicting Missing Items in Shopping Carts

Definition:


A prediction or forecast is a statement about the way things will happen in the future, often but not always based on experience or knowledge.

REF:http://en.wikipedia.org/wiki/Prediction

Definition:
Use this method to return the properties and values from the current case that are relevant to a prediction of the specified unknown property. A score corresponding to each returned property/value pair explains the importance of each property/value pair to the prediction. REF: http://msdn.microsoft.com/en-US/library/ee784341(v=CS.10).aspx

Definition:
A shopping cart is a piece of software that acts as an online store's catalog and ordering process.

REF: http://www.webopedia.com/TERM/S/shopping_cart.html
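A hedged sketch of one simple way to predict a missing item from a partial cart: count which items co-occurred with the current cart in past transactions and propose the most frequent one. The transaction data is invented, and real systems would typically use association rules or more elaborate models rather than raw co-occurrence counts.

```python
# Predicting a likely missing item from co-occurrence in past shopping carts.
from collections import Counter

past_carts = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"beer", "chips"},
    {"bread", "butter", "jam"},
]

def predict_missing(partial_cart, transactions, top_k=1):
    scores = Counter()
    for cart in transactions:
        if partial_cart <= cart:              # this past cart contains the current partial cart
            for item in cart - partial_cart:
                scores[item] += 1             # count items that historically completed such carts
    return [item for item, _ in scores.most_common(top_k)]

print(predict_missing({"bread", "butter"}, past_carts))   # ['milk'] ('jam' ties with it here)
```

The scores play the role of the prediction explanation mentioned above: each candidate item comes with a count saying how strongly the past data supports it.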

By Vijay MTech(SE)

On Computing Farthest Dominated Locations

Definition:


(Nearest Dominator, Nearest Dominator Distance) Given a location s, its quality vector Ψ, and a set of spatial objects P, the nearest dominator of s in P is defined as

ND(s, Ψ, P) = argmin{ dist(s, p) : p ∈ P and p dominates Ψ }

Description:
i.e., the nearest neighbor of s in P among those that dominate Ψ. The nearest dominator distance ndd(s, Ψ, P) of s is then defined as ndd(s, Ψ, P) = dist(s, ND(s, Ψ, P)). Refer to the example in Figures 1 and 2. The ND of si is the hotel hj that minimizes the dist(si, hj) value among those hotels dominating the design competence. Figure 2b lists the NN and ND of each location si. It is important to note that the NN is not necessarily the same as the ND. For example, the NN of s2 is h4, which however does not dominate s2 with respect to its design competence, whereas its next nearest neighbor h3 does, which is exactly s2's ND. It is also noteworthy that a location's ND is not necessarily a skyline point, as indicated by h3 here. By considering the distance of each location si from its ND, we pick the largest one (i.e., dist(s3, h5)) and take its location (i.e., s3) as the result location for building the new hotel.
Reference 1: www4.comp.polyu.edu.hk/~csmlyiu/journal/TKDE_fdl.pdf, On Computing Farthest Dominated Locations, Hua Lu, Man Lung Yiu

Definition:
(Farthest Dominated Location Query) Given a set of (competitor) spatial objects P, a set of (candidate) locations L, and a quality vector Ψ as the design competence, the farthest dominated location query (FDL) returns from L a location s such that the distance ndd(s, Ψ, P) is maximized, i.e.,

s = argmax{ ndd(s', Ψ, P) : s' ∈ L }

Description: Refer to the hotel example in Figures 1 and 2. There are c = 2 quality attributes (i.e., price and star). The set of objects is P = {h1, h2, ..., h6} and the set of locations is L = {s1, s2, ..., s4}. Hotel h1 is a spatial object, with a fixed location in the Euclidean space and the quality vector h1.Ψ = (180, 4). Let the design competence be Ψ = (200, 4). Location s3 is the farthest dominated location and its nearest dominator is h5.

Reference 2: www4.comp.polyu.edu.hk/~csmlyiu/journal/TKDE_fdl.pdf, On Computing Farthest Dominated Locations, Hua Lu, Man Lung Yiu
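A brute-force Python sketch of the FDL query as defined above, intended only to make the definitions concrete: for every candidate location it computes the nearest-dominator distance and returns the location maximizing it. The dominance convention (lower price and higher star rating are better) and the data values are assumptions for the example; the cited paper develops far more efficient algorithms than this exhaustive scan.

```python
# Brute-force farthest dominated location (FDL) computation.
import math

def dominates(q, psi):
    """q = (price, star) dominates the design competence psi when it is no worse in
    every attribute and strictly better in at least one (lower price, higher star)."""
    no_worse = q[0] <= psi[0] and q[1] >= psi[1]
    strictly_better = q[0] < psi[0] or q[1] > psi[1]
    return no_worse and strictly_better

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def fdl(locations, objects, psi):
    """objects: list of (position, quality); returns (best location, its ndd)."""
    dominators = [pos for pos, q in objects if dominates(q, psi)]
    if not dominators:
        return None, None                    # no object dominates psi, so no ND exists
    best, best_ndd = None, -1.0
    for s in locations:
        ndd = min(dist(s, pos) for pos in dominators)   # nearest dominator distance of s
        if ndd > best_ndd:
            best, best_ndd = s, ndd
    return best, best_ndd

# Invented data: hotels as (position, (price, star)) and candidate locations.
hotels = [((1, 1), (180, 4)), ((6, 2), (150, 5)), ((3, 7), (220, 3))]
candidates = [(2, 2), (5, 5), (8, 8)]
print(fdl(candidates, hotels, psi=(200, 4)))   # ((8, 8), ~6.32)
```

The returned location is the candidate that keeps the greatest distance from every competitor that outperforms the planned design, which matches the intuition of the hotel example above.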


By Arun Prashanth MTech(SE)

AUTOMATIC TEMPLATE EXTRACTION FROM HETEROGENEOUS WEBPAGES


Webpage
Definition 1. A web page is a document or information resource that is suitable for the World Wide Web and can be accessed through a web browser and displayed on a monitor or mobile device.
http://en.wikipedia.org/wiki/Web_page

2. A document on the World Wide Web. Every Web page is identified by a unique URL (Uniform Resource Locator). http://webteam.waikato.ac.nz/guidelines/web-templates/definition.shtml

Description
The World Wide Web is a vast and rapidly growing source of information. Web pages contain a combination of unique content and template material, which is often present across multiple pages and used primarily for formatting, navigation, and branding. In order to achieve high productivity of publishing, the web pages in many websites are automatically populated by using common templates with contents. Different web pages can use different models for structuring their data. Web pages may consist of files of static text and other content stored within the web server's file system (static web pages), or may be constructed by server-side software when they are requested (dynamic web pages). Client-side scripting can make web pages more responsive to user input once on the client browser. Whenever a web page is requested by a browser, the server detects the template of that particular web page and displays it at the requested location. Ref1: http://en.wikipedia.org/wiki/Web_pages

Template
Definition 1. A template is the longest common subsequence of HTML elements that can be found in the web pages grouped inside the same cluster. Web Engineering: Principles and Techniques, 2005, Woojong Suh. 2. A web template is a tool used to separate content from presentation in web design, and for mass-production of web documents.

http://en.wikipedia.org/wiki/Web_template

3. Web page template often refers to a predesigned Web page that you can customize. The page template would include font, style, formatting, tables, graphics and other elements commonly found on a Web page.
http://www.webopedia.com/TERM/P/page_template.html

4. The Template controls the overall look and layout of your site. It provides the framework that brings together common elements, modules and components as well as providing the cascading style sheet for your site.
http://docs.joomla.org/What_is_a_template%3F

Description
Many web sites contain large collections of pages generated using a common template or layout. A template is a model that is used for displaying the structured information of a webpage. A template defines how content should be extracted from a particular webpage and from all other web pages with a similar content structure. For human beings, the templates provide readers easy access to the contents, guided by consistent structures, even though the templates are not explicitly announced. However, for machines, the unknown templates are considered harmful because they degrade the accuracy and performance due to the irrelevant terms in templates. Various template detection methods can be used for detecting the templates of a web page. Web pages may contain different kinds of template structure. We use the HTML document structure for finding the template information of a particular web page.

Clustering
Definition 1. Cluster analysis or clustering is the task of assigning a set of objects into groups so that the objects in the same cluster are more similar to each other than to those in other clusters.
http://en.wikipedia.org/wiki/Cluster_analysis

2. Clustering is a data mining (machine learning) technique used to place data elements into related groups without advance knowledge of the group definitions.
http://databases.about.com/od/datamining/g/clustering.htm

3. Clustering is a division of data into groups of similar objects. Each group, called cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups.
http://churmura.com/technology/computer-science/clustering-in-data-mining/31592/

Description
The goal of clustering is to separate a given group of data items into groups such that the items in the same cluster are similar to each other. Clustering is done by comparing the documents or templates of two web pages. Pages that are similar to one another are added into a cluster, and a single common template, identified by the server, is assigned to that cluster. Clustering of web documents is an important problem for two major reasons. First, clustering a document collection into categories enables it to be more easily browsed and used. Automatic categorization is especially important for the World Wide Web, with its huge number of dynamic (time-varying) documents and diversity of topics: such features make it extremely difficult to classify pages manually as we might do with small document corpora related to a single field or topic. Second, clustering can improve the performance of search and retrieval on a document collection. Automatically clustering web pages into semantic groups promises improved search and browsing on the web. A web document clustering algorithm partitions a set of web documents into groups of similar documents.
Ref1: Web document analysis: challenges and opportunities, Apostolos Antonacopoulos, Jianying Hu
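A simplified Python sketch of clustering pages by template similarity: each page is fingerprinted by the set of HTML tag paths it contains, and pages whose fingerprints have high Jaccard similarity are grouped together. The threshold, the greedy grouping, and the tiny example pages are assumptions; this stands in for, rather than reproduces, the clustering method used in the paper.

```python
# Grouping web pages by the similarity of their HTML structure.
from html.parser import HTMLParser

class PathCollector(HTMLParser):
    """Collect root-to-tag paths such as 'html/body/div' as a template fingerprint."""
    def __init__(self):
        super().__init__()
        self.stack, self.paths = [], set()
    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))
    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

def fingerprint(html_text):
    collector = PathCollector()
    collector.feed(html_text)
    return collector.paths

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster(pages, threshold=0.7):
    clusters = []                       # each cluster: list of (page, fingerprint)
    for page in pages:
        fp = fingerprint(page)
        for c in clusters:
            if jaccard(fp, c[0][1]) >= threshold:   # compare with the cluster's first page
                c.append((page, fp))
                break
        else:
            clusters.append([(page, fp)])
    return clusters

pages = [
    "<html><body><div><h1>A</h1><p>x</p></div></body></html>",
    "<html><body><div><h1>B</h1><p>y</p></div></body></html>",
    "<html><body><table><tr><td>z</td></tr></table></body></html>",
]
print([len(c) for c in cluster(pages)])   # [2, 1]: the two div-based pages share a template
```

Once the pages are grouped, a common template for each cluster can be derived from the tag paths shared by all of its members, and the remaining paths point to the unique content.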

By Pravallika Mtech (CSE)
