
WEB USER-SESSION INFERENCE BY MEANS OF CLUSTERING TECHNIQUES

INTRODUCTION

The explosive growth of the Web has drastically changed the way in which
information is managed and accessed. The large scale of Web data sources and the wide
availability of services over the Internet have increased the need for effective Web data
management techniques and mechanisms. Understanding how users navigate over Web
sources is essential both for computing practitioners and researchers. In this context, Web
data clustering has been widely used for increasing Web information accessibility,
understanding users’ navigation behavior, and improving information retrieval and
content delivery on the Web.
The Web confronts the computer user with an enormous flood of information. On almost any topic one
can think of, one can find pieces of information that are made available by other internet
citizens, ranging from individual users that post an inventory of their record collection, to
major companies that do business over the Web. To be able to cope with the abundance
of available information, users of the WWW need to rely on intelligent tools that assist
them in finding, sorting, and filtering the available information. Just as data mining aims
at discovering valuable information that is hidden in conventional databases, the
emerging field of Web mining aims at finding and extracting relevant information that is
hidden in Web-related data, in particular in text documents that are published on the
Web. Like data mining, Web mining is a multi-disciplinary effort that draws techniques
from fields such as information retrieval, statistics, machine learning, natural language
processing, and others.
The World Wide Web has become increasingly important as a medium for
commerce as well as for dissemination of information. In E-commerce, companies want
to analyze the user’s preferences to place advertisements, to decide their market strategy,
and to provide customized guidance to Web customers. In today’s information-based society,
Web surfers urgently need to find the relevant information among the overwhelming
resources on the Internet. The Web access log contains a wealth of information that allows us to
observe users’ interaction with a site. Properly exploited, this information can help us
improve the Web site, create a more effective site organization, and help users navigate
through an enormous number of Web documents. Therefore, data mining, also
referred to as knowledge discovery in databases (KDD), has naturally been introduced to
the World Wide Web.
Web usage data collected in access logs is at a very fine granularity. It usually
includes every HTTP request from all users. Each request contains at least the IP address,
requested pages, time requested, response code, and size of the item requested. Therefore,
while the access log has the advantage of being extremely detailed, it also has some
drawbacks. When we apply statistical and probabilistic methods to it, we tend to get results
that are more fine-grained than they should be, because the analysis may focus on micro trends
rather than macro trends. Moreover, users’ browsing behavior
on the Web is highly uncertain: users might browse the same page for different purposes,
spend varying amounts of time on the same page, make a different number of visits to it,
or even reach the page from different sources each time. Therefore, micro trends tend to
be erroneous and not of much use.

Depending on the nature of the data, one can distinguish three main areas of
research within the Web mining community:
Web Content Mining: application of data mining techniques to unstructured or semi-structured data, usually HTML documents
Web Structure Mining: use of the hyperlink structure of the Web as an (additional) information source
Web Usage Mining: analysis of user interactions with a Web server (e.g., clickstream analysis)

AN OVERVIEW OF WEB USAGE MINING

Web Usage Mining Process:


The main processes in Web Usage Mining are:
Preprocessing: Data preprocessing describes any type of processing performed on
raw data to prepare it for another processing procedure. Commonly used as a preliminary
data mining practice, data preprocessing transforms the data into a format that will be
more easily and effectively processed for the purpose of the user.
The different types of preprocessing in Web Usage Mining are:
1. Usage Pre-Processing: pre-processing relating to the usage patterns of users.
2. Content Pre-Processing: pre-processing of the content accessed.
3. Structure Pre-Processing: pre-processing related to the structure of the website.

Pattern Discovery: Web Usage mining can be used to uncover patterns in server
logs but is often carried out only on samples of data. The mining process will be
ineffective if the samples are not a good representation of the larger body of data.

The following are the pattern discovery methods.


1. Statistical Analysis
2. Association Rules
3. Clustering
4. Classification
5. Sequential Patterns
6. Dependency Modeling
Recently, data mining techniques have been applied to extract usage patterns from
Web log data. This process, known as Web usage mining, is traditionally performed in
several stages to achieve its goals:
1. Collection of Web data such as activities/click streams recorded in Web server
logs,
2. Preprocessing of Web data, such as filtering crawler requests and requests for
graphics, and identifying unique sessions (a sketch of this step follows the list),
3. Analysis of Web data, also known as Web Usage Mining, to discover
interesting usage patterns or profiles,
4. Interpretation/evaluation of the discovered profiles. A fifth step can further be
added after repeated application of steps 1-4 over multiple time periods:
5. Tracking the evolution of the discovered profiles.
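
As a rough illustration of the preprocessing step, the following C# sketch drops crawler requests and requests for graphic files from raw log lines before session identification. The bot markers and file extensions are assumptions chosen for illustration, not a fixed part of the method.

using System;
using System.Collections.Generic;
using System.Linq;

static class LogPreprocessor
{
    // Substrings that commonly identify crawlers in a log line (assumed list).
    static readonly string[] BotMarkers = { "bot", "crawler", "spider", "slurp" };

    // Extensions of embedded graphics and assets that do not reflect user intent (assumed list).
    static readonly string[] AssetExtensions = { ".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".js" };

    // Keeps only the log lines that look like genuine page requests from human users.
    public static IEnumerable<string> Filter(IEnumerable<string> rawLines)
    {
        foreach (var line in rawLines)
        {
            var lower = line.ToLowerInvariant();
            bool isBot = BotMarkers.Any(m => lower.Contains(m));
            bool isAsset = AssetExtensions.Any(ext => lower.Contains(ext + " "));
            if (!isBot && !isAsset)
                yield return line;
        }
    }
}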

Web usage mining can use various data mining or machine learning techniques to
model and understand Web user activity. In one approach, clustering was used to segment user sessions
into clusters or profiles that can later form the basis for personalization. In another, the notion of an
adaptive Web site was proposed, where the user’s access patterns can be used to
automatically synthesize index pages. Other work is based on using association rule
discovery as the basis for modeling Web user activity, whereas a further approach
used probabilistic grammars to model Web navigation patterns for the purpose of
prediction. Yet another approach proposed building data cubes from Web log data and later
applying Online Analytical Processing (OLAP) and data mining on the cube model. The Web
Utilization Miner (WUM) was presented to discover navigation patterns with user-specified
characteristics over an aggregated materialized view of the Web log, consisting
of a trie of sequences of Web views. The dynamics of Web usage have recently become important. This is
because Web access patterns on a Web site change due not only to the dynamics of the
Web site’s content and structure but also to changes in the users’ interests and, thus, their
navigation patterns.
In order to create user groups, i.e., user profiles, we only have access to the users'
browsing history. We assume that users with similar browsing patterns up to a point in time
have similar interests and motivations. User profiles can be created through online
web usage mining, which consists of discovering web usage patterns to better understand
the users' behavior. For the task of creating user groups based on the web server access
logs, web usage mining uses clustering techniques. During a visit to a web site, the users'
requests are registered in a web server access log stored on the web server.
Therefore, the web server access logs provide the means to create a data set prepared for
the application of clustering algorithms. Users' interests are affected by the temporal
context; thus, some research work presents the concept of session clustering instead of
creating user clusters. A session comprises the browsing history of a small time
window, usually 30 minutes. Therefore, by clustering sessions it is easier to comprehend
the contextual motivations of each user and provide ads suitable for the current user's
interests. We propose to create user profiles in a two-stage process: first, session
clusters are created; then, user clusters are created based on common sessions between users. This work
focuses on the first part, creating session clusters, and compares the resulting session
clusters obtained by using different attributes to describe a session. Our approach for representing
a session consists in combining descriptions extracted from the URLs of the pages visited
with temporal frames based on date, such as Monday morning.
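
As a small illustration, a temporal frame of this kind can be derived from the request timestamp; the split into morning, afternoon, and evening below is an assumption:

using System;

static class TemporalFrames
{
    // Maps a timestamp to a coarse temporal frame such as "Monday morning".
    // The boundaries between morning, afternoon, and evening are assumed.
    public static string FrameOf(DateTime t)
    {
        string dayPart = t.Hour < 12 ? "morning"
                       : t.Hour < 18 ? "afternoon"
                       : "evening";
        return $"{t.DayOfWeek} {dayPart}";
    }
}

// Example: FrameOf(new DateTime(2015, 6, 1, 9, 30, 0)) returns "Monday morning".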

Session Identification
A session is a list of web page accesses made by a given user during a period of time.
Each access is registered in a line of the web server access log. For the task of identifying
the list of web pages visited during a user's session, it is necessary to clean all the
information contained in the web server access logs that is meaningless or not relevant.
However, browser and proxy caching represent a major obstacle to the creation of a
reliable user-session data set. The web server access log is a text file that contains all the
requests made to the web server, usually in the Common Log Format, which
means that each entry contains the following fields:
_ IP address or domain name
_ User ID
_ Date and time of the request
_ HTTP request (including method and page requested)
_ Status code response to the request
_ File size
_ Referrer (web page that contains the hyperlink that originated the request)
_ User agent (user's browser)
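
As a hedged sketch, the following C# parser reads one log line in this format (the "combined" variant that includes the referrer and user agent fields listed above); the regular expression and field handling are a minimal illustration, not the exact parser used in this work.

using System;
using System.Globalization;
using System.Text.RegularExpressions;

// One parsed log entry, holding the fields listed above.
record LogEntry(string Host, string UserId, DateTime Time, string Request,
                int Status, long Bytes, string Referrer, string UserAgent);

static class LogParser
{
    // host ident user [time] "request" status bytes "referrer" "user agent"
    static readonly Regex Pattern = new Regex(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

    public static LogEntry Parse(string line)
    {
        var m = Pattern.Match(line);
        if (!m.Success) throw new FormatException("Unexpected log line: " + line);
        // The time field looks like 10/Oct/2000:13:55:36 -0700; the offset is dropped for simplicity.
        var time = DateTime.ParseExact(m.Groups[4].Value.Split(' ')[0],
                                       "dd/MMM/yyyy:HH:mm:ss", CultureInfo.InvariantCulture);
        return new LogEntry(m.Groups[1].Value, m.Groups[3].Value, time, m.Groups[5].Value,
                            int.Parse(m.Groups[6].Value),
                            m.Groups[7].Value == "-" ? 0 : long.Parse(m.Groups[7].Value),
                            m.Groups[8].Value, m.Groups[9].Value);
    }
}
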
The web server access logs used during this work contain accesses to web pages
from several web sites and in this case, the URL of the web pages is in the referrer. There
is also extra information about the request such as a session cookie and a long duration
cookie. The session cookie identifies a 30 minutes session and the long duration cookie
identifies a user. Therefore, only web server access log entries containing the session
cookie were considered. From these entries the web page URL (referrer), date and
session cookie are the meaningful data for the purpose of this work. Thereafter, these
parameters were grouped by common session cookie in order to create each session
representation vector.
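
A minimal sketch of this grouping step follows; the Entry record stands in for the three meaningful fields and is an assumed representation:

using System;
using System.Collections.Generic;
using System.Linq;

static class SessionBuilder
{
    // Each entry carries the three meaningful fields named above:
    // the page URL (taken from the referrer), the date, and the session cookie.
    public record Entry(string Url, DateTime Time, string SessionCookie);

    // Groups entries by common session cookie; each group, ordered by time,
    // is the raw material for one session representation vector.
    public static Dictionary<string, List<Entry>> GroupBySession(IEnumerable<Entry> entries) =>
        entries.GroupBy(e => e.SessionCookie)
               .ToDictionary(g => g.Key, g => g.OrderBy(e => e.Time).ToList());
}
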
Among data mining techniques, clustering allows us to group together clients or data items that have
similar characteristics. The information discovered by this technique is one of the most
important types, with a wide range of applications from real-time personalization to
link prediction. It can facilitate the development of future marketing strategies, such as
automated return mail or advertisements targeted at clients falling within a certain cluster, or
dynamically changing a particular site for a client on a return visit based on past
classification of that client. The key problem lies in how we effectively discover clusters
of Web pages or users with common interests. Clustering analysis for mining the Web is
quite different from traditional clustering, due to inherent differences between Web
usage data and classic data sets. Therefore, there is a need to develop
specialized techniques for clustering analysis based on Web usage data. Some approaches
to clustering analysis have been developed for mining Web access logs.
Session Clustering
After the transformation of user sessions into vectors of extracted attributes in a
multi-dimensional space, clustering algorithms can partition this space into groups
of sessions. Each session within a group is close to the others in the
group, according to a distance measure. Regarding the clustering algorithms, both model-based
and similarity-based approaches are used to group users or sessions, as well as hierarchical
and partitional techniques. The most common model-based algorithm is the Expectation-Maximization
(EM) algorithm, which has been used to identify associations among users
and pages as well as to provide user profiles.
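
As an illustration of the similarity-based view, a common choice of distance between two session vectors is the cosine distance; using it here is an assumption, since the text leaves the measure open:

using System;

static class SessionDistance
{
    // Cosine distance between two session vectors of equal length:
    // 1 - (a . b) / (|a| |b|). Smaller values mean more similar sessions.
    public static double Cosine(double[] a, double[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-12);
    }
}
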
The proposed methodology is applied on Web users’ navigation patterns by a
model-based approach employing:
Cluster validation, i.e. evaluation of the results of a clustering algorithm in a
quantitative and objective manner. We propose a quantitative validation procedure, which
is based on the statistical chi-square (χ2) test. Each cluster is represented by a probability
distribution, and the chi-square metric is used to measure the distances between these
distributions and to test their homogeneity. Since the goal of a clustering procedure is to
discover groups in the data such that each group is significantly different from all the
others, we essentially test the heterogeneity between the clusters in order to assess their
successful discrimination.
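
A minimal sketch of one common form of the chi-square distance between two cluster distributions follows; how page requests are binned into these distributions is left open by the text, so the inputs are assumed to be already-normalized histograms over the same bins:

using System;

static class ChiSquare
{
    // One common form of the chi-square distance between two discrete
    // distributions p and q over the same bins: sum of (p_i - q_i)^2 / (p_i + q_i).
    // A large value suggests the two clusters are heterogeneous (well separated).
    public static double Distance(double[] p, double[] q)
    {
        double sum = 0;
        for (int i = 0; i < p.Length; i++)
        {
            double denom = p[i] + q[i];
            if (denom > 0)
                sum += (p[i] - q[i]) * (p[i] - q[i]) / denom;
        }
        return sum;
    }
}
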
Cluster interpretation, i.e. understanding and appropriately interpreting the
meaning of the derived clusters in the wider context of the underlying application, by
using statistical data analysis. Specifically, we propose a visualization approach based on
the statistical method known as correspondence analysis for interpreting
the clustering results. This analysis is used to facilitate the revealing of similar or related
features in Web users’ navigation behavior and their interaction with the content of Web
information sources.
Clustering evaluation may be employed under three different views:
1. External view: when results of a clustering method are evaluated on the basis of
a pre-specified structure on a data set, which reflects a user’s intuition about the
clustering structure of this data set.
2. Internal view: clustering results are evaluated in terms of quantities obtained
from the data set itself.
3. Relative view: the clustering result is compared with other clustering schemes
obtained by modifying only the parameter values.

System Architecture:
[Figure: system architecture. Web user session details are traced, an initial clustering is selected, a hierarchical agglomerative technique refines it, and the final clustering is created.]

Modules
1. Trace user details

2. Clustering selection

3. Hierarchical Agglomerative

4. Clustering Creation

1. Trace user Details

In this module we trace user session details. A Web user is identified by the client IP
address and by connections whose TCP server port equals 80 (the HTTP protocol). Each
user trace, i.e., a trace containing only data with a given IP source address, is
preprocessed according to the following steps: i) data are partitioned day by day, ii) only
working hours of working days are considered, and iii) two consecutive
connections whose opening times are separated by more than half an hour are a priori considered as two
independent data sets.
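
A rough C# sketch of these three steps follows; the half-hour threshold comes from the text above, while the 9-18 working-hours window is an assumption:

using System;
using System.Collections.Generic;
using System.Linq;

static class TracePreprocessor
{
    // Splits one user's connection opening times into independent data sets:
    // partitioned day by day, restricted to working hours of working days,
    // and split wherever two consecutive openings are more than 30 minutes apart.
    public static List<List<DateTime>> Split(IEnumerable<DateTime> openingTimes)
    {
        var kept = openingTimes
            .Where(t => t.DayOfWeek != DayOfWeek.Saturday && t.DayOfWeek != DayOfWeek.Sunday)
            .Where(t => t.Hour >= 9 && t.Hour < 18)   // assumed working-hours window
            .OrderBy(t => t)
            .ToList();

        var result = new List<List<DateTime>>();
        foreach (var t in kept)
        {
            bool newSet = result.Count == 0
                || t.Date != result[^1][^1].Date                      // new day
                || (t - result[^1][^1]) > TimeSpan.FromMinutes(30);   // half-hour gap
            if (newSet) result.Add(new List<DateTime>());
            result[^1].Add(t);
        }
        return result;
    }
}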

2. Clustering Selection

In this module we read the user session details. Using K-means clustering, we initially
cluster the user session details.
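
A compact K-means sketch over session vectors follows; random initialization, Euclidean distance, and a fixed iteration count are assumptions the text does not pin down:

using System;
using System.Linq;

static class KMeans
{
    // Assigns each session vector to one of k clusters after a fixed
    // number of assign/update iterations; returns the label of each vector.
    public static int[] Cluster(double[][] points, int k, int iterations = 20, int seed = 0)
    {
        var rnd = new Random(seed);
        // Initialize centroids with k randomly chosen points (assumed initialization).
        var centroids = points.OrderBy(_ => rnd.Next()).Take(k)
                              .Select(p => (double[])p.Clone()).ToArray();
        var labels = new int[points.Length];

        for (int it = 0; it < iterations; it++)
        {
            // Assignment step: each session joins its nearest centroid.
            for (int i = 0; i < points.Length; i++)
                labels[i] = Enumerable.Range(0, k)
                                      .OrderBy(c => SqDist(points[i], centroids[c]))
                                      .First();

            // Update step: each centroid moves to the mean of its members.
            for (int c = 0; c < k; c++)
            {
                var members = points.Where((_, i) => labels[i] == c).ToArray();
                if (members.Length == 0) continue;   // leave empty clusters unchanged
                for (int d = 0; d < centroids[c].Length; d++)
                    centroids[c][d] = members.Average(p => p[d]);
            }
        }
        return labels;
    }

    static double SqDist(double[] a, double[] b) =>
        a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();
}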

3. Hierarchical Agglomerative:

In this module a hierarchical agglomerative algorithm is iteratively run, using only
the representative samples to evaluate the distance between two clusters. Since the
procedure starts with the initial clusters, the number of steps is bounded. At each step, the
hierarchical agglomerative procedure merges the two closest clusters; then, distances
among clusters are recomputed. After these iterations, the process ends.
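
A small sketch of the merge loop follows; single linkage over the cluster representatives is an assumed choice of inter-cluster distance:

using System;
using System.Collections.Generic;
using System.Linq;

static class Agglomerative
{
    // Repeatedly merges the two closest clusters until targetCount clusters remain.
    // Each cluster is a list of representative vectors; the distance between clusters
    // is the minimum pairwise distance between representatives (single linkage, assumed).
    public static List<List<double[]>> Merge(List<List<double[]>> clusters, int targetCount)
    {
        while (clusters.Count > targetCount)
        {
            int bi = 0, bj = 1;
            double best = double.MaxValue;
            for (int i = 0; i < clusters.Count; i++)
                for (int j = i + 1; j < clusters.Count; j++)
                {
                    double d = clusters[i].SelectMany(a => clusters[j], (a, b) => Dist(a, b)).Min();
                    if (d < best) { best = d; bi = i; bj = j; }
                }
            clusters[bi].AddRange(clusters[bj]);   // merge the two closest clusters
            clusters.RemoveAt(bj);                 // distances are recomputed on the next pass
        }
        return clusters;
    }

    static double Dist(double[] a, double[] b) =>
        Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum());
}
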
4. Clustering Creation

A partition clustering procedure is run over the original data set, which includes all
samples, using the optimal number of clusters determined so far and the same choice of
cluster representatives adopted in the first step. A fixed number of iterations is run to
obtain a final refinement of the clustering.
LANGUAGE SPECIFICATION

4.1 FEATURES OF .NET

Microsoft .NET is a set of Microsoft software technologies for rapidly building and integrating XML Web services, Microsoft Windows-based applications, and Web solutions. The .NET Framework is a language-neutral platform for writing programs that can easily and securely interoperate. There is no language barrier with .NET: numerous languages are available to the developer, including Managed C++, C#, Visual Basic, and JScript. The .NET Framework provides the foundation for components to interact seamlessly, whether locally or remotely on different platforms. It standardizes common data types and communications protocols so that components created in different languages can easily interoperate.

“.NET” is also the collective name given to various software components built upon the .NET platform. These are both products (Visual Studio .NET and Windows .NET Server, for instance) and services (such as Passport, .NET My Services, and so on).

THE .NET FRAMEWORK

The .NET Framework has two main parts:

1. The Common Language Runtime (CLR).
2. A hierarchical set of class libraries.

The CLR is described as the “execution engine” of .NET. It provides the environment within which programs run. Its most important features are:

♦ Conversion from a low-level assembler-style language, called Intermediate Language (IL), into code native to the platform being executed on.
♦ Memory management, notably including garbage collection.
♦ Checking and enforcing security restrictions on the running code.
♦ Loading and executing programs, with version control and other such features.

The following features of the .NET Framework are also worth describing:

Managed Code

Managed code is code that targets .NET and contains certain extra information, “metadata”, to describe itself. Whilst both managed and unmanaged code can run in the runtime, only managed code contains the information that allows the CLR to guarantee, for instance, safe execution and interoperability.


Managed Data

With managed code comes managed data. The CLR provides memory allocation and deallocation facilities, and garbage collection. Some .NET languages use managed data by default, such as C#, Visual Basic .NET, and JScript .NET, whereas others, namely C++, do not. Targeting the CLR can, depending on the language you are using, impose certain constraints on the features available. As with managed and unmanaged code, one can have both managed and unmanaged data in .NET applications: data that does not get garbage collected but instead is looked after by unmanaged code.

Common Type System

The CLR uses the Common Type System (CTS) to strictly enforce type safety. This ensures that all classes are compatible with each other, by describing types in a common way. The CTS defines how types work within the runtime, which enables types in one language to interoperate with types in another language, including cross-language exception handling. As well as ensuring that types are only used in appropriate ways, the runtime also ensures that code does not attempt to access memory that has not been allocated to it.

Common Language Specification

The CLR provides built-in support for language interoperability. To ensure that you can develop managed code that can be fully used by developers using any programming language, a set of language features and rules for using them, called the Common Language Specification (CLS), has been defined. Components that follow these rules and expose only CLS features are considered CLS-compliant.


THE CLASS LIBRARY

.NET provides a single-rooted hierarchy of classes containing over 7000 types. The root of the namespace is called System; this contains basic types like Byte, Double, Boolean, and String, as well as Object. All objects derive from System.Object. As well as objects, there are value types. Value types can be allocated on the stack, which can provide useful flexibility. There are also efficient means of converting value types to object types if and when necessary.
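
For instance, converting between value types and object types is done by boxing and unboxing, a standard C# mechanism shown here as a minimal illustration:

int count = 42;           // a value type, which can live on the stack
object boxed = count;     // boxing: the value is copied into an object on the heap
int unboxed = (int)boxed; // unboxing: an explicit cast recovers the value type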

The set of classes is pretty comprehensive, providing collections; file, screen, and network I/O; threading; and so on, as well as XML and database connectivity.

The class library is subdivided into a number of sets (or namespaces), each providing distinct areas of functionality, with dependencies between the namespaces kept to a minimum.


LANGUAGES SUPPORTED BY .NET

The multi-language capability of the .NET Framework and Visual Studio .NET enables developers to use their existing programming skills to build all types of applications and XML Web services. The .NET Framework supports new versions of Microsoft’s old favorites Visual Basic and C++ (as VB.NET and Managed C++), but there are also a number of new additions to the family.

Visual Basic .NET has been updated to include many new and improved language features that make it a powerful object-oriented programming language. These features include inheritance, interfaces, and overloading, among others. Visual Basic now also supports structured exception handling, custom attributes, and multithreading.

Visual Basic .NET is also CLS-compliant, which means that any CLS-compliant language can use the classes, objects, and components you create in Visual Basic .NET.

Managed Extensions for C++ and attributed programming are just some of the enhancements made to the C++ language. Managed Extensions simplify the task of migrating existing C++ applications to the new .NET Framework.

C# is Microsoft’s new language. It is a C-style language that is essentially “C++ for Rapid Application Development”. Unlike other languages, its specification is just the grammar of the language. It has no standard library of its own, and instead has been designed with the intention of using the .NET libraries as its own.

Microsoft Visual J# .NET provides the easiest transition for Java-language developers into the world of XML Web services and dramatically improves the interoperability of Java-language programs with existing software written in a variety of other programming languages.

ActiveState has created Visual Perl and Visual Python, which enable .NET-aware applications to be built in either Perl or Python. Both products can be integrated into the Visual Studio .NET environment. Visual Perl includes support for ActiveState’s Perl Dev Kit.

Other languages for which .NET compilers are available include:

• FORTRAN
• COBOL
• Eiffel

[Fig. 1: the .NET Framework stack: ASP.NET, Windows Forms, and XML Web services on top of the Base Class Libraries, the Common Language Runtime, and the Operating System.]

4.2 FEATURES OF C#.NET


C#.NET is compliant with the CLS (Common Language Specification) and supports structured exception handling. The CLS is a set of rules and constructs that are supported by the CLR (Common Language Runtime). The CLR is the runtime environment provided by the .NET Framework; it manages the execution of the code and also makes the development process easier by providing services.

C#.NET is a CLS-compliant language. Any objects, classes, or components created in C#.NET can be used in any other CLS-compliant language. In addition, we can use objects, classes, and components created in other CLS-compliant languages in C#.NET. The use of the CLS ensures complete interoperability among applications, regardless of the languages used to create them.

CONSTRUCTORS AND DESTRUCTORS:

Constructors are used to initialize objects, whereas destructors are used to destroy them. In other words, destructors are used to release the resources allocated to the object. In C#.NET, the Finalize method is available for this purpose. The Finalize method is used to complete the tasks that must be performed when an object is destroyed, and it is called automatically when an object is destroyed. In addition, the Finalize method can be called only from the class it belongs to or from derived classes.
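
A minimal C# illustration follows (the class name and the resource it stands for are hypothetical); the destructor syntax below compiles to a Finalize override that the runtime invokes when the object is collected:

using System;

class LogFileHandle
{
    // Constructor: initializes the object, here standing in for acquiring a resource.
    public LogFileHandle() => Console.WriteLine("Resource acquired");

    // Destructor (finalizer): called automatically by the runtime when the
    // object is destroyed, giving it a chance to release its resources.
    ~LogFileHandle() => Console.WriteLine("Resource released");
}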

GARBAGE COLLECTION

Garbage collection is another new feature in C#.NET. The .NET Framework monitors allocated resources, such as objects and variables. In addition, the .NET Framework automatically releases memory for reuse by destroying objects that are no longer in use.

In C#.NET, the garbage collector checks for objects that are not currently in use by applications. When the garbage collector comes across an object that is marked for garbage collection, it releases the memory occupied by the object.

OVERLOADING

Overloading is another feature in C#. Overloading enables us to define multiple procedures with the same name, where each procedure has a different set of arguments. Besides using overloading for procedures, we can use it for constructors and properties in a class.
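
For example (the Greeter class below is hypothetical, a minimal sketch of method overloading):

using System;

class Greeter
{
    // Two procedures share the same name but take different sets of arguments.
    public void Greet(string name) => Console.WriteLine($"Hello, {name}!");

    public void Greet(string name, int times)
    {
        for (int i = 0; i < times; i++)
            Greet(name);   // the compiler picks the overload from the argument list
    }
}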


MULTITHREADING:

C#.NET also supports multithreading. An application that supports multithreading can handle multiple tasks simultaneously; we can use multithreading to decrease the time taken by an application to respond to user interaction.
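
A minimal sketch using the classic System.Threading API (the messages printed are illustrative only):

using System;
using System.Threading;

class ThreadingDemo
{
    static void Main()
    {
        // Start a second thread so two tasks can run concurrently.
        var worker = new Thread(() =>
        {
            for (int i = 0; i < 3; i++)
                Console.WriteLine("Background task step " + i);
        });
        worker.Start();

        // The main thread keeps doing its own work, e.g., responding to the user.
        Console.WriteLine("Main thread remains responsive");
        worker.Join();   // wait for the background work to finish before exiting
    }
}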

STRUCTURED EXCEPTION HANDLING

C#.NET supports structured exception handling, which enables us to detect and remove errors at runtime. In C#.NET, we use try...catch...finally statements to create exception handlers. Using try...catch...finally statements, we can create robust and effective exception handlers to improve the performance of our application.
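
A minimal illustration (the deliberately out-of-range array write is just a convenient way to raise an exception):

using System;

class ExceptionDemo
{
    static void Main()
    {
        try
        {
            int[] values = new int[2];
            values[5] = 1;                               // raises IndexOutOfRangeException
        }
        catch (IndexOutOfRangeException ex)
        {
            Console.WriteLine("Handled: " + ex.Message); // the error is detected at runtime
        }
        finally
        {
            Console.WriteLine("Cleanup always runs");    // executes whether or not an exception occurred
        }
    }
}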

THE .NET FRAMEWORK

The .NET Framework is a new computing platform that simplifies application development in the highly distributed environment of the Internet.

OBJECTIVES OF .NET FRAMEWORK

1. To provide a consistent object-oriented programming environment, whether object code is stored and executed locally, executed locally but Internet-distributed, or executed remotely.
2. To provide a code-execution environment that minimizes software deployment and guarantees safe execution of code.
3. To eliminate performance problems.

There are different types of applications, such as Windows-based applications and Web-based applications.
