
BARKATULLAH UNIVERSITY INSTITUTE OF

TECHNOLOGY
BARKATULLAH UNIVERSITY, BHOPAL

Department of Computer Science & Engineering

PROJECT REPORT
ON

FREQUENT USER ACCESS PATTERN

Submitted in fulfillment of the requirement for the award of the degree of
Bachelor of Engineering (B.E.)
of Barkatullah University, Bhopal
2009-2010
By

ANIMESH DUBEY
ATUL RATHORE
DINESH AHIRWAR
MAHESH PATIL
SANJAY KASDE

Under the Guidance of
Mr. Amit Baghel

Page | 1
BARKATULLAH UNIVERSITY INSTITUTE OF
TECHNOLOGY
BARKATULLAH UNIVERSITY, BHOPAL

Department of Computer Science & Engineering

CERTIFICATE

YEAR 2009-2010

This is to certify that Animesh Dubey, Atul Rathore, Dinesh Ahirwar, Mahesh Patil and
Sanjay Kasde, students of the final year of the B.E. Computer Science and Engineering branch,
have successfully completed their project work titled FREQUENT USER ACCESS
PATTERN in fulfillment of the requirement for the award of the degree of Bachelor of
Engineering (Computer Science & Engineering) in the year 2009-2010 by Barkatullah University, Bhopal.

Mr. Amit Baghel Mr. Divakar Singh Dr. R. K. Pandey


(Project Guide & Coordinator) (Head of the Department, Computer Sc. & Engg.) (Director)

Page | 2
BARKATULLAH UNIVERSITY INSTITUTE OF
TECHNOLOGY
BARKATULLAH UNIVERSITY, BHOPAL

Department of Computer Science & Engineering

DECLARATION
YEAR 2009-2010

This project report entitled FREQUENT USER ACCESS PATTERN, submitted in fulfillment
of the requirement for the Bachelor of Engineering degree in Computer Science & Engineering of
Barkatullah University, Bhopal during the year 2009-2010, is an authentic record of our
project work. To the best of our knowledge this is our original work and was not submitted
earlier for the award of any other degree, diploma or certificate.

ANIMESH ATUL DINESH MAHESH SANJAY


DUBEY RATHORE AHIRWAR PATIL KASDE

Page | 3
ACKNOWLEDGEMENT

We wish to acknowledge our profound sense of gratitude to our project guide,

Mr. AMIT BAGHEL, Department of Computer Science and Engineering,
Barkatullah University Institute of Technology, Barkatullah University, Bhopal, for
his remarkable guidance and continued encouragement during the preparation of this
project. Indeed, it was a matter of great felicity and privilege for us to work under his
aegis. We express our thankfulness to him for his dedicated inspiration, lively interest
and patience with our errors, without which it would have been impossible to bring the
project to completion.

We also wish to thank Mr. DIVAKAR SINGH, Head of the Department, Computer
Science and Engineering, Barkatullah University Institute of Technology, Barkatullah
University, who directly or indirectly helped us during the course of this project.

We would like to thank DR. R. K. PANDEY, Director, Barkatullah University
Institute of Technology, Barkatullah University, Bhopal, for all the encouragement and
facilities provided to us. Last but not least, we would like to thank our parents for their
support and cooperation.

Finally, we wish to express our gratitude to all those who may have contributed to this
work, even anonymously.

ANIMESH DUBEY
ATUL RATHORE
DINESH AHIRWAR
MAHESH PATIL
SANJAY KASDE

Page | 4
Table of Contents
Page no.

1. PROJECT
1.1 Introduction 7
1.2 Objectives 8
1.3 Technical description of the project. 9
1.4 Algorithm 36
1.5 Scope 46

2. HARDWARE & SOFTWARE REQUIREMENTS


2.1 Hardware Requirements. 51
2.2 Software Requirements. 51

3. SYSTEM ANALYSIS & DESIGN


3.1 Requirement Analysis 53
3.2 Feasibility 56

4. SOFTWARE SELECTION CRITERIA


4.1 Data Model 58
4.2 Data Flow Diagram 63

5. TABLES 64

6. OUTPUT 68

7. LIMITATIONS 78

8. BIBLIOGRAPHY 80

Page | 5
PROJECT

Page | 6
INTRODUCTION

The main aim of this project is to discover the frequent access patterns of website users
while they browse the site. The patterns thus obtained are analysed, and frequent access patterns
are generated on the basis of the FAP algorithm. These patterns are then fed into the
FAP tree, which produces the frequent pattern tree. The project proposes a method that can
discover users' frequent access patterns underlying their web browsing behaviour. It first
uses the concept of an access pattern derived from a user's access path, and secondly puts forward a
revised algorithm (FAP-Mining), based on the FP-tree algorithm, to mine frequent access patterns.
The algorithm first constructs a frequent access pattern tree and then mines users' frequent
access patterns from that tree.

The tree is then used to cache the frequently accessed pages according to their occurrence in the
tree, which improves the space efficiency of the caching process. An illustrative sketch of such a
tree is given below.
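The following is a minimal C# sketch of the kind of prefix tree (an FP-tree-like structure) that FAP-Mining builds from user access paths. It is not the project's actual implementation; the class names, the string page identifiers and the simple support threshold are assumptions made only for illustration.

using System;
using System.Collections.Generic;

// Illustrative node of a frequent-access-pattern (prefix) tree.
class FapNode
{
    public string Page;                                   // page identifier, e.g. "/index.aspx"
    public int Count;                                      // number of access paths through this node
    public Dictionary<string, FapNode> Children = new Dictionary<string, FapNode>();
}

class FapTree
{
    private FapNode root = new FapNode();

    // Insert one user's access path (ordered list of visited pages),
    // sharing prefixes with previously inserted paths and incrementing counts.
    public void Insert(IList<string> accessPath)
    {
        FapNode current = root;
        foreach (string page in accessPath)
        {
            FapNode child;
            if (!current.Children.TryGetValue(page, out child))
            {
                child = new FapNode();
                child.Page = page;
                current.Children.Add(page, child);
            }
            child.Count++;
            current = child;
        }
    }

    // Print every path prefix whose support (count) reaches the threshold;
    // these prefixes are the candidate frequent access patterns.
    public void PrintFrequentPatterns(int minSupport)
    {
        Walk(root, new List<string>(), minSupport);
    }

    private void Walk(FapNode node, List<string> prefix, int minSupport)
    {
        foreach (FapNode child in node.Children.Values)
        {
            if (child.Count < minSupport) continue;
            prefix.Add(child.Page);
            Console.WriteLine(string.Join(" -> ", prefix.ToArray()) + " (support " + child.Count + ")");
            Walk(child, prefix, minSupport);
            prefix.RemoveAt(prefix.Count - 1);
        }
    }
}

In use, each session extracted from the web server log would be passed to Insert, and the patterns whose counts exceed the chosen support threshold would be the ones considered for caching.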

Page | 7
OBJECTIVE

This project proposes a method that can discover users' frequent access patterns underlying
their web browsing behaviour. Firstly, it introduces the concept of an access pattern
derived from a user's access path, and secondly it puts forward a revised algorithm (FAP-Mining),
based on the FP-tree algorithm, to mine frequent access patterns. The new algorithm first
constructs a frequent access pattern tree and then mines users' frequent access patterns from the
tree. What's more, the algorithm is accurate and scalable for mining frequent access patterns
of different lengths.

The main objective of the project is to find the frequent access patterns followed by the
website users and thereby help the website owner to modify the contents of the website:

1. Mining the web access log records to discover different access patterns of different
users.
2. Improve the link structure of web pages, enhance web server performance and provide
characteristic service for customers.
3. Preprocessing web server log data to get user access paths from user session files.
4. Mining frequent access patterns with different lengths.
5. Caching the frequent pages of the website to increase the server performance.

All these objectives are covered in our project, which therefore serves as a comprehensive tool
for analysing how users access a website and for improving its link structure and performance.

Page | 8
TECHNICAL DESCRIPTION

FRONT END

Page | 9
ASP.NET 2.0
VISUAL STUDIO 2005
C#

ASP.NET
It is a web application framework developed and marketed by Microsoft to allow programmers
to build dynamic web sites, web applications and web services. It was first released in January
2002 with version 1.0 of the .NET Framework, and is the successor to Microsoft's Active Server
Pages (ASP) technology. ASP.NET is built on the Common Language Runtime (CLR), allowing
programmers to write ASP.NET code using any supported .NET language. The ASP.NET SOAP
extension framework allows ASP.NET components to process SOAP messages.

History

After the release of Internet Information Services 4.0 in 1997, Microsoft began researching
possibilities for a new web application model that would solve common complaints about ASP,
especially with regard to separation of presentation and content and being able to write "clean"
code.[1] Mark Anders, a manager on the IIS team, and Scott Guthrie, who had joined Microsoft in
1997 after graduating from Duke University, were tasked with determining what that model would
look like. The initial design was developed over the course of two months by Anders and Guthrie,
and Guthrie coded the initial prototypes during the Christmas holidays in 1997.

The initial prototype was called "XSP"; Guthrie explained in a 2007 interview that,
"People would always ask what the X stood for. At the time it really didn't stand for anything. XML
started with that; XSLT started with that. Everything cool seemed to start with an X, so that's what
we originally named it." The initial prototype of XSP was done using Java,[3] but it was soon
decided to build the new platform on top of the Common Language Runtime (CLR), as it offered
an object-oriented programming environment, garbage collection and other features that were
seen as desirable but that Microsoft's Component Object Model platform did not support.
Guthrie described this decision as a "huge risk", as the success of their new web development
platform would be tied to the success of the CLR, which, like XSP, was still in the early stages of
development, so much so that the XSP team was the first team at Microsoft to target the CLR.

Page | 10
With the move to the Common Language Runtime, XSP was re-implemented in C#
(known internally as "Project Cool" but kept secret from the public), and the name changed to
ASP+, as by this point the new platform was seen as being the successor to Active Server
Pages, and the intention was to provide an easy migration path for ASP developers.[4]

Mark Anders first demonstrated ASP+ at the ASP Connections conference in Phoenix, Arizona
on May 2, 2000. Demonstrations to the wide public and initial beta release of ASP+ (and the rest
of the .NET Framework) came at the 2000 Professional Developers Conference on July 11, 2000
in Orlando, Florida. During Bill Gates' keynote presentation, Fujitsu demonstrated ASP+ being
used in conjunction with COBOL,[5] and support for a variety of other languages was announced,
including Microsoft's new Visual Basic .NET and C# languages, as well as Python and Perl
support by way of interoperability tools created by ActiveState.

Once the ".NET" branding was decided on in the second half of 2000, it was decided
to rename ASP+ to ASP.NET. Mark Anders explained on an appearance on The MSDN Show
that year that, "The .NET initiative is really about a number of factors, it's about delivering
software as a service, it's about XML and web services and really enhancing the Internet in terms
of what it can do ... we really wanted to bring its name more in line with the rest of the platform
pieces that make up the .NET framework."

After four years of development, and a series of beta releases in 2000 and 2001,
ASP.NET 1.0 was released on January 5, 2002 as part of version 1.0 of the .NET Framework.
Even prior to the release, dozens of books had been written about ASP.NET, and Microsoft
promoted it heavily as part of their platform for web services. Guthrie became the product unit
manager for ASP.NET, and development continued apace, with version 1.1 being released on
April 24, 2003 as a part of Windows Server 2003. This release focused on improving ASP.NET's
support for mobile devices.

Characteristics

Pages

ASP.NET pages, known officially as "web forms", are the main building block for application
development.[8] Web forms are contained in files with an ".aspx" extension; these files typically

Page | 11
contain static (X)HTML markup, as well as markup defining server-side Web Controls and User
Controls where the developers place all the required static and dynamic content for the web
page. Additionally, dynamic code which runs on the server can be placed in a page within a block
<% -- dynamic code -- %> which is similar to other web development technologies such as PHP, JSP,
and ASP, but this practice is generally discouraged except for the purposes of data binding since
it requires more calls when rendering the page.

Note that this sample uses code "inline", as opposed to code-behind.

<%@ Page Language="C#" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"


"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<script runat="server">

protected void Page_Load(object sender, EventArgs e)

{
Label1.Text = DateTime.Now.ToLongTimeString();
}

</script>

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
<title>Sample page</title>
</head>
<body>
<form id="form1" runat="server">
<div>
The current time is: <asp:Label runat="server" id="Label1" />
</div>
</form>

</body>
</html>

Page | 12
Code-behind model

Microsoft recommends dealing with dynamic program code by using the code-behind model,
which places this code in a separate file or in a specially designated script tag. Code-behind files
typically have names like MyPage.aspx.cs or MyPage.aspx.vb while the page file is
MyPage.aspx (same filename as the page file (ASPX), but with the final extension denoting the
page language). This practice is automatic in Microsoft Visual Studio and other IDEs. When using
this style of programming, the developer writes code to respond to different events, like the page
being loaded, or a control being clicked, rather than a procedural walk through the document.

ASP.NET's code-behind model marks a departure from Classic ASP in that it encourages
developers to build applications with separation of presentation and content in mind. In theory,
this would allow a web designer, for example, to focus on the design markup with less potential
for disturbing the programming code that drives it. This is similar to the separation of the
controller from the view in model-view-controller frameworks.

Example
<%@ Page Language="C#" CodeFile="SampleCodeBehind.aspx.cs" Inherits="Website.SampleCodeBehind"
AutoEventWireup="true" %>

The above tag is placed at the beginning of the ASPX file. The CodeFile property of the
@ Page directive specifies the file (.cs or .vb) acting as the code-behind while the Inherits
property specifies the Class the Page derives from. In this example, the @ Page directive is
included in SampleCodeBehind.aspx, then SampleCodeBehind.aspx.cs acts as the code-behind
for this page:

using System;

namespace Website
{
public partial class SampleCodeBehind : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
Response.Write("Hello, world");
}
}

Page | 13
}

In this case, the Page_Load() method is called every time the ASPX page is requested.
The programmer can implement event handlers at several stages of the page execution process
to perform processing.

User controls

An event bubbling mechanism provides the ability to pass an event fired by a user control up to
its containing page.
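As a minimal illustration, one common way to surface a user control's activity to its containing page is for the control to expose its own event, as sketched below. The control name (SearchBox), the button and the event are hypothetical, not part of this project.

// Code-behind of a hypothetical user control (SearchBox.ascx.cs).
// The .ascx markup is assumed to contain a Button whose OnClick is wired to SearchButton_Click.
using System;

public partial class SearchBox : System.Web.UI.UserControl
{
    // Event exposed by the control so the containing page can react to it.
    public event EventHandler SearchRequested;

    protected void SearchButton_Click(object sender, EventArgs e)
    {
        // Pass the button click up to whichever page hosts this control.
        if (SearchRequested != null)
            SearchRequested(this, EventArgs.Empty);
    }
}

// In the containing page's code-behind the event would be subscribed to, e.g.:
//   SearchBox1.SearchRequested += new EventHandler(SearchBox1_SearchRequested);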

Custom Controls

Programmers can also build Custom controls for ASP.NET applications. Unlike User controls,
these controls don't have an ASCX markup-file, having all their code compiled into a DLL-file.
Such Custom controls can be used across multiple web-applications and Visual Studio projects
(which is not allowed with User controls). By using a Register directive, the control is loaded from
the DLL.
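A sketch of such a custom control follows; the namespace, assembly and property names are assumptions. Because there is no .ascx file, the control produces its output by overriding the rendering method and can then be registered and reused from any web application that references the compiled DLL.

using System.Web.UI;
using System.Web.UI.WebControls;

namespace MyControls
{
    // A hypothetical custom control: compiled into a DLL, no .ascx markup file.
    public class GreetingLabel : WebControl
    {
        public string UserName
        {
            get { object o = ViewState["UserName"]; return o == null ? "guest" : (string)o; }
            set { ViewState["UserName"] = value; }
        }

        protected override void RenderContents(HtmlTextWriter writer)
        {
            // Emit the control's HTML directly; there is no markup template.
            writer.WriteEncodedText("Welcome, " + UserName);
        }
    }
}

// Registered in a page roughly like:
// <%@ Register TagPrefix="my" Namespace="MyControls" Assembly="MyControls" %>
// <my:GreetingLabel runat="server" ID="Greeting1" UserName="Animesh" />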

Rendering technique

ASP.NET uses a visited composites rendering technique. During compilation, the template
(.aspx) file is compiled into initialization code which builds a control tree (the composite)
representing the original template. Literal text goes into instances of the Literal control class, and
server controls are represented by instances of a specific control class. The initialization code is
combined with user-written code (usually by the assembly of multiple partial classes) and results
in a class specific for the page. The page doubles as the root of the control tree.

Actual requests for the page are processed through a number of steps. First, during the
initialization steps, an instance of the page class is created and the initialization code is executed.
This produces the initial control tree which is now typically manipulated by the methods of the
page in the following steps. As each node in the tree is a control represented as an instance of a
class, the code may change the tree structure as well as manipulate the properties/methods of
the individual nodes. Finally, during the rendering step a visitor is used to visit every node in the
tree, asking each node to render itself using the methods of the visitor. The resulting HTML
output is sent to the client.

Page | 14
After the request has been processed, the instance of the page class is discarded and with it the
entire control tree. This is usually a source of confusion among novice ASP.NET programmers
who rely on class instance members that are lost with every page request/response cycle.

State management

ASP.NET applications are hosted by a web server and are accessed using the stateless HTTP
protocol. As such, if an application uses stateful interaction, it has to implement state
management on its own. ASP.NET provides various functions for state management.
Conceptually, Microsoft treats "state" as GUI state; problems may arise if an application needs to
keep track of "data state", for example, a finite state machine which may be in a transient state
between requests (lazy evaluation) or which takes a long time to initialize.

Application state

Application state is held by a collection of shared user-defined variables. These are set and
initialized when the Application_Start event fires on the loading of the first instance of the
application, and they remain available until the last instance exits. Application state variables are
accessed using the Application collection, which provides a wrapper for the application state
variables. Application state variables are identified by name.
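A minimal sketch of application state in use is shown below; the variable name and the Global.asax class are assumptions for the example.

// Global.asax.cs - hypothetical use of application state.
using System;

public class Global : System.Web.HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        // Initialised once, when the first instance of the application loads.
        Application["HitCount"] = 0;
    }
}

// In any page, the shared variable is read and updated by name:
//   Application.Lock();                                        // avoid races between concurrent requests
//   Application["HitCount"] = (int)Application["HitCount"] + 1;
//   Application.UnLock();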

Session state

Server-side session state is held by a collection of user-defined session variables, which are
persisted during a user session. These variables, accessed using the Session collection, are
unique to each session instance. The variables can be set to be automatically destroyed after a
defined time of inactivity, even if the session does not end. Client-side user session is maintained
by either a cookie or by encoding the session ID in the URL itself.
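The sketch below shows session variables being written and read from a page's code-behind; the page class and the control names (UserNameTextBox, WelcomeLabel) are hypothetical and assumed to be declared in the .aspx markup.

using System;

public partial class LoginPage : System.Web.UI.Page
{
    protected void Login_Click(object sender, EventArgs e)
    {
        Session["UserName"] = UserNameTextBox.Text;   // unique to this user's session
        Session.Timeout = 20;                         // minutes of inactivity before expiry
    }

    protected void Page_Load(object sender, EventArgs e)
    {
        // The value survives across requests from the same user for the life of the session.
        if (Session["UserName"] != null)
            WelcomeLabel.Text = "Welcome back, " + (string)Session["UserName"];
    }
}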

ASP.NET supports three modes of persistence for session variables:

In Process Mode
When using In Process Mode, the session variables are maintained within the ASP.NET
process. This is the fastest way; however, in this mode the variables are destroyed when
the ASP.NET process is recycled or shut down.
ASPState Mode

Page | 15
In this mode, ASP.NET runs a separate Windows service that maintains the state
variables. Because state management happens outside the ASP.NET process, and because .NET
Remoting must be used by the ASP.NET engine to access the data, there is a negative
impact on performance compared to the In Process mode. This mode allows an ASP.NET
application to be load-balanced and scaled across multiple servers, and because the state
management service runs independently of ASP.NET, the session variables can persist
across ASP.NET process shutdowns. However, since the session state server runs as a
single instance, it is still a single point of failure as far as session state is concerned. The
session-state service cannot be load-balanced, and there are restrictions on the types that can
be stored in a session variable.
SqlServer Mode
In this mode, the state variables are stored in a database allowing session variables to be
persisted across ASP.NET process shutdowns. The main advantage of this mode is that it
allows the application to balance load on a server cluster, sharing sessions between
servers. This is the slowest method of session state management in ASP.NET.

View state

View state refers to the page-level state management mechanism, utilized by the HTML pages
emitted by ASP.NET applications to maintain the state of the web form controls and widgets. The
state of the controls is encoded and sent to the server at every form submission in a hidden field
known as __VIEWSTATE. The server sends back the variable so that when the page is re-
rendered, the controls render at their last state. At the server side, the application may change
the viewstate, if the processing requires a change of state of any control. The states of individual
controls are decoded at the server, and are available for use in ASP.NET pages using the
ViewState collection.

The main use for this is to preserve form information across postbacks. View state is turned on
by default and normally serializes the data in every control on the page regardless of whether it is
actually used during a postback. This behavior can (and should) be modified, however, as View
state can be disabled on a per-control, per-page, or server-wide basis.
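As a small illustration, the sketch below stores a counter in view state so that it survives postbacks; the page class, the hypothetical Submit_Click handler and the control in the comment are assumptions.

using System;

public partial class SurveyPage : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Seed the value only on the first (non-postback) request.
        if (!IsPostBack)
            ViewState["Attempts"] = 0;
    }

    protected void Submit_Click(object sender, EventArgs e)
    {
        // The value travels to the client in the hidden __VIEWSTATE field and comes back on postback.
        int attempts = (int)ViewState["Attempts"] + 1;
        ViewState["Attempts"] = attempts;
    }
}

// View state can also be switched off where it is not needed, for example per control:
//   <asp:Label runat="server" ID="Static1" EnableViewState="false" />
// or per page: <%@ Page EnableViewState="false" ... %>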

Developers need to be wary of storing sensitive or private information in the View state of a page
or control, as the base64 string containing the view state data can easily be de-serialized. By

Page | 16
default, View state does not encrypt the __VIEWSTATE value. Encryption can be enabled on a
server-wide (and server-specific) basis, allowing for a certain level of security to be maintained.

Server-side caching

ASP.NET offers a "Cache" object that is shared across the application and can also be used to
store various objects. The "Cache" object holds each item only for a specified amount of time,
after which the item is automatically removed.
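Since this project uses caching of frequently accessed pages, a hedged sketch of the Cache object in use is given below. The helper class, cache key, expiry time and the stubbed loading method are assumptions, not the project's actual code.

using System;
using System.Web;
using System.Web.Caching;

// Hypothetical helper: keep the content of a frequently accessed page in the
// shared Cache for a fixed time instead of regenerating it on every request.
public static class PageCacheHelper
{
    public static string GetPageContent(string url)
    {
        string cached = (string)HttpRuntime.Cache[url];
        if (cached != null)
            return cached;                               // served from memory

        string content = LoadFromDatabaseOrDisk(url);    // the expensive operation (stub)

        // Keep the item for 10 minutes, then let ASP.NET evict it automatically.
        HttpRuntime.Cache.Insert(url, content, null,
            DateTime.Now.AddMinutes(10), Cache.NoSlidingExpiration);
        return content;
    }

    private static string LoadFromDatabaseOrDisk(string url)
    {
        return "<html>...</html>";                       // placeholder for the real work
    }
}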

Template engine

When first released, ASP.NET lacked a template engine. Because the .NET framework is object-
oriented and allows for inheritance, many developers would define a new base class that inherits
from "System.Web.UI.Page", write methods here that render HTML, and then make the pages in
their application inherit from this new class. While this allows for common elements to be reused
across a site, it adds complexity and mixes source code with markup. Furthermore, this method
can only be visually tested by running the application - not while designing it. Other developers
have used include files and other tricks to avoid having to implement the same navigation and
other elements in every page.

ASP.NET 2.0 introduced the concept of "master pages", which allow for template-based page
development. A web application can have one or more master pages, which, beginning with
ASP.NET 3.5, can be nested. Master templates have place-holder controls, called
ContentPlaceHolders to denote where the dynamic content goes, as well as HTML and
JavaScript shared across child pages.

Child pages use those ContentPlaceHolder controls, which must be mapped to the place-holder
of the master page that the content page is populating. The rest of the page is defined by the
shared parts of the master page, much like a mail merge in a word processor. All markup and
server controls in the content page must be placed within the ContentPlaceHolder control.

When a request is made for a content page, ASP.NET merges the output of the content page
with the output of the master page, and sends the output to the user.

The master page remains fully accessible to the content page. This means that the content page
may still manipulate headers, change title, configure caching etc. If the master page exposes

Page | 17
public properties or methods (e.g. for setting copyright notices) the content page can use these
as well.
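A brief sketch of this interaction follows; the master page class (SiteMaster), the CopyrightLabel control and the property are hypothetical names used only for the example.

using System;

// Hypothetical master page code-behind exposing a public property.
public partial class SiteMaster : System.Web.UI.MasterPage
{
    public string CopyrightNotice
    {
        get { return CopyrightLabel.Text; }   // CopyrightLabel is assumed to exist in the .master markup
        set { CopyrightLabel.Text = value; }
    }
}

// A content page can reach the master page through its Master property.
public partial class HomePage : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        Page.Title = "Home";                                    // manipulate the shared <head>
        ((SiteMaster)Master).CopyrightNotice = "(c) 2010 BUIT"; // use the master's public property
    }
}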

Directory structure

In general, the ASP.NET directory structure can be determined by the developer's preferences.
Apart from a few reserved directory names, the site can span any number of directories. The
structure is typically reflected directly in the URLs. Although ASP.NET provides means for
intercepting the request at any point during processing, the developer is not forced to funnel
requests through a central application or front controller.

The special directory names (from ASP.NET 2.0 on) are

App_Browsers
holds site-specific browser definition files.
App_Code
This is the "raw code" directory. The ASP.NET server automatically compiles files (and
subdirectories) in this folder into an assembly which is accessible in the code of every
page of the site. App_Code will typically be used for data access abstraction code, model
code and business code. Also any site-specific http handlers and modules and web
service implementation go in this directory. As an alternative to using App_Code the
developer may opt to provide a separate assembly with precompiled code.
App_Data
default directory for databases, such as Access mdb files and SQL Server mdf files. This
directory is usually the only one with write access for the application.
App_LocalResources
Contains localized resource files for individual pages of the site. E.g. a file called
CheckOut.aspx.fr-FR.resx holds localized resources for the French version of the
CheckOut.aspx page. When the UI culture is set to French, ASP.NET will automatically find
and use this file for localization.
App_GlobalResources
Holds resx files with localized resources available to every page of the site. This is where
the ASP.NET developer will typically store localized messages etc. which are used on
more than one page.
App_Themes

Page | 18
holds alternative themes of the site.
App_WebReferences
holds discovery files and WSDL files for references to web services to be consumed in the
site.
Bin
Contains compiled code (.dll files) for controls, components, or other code that you want to
reference in your application. Any classes represented by code in the Bin folder are
automatically referenced in your application.

Performance

ASP.NET aims for performance benefits over other script-based technologies (including Classic
ASP) by compiling the server-side code to one or more DLL files on the web server. This
compilation happens automatically the first time a page is requested (which means the developer
need not perform a separate compilation step for pages). This feature provides the ease of
development offered by scripting languages with the performance benefits of a compiled binary.
However, the compilation might cause a noticeable but short delay to the web user when the
newly-edited page is first requested from the web server, but won't again unless the page
requested is updated further.

The ASPX and other resource files are placed in a virtual host on an Internet Information
Services server (or other compatible ASP.NET servers; see Other Implementations, below). The
first time a client requests a page, the .NET framework parses and compiles the file(s) into a
.NET assembly and sends the response; subsequent requests are served from the DLL files. By
default ASP.NET will compile the entire site in batches of 1000 files upon first request. If the
compilation delay is causing problems, the batch size or the compilation strategy may be
tweaked.

Developers can also choose to pre-compile their "codebehind" files before deployment, using MS
Visual Studio, eliminating the need for just-in-time compilation in a production environment. This
also eliminates the need of having the source code on the web server.

Page | 19
Page | 20
Microsoft Visual Studio

It is an Integrated Development Environment (IDE) from Microsoft. It can be used to


develop console and graphical user interface applications along with Windows Forms
applications, web sites, web applications, and web services in both native code together with
managed code for all platforms supported by Microsoft Windows, Windows Mobile, Windows CE,
.NET Framework, .NET Compact Framework and Microsoft Silverlight.

Visual Studio includes a code editor supporting IntelliSense as well as code refactoring.
The integrated debugger works both as a source-level debugger and a machine-level debugger.
Other built-in tools include a forms designer for building GUI applications, web designer, class
designer, and database schema designer. It accepts plug-ins that enhance the functionality at
almost every level, including adding support for source-control systems (like Subversion and
Visual SourceSafe) and adding new toolsets like editors and visual designers for domain-specific
languages or toolsets for other aspects of the software development lifecycle (like the Team
Foundation Server client: Team Explorer).

Visual Studio supports different programming languages by means of language


services, which allow the code editor and debugger to support (to varying degrees) nearly any
programming language, provided a language-specific service exists. Built-in languages include
C/C++ (via Visual C++), VB.NET (via Visual Basic .NET), C# (via Visual C#), and F# (as of Visual
Studio 2010[3]). Support for other languages such as M, Python, and Ruby among others is
available via language services installed separately. It also supports XML/XSLT, HTML/XHTML,
JavaScript and CSS. Individual language-specific versions of Visual Studio also exist which
provide more limited language services to the user: Microsoft Visual Basic, Visual J#, Visual C#,
and Visual C++.

Microsoft provides "Express" editions of its Visual Studio 2010 components (Visual
Basic, Visual C#, Visual C++, and Visual Web Developer) at no cost. Visual Studio 2010, 2008
and 2005 Professional Editions, along with language-specific versions (Visual Basic, C++, C#,
J#) of Visual Studio 2005 are available for free to students as downloads via Microsoft's
DreamSpark program.
Page | 21
Architecture

Visual Studio does not support any programming language, solution or tool intrinsically. Instead,
it allows plugging in various types of functionality. Specific functionality is coded as a VSPackage.
When installed, the functionality is available as a Service. The IDE provides three services:
SVsSolution, which provides the ability to enumerate projects and solutions; SVsUIShell, which
provides windowing and UI functionality (including tabs, toolbars and tool windows); and
SVsShell, which deals with registration of VSPackages. In addition, the IDE is also responsible
for coordinating and enabling communication between services. All editors, designers, project
types and other tools are implemented as VSPackages. Visual Studio uses COM to access the
VSPackages. The Visual Studio SDK also includes the Managed Package Framework (MPF),
which is a set of managed wrappers around the COM-interfaces that allow the Packages to be
written in any CLI compliant language. However, MPF does not provide all the functionality
exposed by the Visual Studio COM interfaces. The services can then be consumed for creation
of other packages, which add functionality to the Visual Studio IDE.

Support for programming languages is added by using a specific VSPackage called a Language
Service. A language service defines various interfaces which the VSPackage implementation can
implement to add support for various functionalities. Functionalities that can be added this way
include syntax coloring, statement completion, brace matching, parameter information tooltips,
member lists and error markers for background compilation. If the interface is implemented, the
functionality will be available for the language. Language services are to be implemented on a
per-language basis. The implementations can reuse code from the parser or the compiler for the
language. Language services can be implemented either in native code or managed code. For
native code, either the native COM interfaces or the Babel Framework (part of Visual Studio
SDK) can be used. For managed code, the MPF includes wrappers for writing managed
language services.

Visual Studio does not include any source control support built in but it defines the MSSCCI
(Microsoft Source Code Control Interface) by implementing which source control systems can
integrate with the IDE. MSSCCI defines a set of functions that are used to implement various
source control functionality. MSSCCI was first used to integrate Visual SourceSafe with Visual
Studio 6.0 but was later opened up via the Visual Studio SDK. Visual Studio .NET 2002 used
MSSCCI 1.1, and Visual Studio .NET 2003 used MSSCCI 1.2. Both Visual Studio 2005 and 2008

Page | 22
use MSSCCI Version 1.3, which adds support for rename and delete propagation as well as
asynchronous opening.

Visual Studio supports running multiple instances of the environment (each with its own set of
VSPackages). The instances use different registry hives (see MSDN's definition of the term
"registry hive" in the sense used here) to store their configuration state and are differentiated by
their AppId (Application ID). The instances are launched by an AppId-specific .exe that selects
the AppId, sets the root hive and launches the IDE. VSPackages registered for one AppId are
integrated with other VSPackages for that AppId. The various product editions of Visual Studio
are created using the different AppIds. The Visual Studio Express edition products are installed
with their own AppIds, but the Standard, Professional and Team Suite products share the same
AppId. Consequently, one can install the Express editions side-by-side with other editions, unlike
the other editions which update the same installation. The professional edition includes a
superset of the VSPackages in the standard edition and the team suite includes a superset of the
VSPackages in both other editions. The AppId system is leveraged by the Visual Studio Shell in
Visual Studio 2008.

Features

Code editor
[Figure: the Visual Studio code editor showing IntelliSense suggestions and a docked Task List window]

Visual Studio, like any other IDE, includes a code editor that supports syntax highlighting and
code completion using IntelliSense for not only variables, functions and methods but also
language constructs like loops and queries. IntelliSense is supported for the included languages,
as well as for XML and for Cascading Style Sheets and JavaScript when developing web sites
and web applications. Auto complete suggestions are popped up in a modeless list box, overlaid
on top of the code editor. In Visual Studio 2008 onwards, it can be made temporarily semi-
transparent to see the code obstructed by it. The code editor is used for all supported languages.

The Visual Studio code editor also supports setting bookmarks in code for quick navigation.
Other navigational aids include collapsing code blocks and incremental search, in addition to
normal text search and reg ex search. The code editor also includes a multi-item clipboard and a
task list. The code editor supports code snippets, which are saved templates for repetitive code
and can be inserted into code and customized for the project being worked on. A management

Page | 23
tool for code snippets is built in as well. These tools are surfaced as floating windows which can
be set to automatically hide when unused or docked to the side of the screen. The Visual Studio
code editor also supports code refactoring including parameter reordering, variable and method
renaming, interface extraction and encapsulation of class members inside properties, among
others.

Visual Studio features background compilation (also called incremental compilation). As code is
being written, Visual Studio compiles it in the background in order to provide feedback about
syntax and compilation errors, which are flagged with a red wavy underline. Warnings are
marked with a green underline. Background compilation does not generate executable code,
since it requires a different compiler than the one used to generate executable code. Background
compilation was initially introduced with Microsoft Visual Basic but has now been expanded for all
included languages.

Debugger

Visual Studio includes a debugger that works both as a source-level debugger and as a machine-
level debugger. It works with both managed code as well as native code and can be used for
debugging applications written in any language supported by Visual Studio. In addition, it can
also attach to running processes and monitor and debug those processes. If source code for the
running process is available, it displays the code as it is being run. If source code is not available,
it can show the disassembly. The Visual Studio debugger can also create memory dumps as well
as load them later for debugging. Multi-threaded programs are also supported. The debugger can
be configured to be launched when an application running outside the Visual Studio environment
crashes. The debugger allows setting breakpoints (which allow execution to be stopped
temporarily at a certain position) and watches (which monitor the values of variables as the
execution progresses). Breakpoints can be conditional, meaning they get triggered when the
condition is met. Code can be stepped over, i.e., run one line (of source code) at a time. It can
either step into functions to debug inside it, or step over it, i.e., the execution of the function body
isn't available for manual inspection. The debugger supports Edit and Continue, i.e., it allows
code to be edited as it is being debugged (32 bit only; not supported in 64 bit). When debugging,
if the mouse pointer hovers over any variable, its current value is displayed in a tooltip ("data
tooltips"), where it can also be modified if desired. During coding, the Visual Studio debugger lets
certain functions be invoked manually from the Immediate tool window. The parameters to the
method are supplied at the Immediate window.
Page | 24
Designer

Visual Studio includes a host of visual designers to aid in the development of applications. These
tools include:

[Figures: Visual Studio 2005 in Designer view; the WPF Designer; the Visual Studio Web Designer in the code editor; Visual Studio 2005 in Class Designer view]

Windows Forms Designer


The Windows Forms designer is used to build GUI applications using Windows Forms. It
includes a palette of UI widgets and controls (including buttons, progress bars, labels,
layout containers and other controls) that can be dragged and dropped on a form surface.
Layout can be controlled by housing the controls inside other containers or locking them to
the side of the form. Controls that display data (like textbox, list box, grid view, etc.) can be
data-bound to data sources like databases or queries. The UI is linked with code using an
event-driven programming model. The designer generates either C# or VB.NET code for
the application.

Page | 25
WPF Designer
The WPF designer, codenamed Cider, was introduced with Visual Studio 2008. Like the
Windows Forms designer it supports the drag and drop metaphor. It is used to author user
interfaces targeting Windows Presentation Foundation. It supports all WPF functionality
including databinding and automatic layout management. It generates XAML code for the
UI. The generated XAML file is compatible with Microsoft Expression Design, the
designer-oriented product. The XAML code is linked with code using a code-behind model.
Web designer/development
Visual Studio also includes a web-site editor and designer that allows web pages to be
authored by dragging and dropping widgets. It is used for developing ASP.NET
applications and supports HTML, CSS and JavaScript. It uses a code-behind model to link
with ASP.NET code. From Visual Studio 2008 onwards, the layout engine used by the web
designer is shared with Microsoft Expression Web. There is also ASP.NET MVC support
for MVC technology as a separate download and Dynamic Data project available from
Microsoft.
Class designer
The Class Designer is used to author and edit the classes (including its members and their
access) using UML modeling. The Class Designer can generate C# and VB.NET code
outlines for the classes and methods. It can also generate class diagrams from hand-
written classes.
Data designer
The data designer can be used to graphically edit database schemas, including typed
tables, primary and foreign keys and constraints. It can also be used to design queries
from the graphical view.
Mapping designer
From Visual Studio 2008 onwards, the mapping designer is used by LINQ to SQL to
design the mapping between database schemas and the classes that encapsulate the
data. A newer ORM solution, the ADO.NET Entity Framework, replaces and
improves on the older technology.

Page | 26
C#

Pronounced "see sharp", it is a multi-paradigm programming language encompassing


imperative, functional, generic, object-oriented (class-based), and component-oriented
programming disciplines. It was developed by Microsoft within the .NET initiative and later
approved as a standard by Ecma (ECMA-334) and ISO (ISO/IEC 23270). C# is one of the
programming languages designed for the Common Language Infrastructure.

C# is intended to be a simple, modern, general-purpose, object-oriented programming language.


Its development team is led by Anders Hejlsberg. The most recent version is C# 4.0, which was
released on April 12, 2010.

Design goals

1. C# language is intended to be a simple, modern, general-purpose, object-oriented


programming language.
2. The language, and implementations thereof, should provide support for software
engineering principles such as strong type checking, array bounds checking, detection
of attempts to use uninitialized variables, and automatic garbage collection. Software
robustness, durability, and programmer productivity are important.
3. The language is intended for use in developing software components suitable for
deployment in distributed environments.
4. Source code portability is very important, as is programmer portability, especially for
those programmers already familiar with C and C++.
5. Support for internationalization is very important.
6. C# is intended to be suitable for writing applications for both hosted and embedded
systems, ranging from the very large that use sophisticated operating systems, down to
the very small having dedicated functions.
7. Although C# applications are intended to be economical with regard to memory and
processing power requirements, the language was not intended to compete directly on
performance and size with C or assembly language.

History

During the development of the .NET Framework, the class libraries were originally written using a
managed code compiler system called Simple Managed C (SMC). In January 1999, Anders
Page | 27
Hejlsberg formed a team to build a new language at the time called Cool, which stood for "C-like
Object Oriented Language". Microsoft had considered keeping the name "Cool" as the final name
of the language, but chose not to do so for trademark reasons. By the time the .NET project was
publicly announced at the July 2000 Professional Developers Conference, the language had
been renamed C#, and the class libraries and ASP.NET runtime had been ported to C#.

C#'s principal designer and lead architect at Microsoft is Anders Hejlsberg, who was previously
involved with the design of Turbo Pascal, Embarcadero Delphi (formerly CodeGear Delphi and
Borland Delphi), and Visual J++. In interviews and technical papers he has stated that flaws in
most major programming languages (e.g. C++, Java, Delphi, and Smalltalk) drove the
fundamentals of the Common Language Runtime (CLR), which, in turn, drove the design of the
C# programming language itself.

James Gosling, who created the Java programming language in 1994, and Bill Joy, a co-founder
of Sun Microsystems, the proprietor of Java, called C# an "imitation" of Java; Gosling further
claimed that "[C# is] sort of Java with reliability, productivity and security deleted." Klaus Kreft and
Angelika Langer (authors of a C++ streams book) stated in a blog post that "Java and C# are
almost identical programming languages. Boring repetition that lacks innovation," "Hardly
anybody will claim that Java or C# are revolutionary programming languages that changed the
way we write programs," and "C# borrowed a lot from Java - and vice versa. Now that C#
supports boxing and unboxing, we'll have a very similar feature in Java." Anders Hejlsberg has
argued that C# is "not a Java clone" and is "much closer to C++" in its design.

Page | 28
BACK END

SQL SERVER 2005

SQL Server 2005 (codenamed Yukon), released in October 2005, is the successor to SQL
Server 2000. It included native support for managing XML data, in addition to relational data. For
this purpose, it defined an xml data type that could be used either as a data type in database
columns or as literals in queries. XML columns can be associated with XSD schemas; XML data
being stored is verified against the schema. XML is converted to an internal binary data type
before being stored in the database. Specialized indexing methods were made available for XML
data. XML data is queried using XQuery. CLR integration was one of the main features of this
edition: it allows SQL code to be written as managed code, i.e., code that is executed by the CLR
(Common Language Runtime). SQL Server 2005 added some extensions to the
T-SQL language to allow embedding XQuery queries in T-SQL. In addition, it also defines a new
extension to XQuery, called XML DML, that allows query-based modifications to XML data. SQL
Server 2005 also allows a database server to be exposed over web services using TDS packets
encapsulated within SOAP (protocol) requests. When the data is accessed over web services,
results are returned as XML.

For relational data, T-SQL has been augmented with error handling features (try/catch) and
support for recursive queries (Common Table Expressions). SQL Server 2005 has also been
enhanced with new indexing algorithms and better error recovery systems. Data pages are
checksummed for better error resiliency, and optimistic concurrency support has been added for
better performance. Permissions and access control have been made more granular and the
query processor handles concurrent execution of queries in a more efficient way. Partitions on
tables and indexes are supported natively, so scaling out a database onto a cluster is easier.
SQL CLR was introduced with SQL Server 2005 to let it integrate with the .NET Framework.

SQL Server 2005 introduced "MARS" (Multiple Active Result Sets), a method of allowing a single
database connection to be used for several active commands and result sets at the same time.
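The sketch below shows MARS from a client application: the MultipleActiveResultSets keyword in the connection string allows a second command to run while a data reader is still open on the same connection. The connection string, database, tables and column names are assumptions for illustration.

using System;
using System.Data.SqlClient;

class MarsDemo
{
    static void Main()
    {
        string cs = "Data Source=.;Initial Catalog=WebLogs;Integrated Security=True;" +
                    "MultipleActiveResultSets=True";            // enables MARS on this connection
        using (SqlConnection con = new SqlConnection(cs))
        {
            con.Open();
            SqlCommand pages = new SqlCommand("SELECT PageUrl FROM Pages", con);
            using (SqlDataReader pageReader = pages.ExecuteReader())
            {
                while (pageReader.Read())
                {
                    // A second command on the same open connection while the first
                    // result set is still being read - only legal with MARS enabled.
                    SqlCommand hits = new SqlCommand(
                        "SELECT COUNT(*) FROM AccessLog WHERE PageUrl = @url", con);
                    hits.Parameters.AddWithValue("@url", pageReader.GetString(0));
                    int count = (int)hits.ExecuteScalar();
                    Console.WriteLine("{0}: {1} hits", pageReader.GetString(0), count);
                }
            }
        }
    }
}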

Page | 29
Architecture

Protocol layer

Protocol layer implements the external interface to SQL Server. All operations that can be
invoked on SQL Server are communicated to it via a Microsoft-defined format, called Tabular
Data Stream (TDS). TDS is an application layer protocol, used to transfer data between a
database server and a client. Initially designed and developed by Sybase Inc. for their Sybase
SQL Server relational database engine in 1984, and later by Microsoft in Microsoft SQL Server,
TDS packets can be encased in other physical transport dependent protocols, including TCP/IP,
Named pipes, and Shared memory. Consequently, access to SQL Server is available over these
protocols. In addition, the SQL Server API is also exposed over web services.

Data storage

The main unit of data storage is a database, which is a collection of tables with typed columns.
SQL Server supports different data types, including primary types such as Integer, Float,
Decimal, Char (including character strings), Varchar (variable length character strings), binary
(for unstructured blobs of data), Text (for textual data) among others. The rounding of floats to
integers uses either Symmetric Arithmetic Rounding or Symmetric Round Down (Fix) depending
on arguments: SELECT Round(2.5, 0) gives 3.

Microsoft SQL Server also allows user-defined composite types (UDTs) to be defined and used.
It also makes server statistics available as virtual tables and views (called Dynamic Management
Views or DMVs). In addition to tables, a database can also contain other objects including views,
stored procedures, indexes and constraints, along with a transaction log. A SQL Server database
can contain a maximum of 2^31 objects, and can span multiple OS-level files with a maximum file
size of 2^20 TB. The data in the database are stored in primary data files with an extension .mdf.
Secondary data files, identified with an .ndf extension, are used to store optional metadata. Log
files are identified with the .ldf extension.

Storage space allocated to a database is divided into sequentially numbered pages, each 8 KB in
size. A page is the basic unit of I/O for SQL Server operations. A page is marked with a 96-byte
header which stores metadata about the page including the page number, page type, free space

Page | 30
on the page and the ID of the object that owns it. Page type defines the data contained in the
page - data stored in the database, index, allocation map which holds information about how
pages are allocated to tables and indexes, change map which holds information about the
changes made to other pages since last backup or logging, or contain large data types such as
image or text. While page is the basic unit of an I/O operation, space is actually managed in
terms of an extent which consists of 8 pages. A database object can either span all 8 pages in an
extent ("uniform extent") or share an extent with up to 7 more objects ("mixed extent"). A row in a
database table cannot span more than one page, so is limited to 8 KB in size. However, if the
data exceeds 8 KB and the row contains Varchar or Varbinary data, the data in those columns
are moved to a new page (or possibly a sequence of pages, called an Allocation unit) and
replaced with a pointer to the data.

For physical storage of a table, its rows are divided into a series of partitions (numbered 1 to n).
The partition size is user defined; by default all rows are in a single partition. A table is split into
multiple partitions in order to spread a database over a cluster. Rows in each partition are stored
in either B-tree or heap structure. If the table has an associated index to allow fast retrieval of
rows, the rows are stored in-order according to their index values, with a B-tree providing the
index. The actual data is stored in the leaf nodes of the B-tree, with the other nodes storing the
index values for the leaf data reachable from the respective nodes. If the index is non-clustered, the rows are not
sorted according to the index keys. An indexed view has the same storage structure as an
indexed table. A table without an index is stored in an unordered heap structure. Both heaps and
B-trees can span multiple allocation units.

Buffer management

SQL Server buffers pages in RAM to minimize disc I/O. Any 8 KB page can be buffered in-
memory, and the set of all pages currently buffered is called the buffer cache. The amount of
memory available to SQL Server decides how many pages will be cached in memory. The buffer
cache is managed by the Buffer Manager. Either reading from or writing to any page copies it to
the buffer cache. Subsequent reads or writes are redirected to the in-memory copy, rather than
the on-disc version. The page is updated on the disc by the Buffer Manager only if the in-memory
cache has not been referenced for some time. While writing pages back to disc, asynchronous
I/O is used whereby the I/O operation is done in a background thread so that other operations do
not have to wait for the I/O operation to complete. Each page is written along with its checksum
when it is written. When reading the page back, its checksum is computed again and matched
Page | 31
with the stored version to ensure the page has not been damaged or tampered with in the
meantime.

Logging and Transaction

SQL Server ensures that any change to the data is ACID-compliant, i.e., it uses transactions to
ensure that any operation either totally completes or is undone if it fails, but never leaves the
database in an intermediate state. Using transactions, a sequence of actions can be grouped
together, with the guarantee that either all actions will succeed or none will. SQL Server
implements transactions using a write-ahead log. Any changes made to any page will update the
in-memory cache of the page, simultaneously all the operations performed will be written to a log,
along with the transaction ID which the operation was a part of. Each log entry is identified by an
increasing Log Sequence Number (LSN), which ensures that no event overwrites another. SQL
Server ensures that the log will be written onto the disc before the actual page is written back.
This enables SQL Server to ensure integrity of the data, even if the system fails. If both the log
and the page were written before the failure, the entire data is on persistent storage and integrity
is ensured. If only the log was written (the page was either not written or not written completely),
then the actions can be read from the log and repeated to restore integrity. If the log wasn't
written then integrity is also maintained although the database state remains unchanged as if the
transaction never occurred. If it was only partially written, then the actions associated with the
unfinished transaction are discarded. Since the log was only partially written, the page is
guaranteed to have not been written, again ensuring data integrity. Removing the unfinished log
entries effectively undoes the transaction. SQL Server ensures consistency between the log and
the data every time an instance is restarted.
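From a client application, the all-or-nothing behaviour described above is used through a transaction, as in the hedged sketch below. The connection string and the table names are assumptions for the example.

using System;
using System.Data.SqlClient;

class TransferDemo
{
    static void Main()
    {
        using (SqlConnection con = new SqlConnection(
            "Data Source=.;Initial Catalog=WebLogs;Integrated Security=True"))
        {
            con.Open();
            SqlTransaction tx = con.BeginTransaction();
            try
            {
                SqlCommand insert = new SqlCommand(
                    "INSERT INTO AccessLog (PageUrl, HitTime) VALUES ('/index.aspx', GETDATE())",
                    con, tx);
                insert.ExecuteNonQuery();

                SqlCommand update = new SqlCommand(
                    "UPDATE Pages SET HitCount = HitCount + 1 WHERE PageUrl = '/index.aspx'",
                    con, tx);
                update.ExecuteNonQuery();

                tx.Commit();        // both changes become durable together
            }
            catch (Exception)
            {
                tx.Rollback();      // undo everything written so far in this transaction
                throw;
            }
        }
    }
}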

Concurrency and locking

SQL Server allows multiple clients to use the same database concurrently. As such, it needs to
control concurrent access to shared data, to ensure data integrity - when multiple clients update
the same data, or clients attempt to read data that is in the process of being changed by another
client. SQL Server provides two modes of concurrency control: pessimistic concurrency and
optimistic concurrency. When pessimistic concurrency control is being used, SQL Server controls
concurrent access by using locks. Locks can be either shared or exclusive. Exclusive lock grants
the user exclusive access to the data - no other user can access the data as long as the lock is
held. Shared locks are used when some data is being read - multiple users can read from data

Page | 32
locked with a shared lock, but not acquire an exclusive lock. The latter would have to wait for all
shared locks to be released. Locks can be applied on different levels of granularity - on entire
tables, pages, or even on a per-row basis on tables. For indexes, it can either be on the entire
index or on index leaves. The level of granularity to be used is defined on a per-database basis
by the database administrator. While a fine grained locking system allows more users to use the
table or index simultaneously, it requires more resources, so it does not automatically yield a
higher-performing solution. SQL Server also includes two more lightweight mutual exclusion
solutions - latches and spinlocks - which are less robust than locks but are less resource
intensive. SQL Server uses them for DMVs and other resources that are usually not busy. SQL
Server also monitors all worker threads that acquire locks to ensure that they do not end up in
deadlocks - in case they do, SQL Server takes remedial measures, which in many cases is to kill
one of the threads entangled in a deadlock and rollback the transaction it started. To implement
locking, SQL Server contains the Lock Manager. The Lock Manager maintains an in-memory
table that manages the database objects and locks, if any, on them along with other metadata
about the lock. Access to any shared object is mediated by the lock manager, which either grants
access to the resource or blocks it.

SQL Server also provides the optimistic concurrency control mechanism, which is similar to the
multiversion concurrency control used in other databases. The mechanism allows a new version
of a row to be created whenever the row is updated, as opposed to overwriting the row, i.e., a
row is additionally identified by the ID of the transaction that created the version of the row. Both
the old as well as the new versions of the row are stored and maintained, though the old versions
are moved out of the database into a system database identified as Tempdb. When a row is in the
process of being updated, any other requests are not blocked (unlike locking) but are executed
on the older version of the row. If the other request is an update statement, it will result in two
different versions of the rows - both of them will be stored by the database, identified by their
respective transaction IDs.
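A hedged sketch of the optimistic (row-versioning) side from a client follows. It assumes the database has snapshot isolation enabled beforehand (for example with ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON); connection string and table names are assumptions.

using System;
using System.Data;
using System.Data.SqlClient;

class SnapshotDemo
{
    static void Main()
    {
        using (SqlConnection con = new SqlConnection(
            "Data Source=.;Initial Catalog=WebLogs;Integrated Security=True"))
        {
            con.Open();
            // Under snapshot isolation the reader sees the version of each row that
            // existed when the transaction started, instead of blocking on writers' locks.
            SqlTransaction tx = con.BeginTransaction(IsolationLevel.Snapshot);
            SqlCommand cmd = new SqlCommand("SELECT PageUrl, HitCount FROM Pages", con, tx);
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1}", reader.GetString(0), reader.GetInt32(1));
            }
            tx.Commit();
        }
    }
}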

Page | 33
Data retrieval

The main mode of retrieving data from an SQL Server database is querying for it. The query is
expressed using a variant of SQL called T-SQL, a dialect Microsoft SQL Server shares with
Sybase SQL Server due to its legacy. The query declaratively specifies what is to be retrieved. It
is processed by the query processor, which figures out the sequence of steps that will be
necessary to retrieve the requested data. The sequence of actions necessary to execute a query
is called a query plan. There might be multiple ways to process the same query. For example, for
a query that contains a join statement and a select statement, executing join on both the tables
and then executing select on the results would give the same result as selecting from each table
and then executing the join, but result in different execution plans. In such case, SQL Server
chooses the plan that is supposed to yield the results in the shortest possible time. This is called
query optimization and is performed by the query processor itself.

SQL Server includes a cost-based query optimizer which tries to optimize on the cost, in terms of
the resources it will take to execute the query. Given a query, the query optimizer looks at the
database schema, the database statistics and the system load at that time. It then decides the
sequence in which to access the tables referred to in the query, the sequence in which to execute
the operations and the access method to be used to access the tables. For example, if the table has an
associated index, it decides whether the index should be used: if the index is on a column whose values
are not unique for most of the rows (low "selectivity"), it might not be worthwhile to use the index
to access the data. Finally, it decides whether to execute the query concurrently or not. While a
concurrent execution is more costly in terms of total processor time, the fact that the execution is
actually split across different processors might mean it will execute faster. Once a query plan is
generated for a query, it is temporarily cached. For further invocations of the same query, the
cached plan is used. Unused plans are discarded after some time.
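One practical consequence of plan caching is that parameterized queries tend to reuse the cached plan, because the command text stays identical between calls. A small sketch; the column Prod_cat comes from the Product entry table described later in this report, while the SQL table name ProductEntry is an assumption.

using System.Data.SqlClient;

class PlanReuseDemo
{
    // Only the parameter value changes between calls, so SQL Server can keep
    // reusing the query plan cached for this command text.
    static int CountByCategory(SqlConnection con, string category)
    {
        SqlCommand cmd = new SqlCommand(
            "SELECT COUNT(*) FROM ProductEntry WHERE Prod_cat = @cat", con);
        cmd.Parameters.AddWithValue("@cat", category);
        return (int)cmd.ExecuteScalar();
    }
}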

SQL Server also allows stored procedures to be defined. Stored procedures are parameterized
T-SQL queries that are stored in the server itself (and not issued by the client application, as is
the case with general queries). Stored procedures can accept values sent by the client as input
parameters, and send back results as output parameters. They can call defined functions, and
other stored procedures, including the same stored procedure (up to a set number of times).
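For illustration, the sketch below calls a hypothetical stored procedure usp_GetAccessCount that takes one input parameter and returns a value through an output parameter; the procedure name and signature are ours, not part of the project.

using System.Data;
using System.Data.SqlClient;

class StoredProcDemo
{
    static int GetAccessCount(SqlConnection con, string page)
    {
        SqlCommand cmd = new SqlCommand("usp_GetAccessCount", con);
        cmd.CommandType = CommandType.StoredProcedure;     // execute as a stored procedure
        cmd.Parameters.AddWithValue("@page", page);        // input parameter from the client
        SqlParameter outCount = cmd.Parameters.Add("@count", SqlDbType.Int);
        outCount.Direction = ParameterDirection.Output;    // value sent back by the procedure
        cmd.ExecuteNonQuery();
        return (int)outCount.Value;
    }
}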

Page | 34
SQL CLR

Microsoft SQL Server 2005 includes a component named SQL CLR ("Common Language
Runtime") via which it integrates with .NET Framework. Unlike most other applications that
use .NET Framework, SQL Server itself hosts the .NET Framework runtime, i.e., memory,
threading and resource management requirements of .NET Framework are satisfied by SQLOS
itself, rather than the underlying Windows operating system. SQLOS provides deadlock detection
and resolution services for .NET code as well. With SQL CLR, stored procedures and triggers
can be written in any managed .NET language, including C# and VB.NET. Managed code can
also be used to define UDT's (user defined types), which can persist in the database. Managed
code is compiled to .NET assemblies and after being verified for type safety, registered at the
database. After that, they can be invoked like any other procedure. However, only a subset of the
Base Class Library is available, when running code under SQL CLR. Most APIs relating to user
interface functionality are not available.
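A minimal managed stored procedure might look like the following sketch; the class and method names are illustrative, and the assembly still has to be registered with CREATE ASSEMBLY and the procedure with CREATE PROCEDURE ... EXTERNAL NAME before it can be called from T-SQL.

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class ClrProcedures
{
    // Marked as a SQL CLR stored procedure; SQL Server itself hosts the CLR that runs it.
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void HelloPage(SqlString pageName)
    {
        // SqlContext.Pipe sends messages and results back to the calling session.
        SqlContext.Pipe.Send("Page requested: " + pageName.Value);
    }
}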

When writing code for SQL CLR, data stored in SQL Server databases can be accessed using
the ADO.NET APIs like any other managed application that accesses SQL Server data.
However, doing that creates a new database session, different from the one in which the code is
executing. To avoid this, SQL Server provides some enhancements to the ADO.NET provider
that allow the connection to be redirected to the same session which already hosts the running
code. Such connections are called context connections and are established by setting the context
connection parameter to true in the connection string. SQL Server also provides several other enhancements
to the ADO.NET API, including classes to work with tabular data or a single row of data as well
as classes to work with internal metadata about the data stored in the database. It also provides
access to the XML features in SQL Server, including XQuery support. These enhancements are
also available in T-SQL procedures as a consequence of the introduction of the new xml data type
(query, value and nodes functions).
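Inside SQL CLR code, the context connection looks roughly like the sketch below; the query and the table name ProductEntry are placeholders.

using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public class ContextConnectionDemo
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void SendProductCount()
    {
        // "context connection=true" reuses the session in which this procedure is
        // running instead of opening a new connection to the server.
        using (SqlConnection con = new SqlConnection("context connection=true"))
        {
            con.Open();
            SqlCommand cmd = new SqlCommand("SELECT COUNT(*) FROM ProductEntry", con);
            // ExecuteAndSend streams the command's result set straight back to the caller.
            SqlContext.Pipe.ExecuteAndSend(cmd);
        }
    }
}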

Page | 35
ALGORITHM

Page | 36
1.1 Introduction

Today Web Mining is one of the popular issues in data mining. Web Mining can be broadly
defined as the discovery and analysis of useful information from the World Wide Web [1]. In terms
of its research domains, web mining can be divided into three branches: Web Content Mining,
Web Structure Mining and Web Usage Mining. Web content mining describes the discovery of
useful information from web content/data/documents; the web content encompasses a broad range
of data such as text, images, audio and video. Web structure mining is the process of discovering
knowledge from the organization of the World Wide Web and the links between web pages. Web usage mining
is the process of extracting interesting patterns or knowledge from various web access log
records.

During the process of web usage mining, the rules and patterns in web log records are explored
and analyzed mainly by means of techniques relating to artificial intelligence, data mining,
database theory and so on. In general, web usage mining consists of three phases, namely data
preprocessing, pattern discovery and pattern analysis, as shown in Figure 1-1. The main purpose of
data preprocessing is to extract useful data from the raw web log and then to map these data into the
abstract data form necessary for pattern discovery. Pattern discovery finds out various rules and
patterns, taking advantage of all sorts of data mining techniques such as statistical analysis,
association rule analysis, classification, clustering, sequential pattern analysis and so on.
Pattern analysis filters out the useless rules discovered in the period of pattern discovery, and
then extracts the interesting rules and patterns for the end users.

In this paper, web access log records are mined to discover the different access patterns of different
users. In this way, web designers and maintainers can easily improve the link structure of web
pages, enhance web server performance and provide personalized service for customers. For
example, corporations that deal with e-commerce can use this method to optimize customer
strategy, discover target customers, and offer all kinds of customers their distinctive,
personalized services. Consequently, the mining of web user access patterns can be applied
widely across many industries.
Page | 37
Figure 1-1 Web Usage Mining

2. Problem Description: The user access pattern is a concept based on the user access path,
which is an important sequential pattern in web usage mining. After preprocessing web server
log data, we can get user access paths from user session files. Fig 2-1 shows the user's (Uid1)
browsing path through which Uid1 accesses a certain web site: A-B-C-D-B-G-E-H-G-C-A-I-K-I-D.

Figure 2-1. The User's (Uid1) Browsing Path

Page | 38
The link structure between web pages can be mapped into a directed graph: Sitemap = (N, E).
N = {p1, p2, ..., pi, ..., pn} denotes the collection of all pages of the web site (1 ≤ i ≤ n), and |N| (= n)
denotes the number of pages. E = {<p1, p2>, <p4, p7>, ..., <pi, pj>, ...} denotes the set of all the
hyperlinks of all pages of the web site, where <pi, pj> denotes that there is a hyperlink from page pi
to page pj, i ≠ j, 1 ≤ i, j ≤ n.
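For illustration, the Sitemap can be held in memory as a simple adjacency list; the following C# sketch (class and method names are ours) stores, for every page in N, the pages it links to in E.

using System.Collections.Generic;

class Sitemap
{
    // For each page in N, the list of pages it links to (the edges in E).
    private Dictionary<string, List<string>> links = new Dictionary<string, List<string>>();

    // Adds the hyperlink <from, to> to E.
    public void AddHyperlink(string from, string to)
    {
        if (!links.ContainsKey(from))
            links[from] = new List<string>();
        links[from].Add(to);
    }

    // True if page "from" contains a hyperlink to page "to".
    public bool HasHyperlink(string from, string to)
    {
        return links.ContainsKey(from) && links[from].Contains(to);
    }
}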

Definition 1. For all xi, xi+1 ∈ N (1 ≤ i ≤ k−1), if <xi, xi+1> ∈ E, then the duple
AP = (userid, {x1, x2, ..., xi, ..., xk−1, xk}) (k ≥ 2) is called userid's access pattern.

Example 1. In Figure 2-1, (Uid1, {A, B, C, D}) is one of Uid1's access patterns.
The difference between the user access pattern defined here and the user access path is that an access
pattern contains only the forward references of the access path, whereas a user access path includes
both forward and backward references to web pages. For instance, the access path of page H shown
in Figure 2-1 is A-B-C-D-B-G-E-H, whereas the corresponding access pattern is AP_H = (Uid1, {A, B,
G, E, H}). So, it is concluded that the access pattern presents users' browsing behavior more simply
and clearly than the access path, and it facilitates the analysis and mining of users' access regularities as
well.
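One straightforward way to obtain such a forward-reference pattern is to walk the raw access path and drop the pages undone by each backward reference. The sketch below is our own illustration, not the paper's algorithm; applied to A-B-C-D-B-G-E-H it yields A, B, G, E, H, as in the example above.

using System.Collections.Generic;

class AccessPatternBuilder
{
    // Reduces a raw access path (which may contain backward references) to the
    // forward-reference pattern that ends at the last page of the path.
    public static List<string> ForwardReferences(IEnumerable<string> path)
    {
        List<string> forward = new List<string>();
        foreach (string page in path)
        {
            int seen = forward.IndexOf(page);
            if (seen >= 0)
                // Backward reference: discard everything visited after this page.
                forward.RemoveRange(seen + 1, forward.Count - seen - 1);
            else
                forward.Add(page);
        }
        return forward;
    }
}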

Definition 2. For any x ∈ N, the duple AP = (userid, {x}) is called userid's meta-access pattern.

Example 2. In Figure 2-1, (Uid1, {A}) is one of Uid1's meta-access patterns.

Definition 3. If access pattern AP = (userid, {x1, x2, ..., xi, ..., xs−1, xs}) is known, we say that each
set of k consecutive pages {xj, xj+1, ..., xj+k−1} (1 ≤ j ≤ s−k+1, k ≤ s) is a length-k base of AP.

Example 3. {A, B}, {B, C}, and {C, D} are the length-2 bases of AP = (Uid1, {A, B, C, D}); {A, B, C}
and {B, C, D} are the length-3 bases of AP = (Uid1, {A, B, C, D}).

Definition 4. Given AP1 = (userid1, {x1, x2, ..., xi, ..., xs−1, xs}) and AP2 = (userid2, {y1, y2, ..., yj, ...,
yt−1, yt}) (1 ≤ i ≤ s, 1 ≤ j ≤ t, s ≤ t), iff userid1 = userid2 and x1 = yj, x2 = yj+1, ..., xi = yj+i−1, ...,
xs = yj+s−1, we say that AP1 is a sub-pattern of AP2, which is denoted as AP1 ⊆ AP2.

Page | 39
Definition 5. If access pattern AP = (userid, {x1, x2, ..., xi, ..., xk−1, xk}) (k = 2, 3, ...) is known and,
for any page xs ∈ N, there does not exist an access pattern (userid, {x1, x2, ..., xi, ..., xk−1, xk, xs}),
then AP is called userid's longest access pattern.

Definition 6. If it is known that access pattern AP = (userid, {x1, x2, ..., xi, ..., xk−1, xk}) (k = 2, 3, ...),
we say that the minimal access count among the access counts of all pages contained in AP is the
access count of AP, which is denoted as [AP].

Definition 7. If it is known that access pattern AP = (userid, {x1, x2, ..., xi, ..., xk−1, xk}) (k = 2, 3, ...),
then Sup_AP = [AP] / [Sup_userid], where [Sup_userid] denotes the total number of the access counts of
all the longest access patterns whose user name is userid.

Definition 8.
(I) For an access pattern AP = (userid, {x1, x2, ..., xi, ..., xk−1, xk}) (k = 2, 3, ...), if Sup_AP ≥ min_sup
(the threshold of support) and the number of pages contained in AP is n, we say that AP is a
frequent length-n access pattern. All the sub-patterns of a frequent access pattern are also
frequent access patterns.

(II) For the longest access pattern AP = (userid, {x1, x2, ..., xi, ..., xk−1, xk}) (k = 2, 3, ...), if Sup_AP ≥
min_sup, we say that AP is a longest frequent access pattern.

This paper emphatically discusses how to mine frequent access patterns that satisfy the minimum
support (the minimal threshold of support, defined by experience). Moreover, it must be pointed out
that the support of an access pattern defined here is different from the concept of support defined for
association rules. The support of an association rule represents the relation among the items in the
same item set, whereas the support of an access pattern reflects the universality of a certain access
pattern across all users' browsing behaviors. Consequently, it is necessary to distinguish
between these two similar concepts.
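As a small illustration of Definitions 6 and 8, the sketch below computes the access count [AP] of a pattern as the minimum of its pages' access counts and compares it with an absolute threshold, which is how the FAP-growth pseudocode in Section 3.2 applies min_sup; Definition 7's normalisation by [Sup_userid] is omitted here for brevity.

using System.Collections.Generic;

class SupportCalculator
{
    // [AP]: the minimal access count among the pages contained in the pattern (Definition 6).
    public static int AccessCount(IEnumerable<string> pattern, Dictionary<string, int> pageCounts)
    {
        int min = int.MaxValue;
        foreach (string page in pattern)
            if (pageCounts[page] < min)
                min = pageCounts[page];
        return min;
    }

    // Treats a pattern as frequent when its access count reaches the support threshold.
    public static bool IsFrequent(IEnumerable<string> pattern,
                                  Dictionary<string, int> pageCounts, int minSup)
    {
        return AccessCount(pattern, pageCounts) >= minSup;
    }
}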

Page | 40
3. FAP-Mining Algorithm:

FP-growth is an algorithm with good functionality when it is used in mining association rules and
sequential patterns. There is no ordering among the elements of an itemset when mining
association rules, whereas access pattern mining requires sequential page access. Thus FP-growth
has to be revised before it can be applied to mining users' frequent access patterns. In this paper, the
revised algorithm is called Frequent Access Pattern Mining (FAP-Mining). FAP-Mining is divided
into two steps. Section 3.1 describes Step One, which constructs the frequent access pattern tree
(FAP tree) from the access paths derived from user session files and records the access count of each
page. Section 3.2 describes Step Two, in which the FAP-growth function is used to mine both long
and short access patterns from the FAP tree.

3.1 The Construction of FAP-Tree

Algorithm: FAP-Tree(tree, p). Construct the frequent access pattern tree.

Input: the set of user access paths p.
Output: the frequent access pattern tree (FAP tree).
Procedure FAP-Tree(T, p);
{
  create-tree(T);
  /* construct the root of the FAP tree, labelled null */
  while p <> nil do
  {
    if p.name is the same as the name of T's ancestor (n) then
    {
      n.count := n.count + 1;
      T := n;
    }
    else
    if p.name is the same as the name of T's child (c) then
    {
Page | 41
      c.count := c.count + 1;
      T := c;
    }
    else
      insert-tree(T, p);
      /* insert a new node for p into T, as a child of the current node */
    p := p.next;
  }
}
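As an illustrative companion to the pseudocode, the C# sketch below builds the same kind of tree from access paths that have already been reduced to forward references (the paper's procedure additionally moves back up the tree when a page matches an ancestor); the type and field names are ours.

using System.Collections.Generic;

class FapNode
{
    public string Page;                 // page name; null for the root
    public int Count;                   // how many paths passed through this node
    public FapNode Parent;
    public Dictionary<string, FapNode> Children = new Dictionary<string, FapNode>();
}

class FapTree
{
    public FapNode Root = new FapNode();                      // root labelled null
    public Dictionary<string, List<FapNode>> HeaderTable      // page -> chain of node-links
        = new Dictionary<string, List<FapNode>>();

    // Inserts one forward-reference access path into the tree.
    public void Insert(IEnumerable<string> path)
    {
        FapNode current = Root;
        foreach (string page in path)
        {
            FapNode child;
            if (current.Children.TryGetValue(page, out child))
            {
                child.Count++;                                // shared prefix: bump the count
            }
            else
            {
                child = new FapNode();
                child.Page = page;
                child.Count = 1;
                child.Parent = current;
                current.Children[page] = child;               // new branch for this page
                if (!HeaderTable.ContainsKey(page))
                    HeaderTable[page] = new List<FapNode>();
                HeaderTable[page].Add(child);                 // extend the page's node-link chain
            }
            current = child;
        }
    }
}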

In order to facilitate frequent access pattern generation and FAP tree traversal, a page header
table, ordered in ascending order of the access counts of the pages, is built so that each page points to
its occurrences in the tree via a chain of node-links. Table 3-1 shows an excerpt of the access paths of a
certain user contained in the user session file. According to the access paths in Table 3-1, the FAP-
Tree function constructs the frequent access pattern tree shown in Figure 3-1.

Figure 3-1 FAP Tree

Page | 42
3.2. FAP-growth
The mining of the FAP tree proceeds as follows. At first, according to the access counts of each page-
node in the FAP tree, the FAP-growth function generates each length-1 frequent access pattern (as
an initial suffix), then constructs its prefix pattern base, and finally builds a longer access pattern
by connecting every prefix pattern base with its suffix. If the longer access pattern satisfies
min_sup, it becomes a new frequent access pattern. FAP-growth executes this procedure until all
frequent length-1 access patterns have been processed. In short, the FAP-growth method transforms
the problem of mining long frequent access patterns into searching for shorter ones (as initial
suffixes) and then concatenating the suffixes.

Algorithm: FAP-growth(tree, α), mine frequent access patterns.

Input: FAP tree, min_sup = 2 (the minimal access count that satisfies the support threshold).
Output: the set of all frequent access patterns: α.
Procedure FAP-growth(tree, α);
{
  for each ai with ai.count ≥ min_sup
  /* ai is a member of the page header table */
  {
    generate access pattern β = ai;
    α := α ∪ β;
    p := ai.next;
    /* p points to the first location of ai in the FAP tree */
    while (p <> null) and (p.count ≥ min_sup) do
    {
      look for ai's prefix access pattern base, then construct access pattern βi by
      connecting ai's prefix access pattern base with ai itself;
Page | 43
      if Sup_βi ≥ min_sup then
        α := α ∪ βi;
      p := p.next;
      /* p points to the next location of ai in the FAP tree */
    }
  }
}
The mining of the FAP tree shown in Figure 3-1 is summarized in Table 3-2. Taking the page-node D
in the page header table as an example, we illustrate the mining process of FAP-growth(tree, α):

(1) D.count ≥ min_sup, so generate the frequent length-1 access pattern β = (Userid, {D}), α := α ∪ β;

(2) p (= D.next) points to the first location of the page-node D in the FAP tree, which lies in the
branch A-B-C-D;

(3) p.count (= 2) ≥ min_sup, so construct the set of D's prefix access pattern bases
({C}:3, {B, C}:3, {A, B, C}:3), then build access patterns βi by connecting D's prefix access pattern
bases with D (shown in Figure 3-1); if Sup_βi ≥ min_sup, βi is a frequent access pattern.

(4) p (= p.next) points to the next location of the page-node D in the FAP tree, which lies in another
branch; because p.count (= 1) does not satisfy min_sup, the function does not mine this branch. The
function then deals with the next page-node in the page header table, the page-node D having been
done.
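For illustration only, the following condensed C# sketch mirrors the growth step on the FapNode and FapTree types sketched in Section 3.1: for each page whose total count reaches min_sup it climbs from every node-link towards the root to obtain a prefix base and connects it with the suffix page. It is a simplification of the paper's FAP-growth, not a faithful transcription (in particular, it does not grow patterns recursively).

using System.Collections.Generic;

class FapGrowth
{
    // Returns the page sequences treated as frequent access patterns for the given threshold.
    public static List<List<string>> Mine(FapTree tree, int minSup)
    {
        List<List<string>> patterns = new List<List<string>>();
        foreach (KeyValuePair<string, List<FapNode>> entry in tree.HeaderTable)
        {
            // Length-1 pattern for the page itself.
            int total = 0;
            foreach (FapNode node in entry.Value)
                total += node.Count;
            if (total < minSup)
                continue;
            patterns.Add(new List<string>(new string[] { entry.Key }));

            // For each occurrence of the page, collect its prefix path and
            // connect the prefix base with the suffix page.
            foreach (FapNode node in entry.Value)
            {
                if (node.Count < minSup)
                    continue;                                  // this branch cannot be frequent
                List<string> prefix = new List<string>();
                for (FapNode up = node.Parent; up != null && up.Page != null; up = up.Parent)
                    prefix.Insert(0, up.Page);
                if (prefix.Count == 0)
                    continue;
                prefix.Add(entry.Key);
                patterns.Add(prefix);
            }
        }
        return patterns;
    }
}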

Page | 44
4. Conclusion

The mining of frequent access patterns from users' web browsing behaviors is worth applying to
many fields such as the design and maintenance of web sites, e-commerce and education. The
FAP-Mining method proposed in this paper is feasible for extracting users' access patterns from
users' access paths on a certain web site. With further improvement, this method could be widely
applied to many fields. Next, we will use a large amount of data to test the functionality of this
method, and make further exploration on the analysis of association rules and access patterns among
users' browsing behaviors.

Page | 45
SCOPE

Page | 46
Web Usage Mining
Web usage mining is the type of Web mining activity that involves the automatic discovery of user
access patterns from one or more Web servers. As more organizations rely on the Internet and
the World Wide Web to conduct business, the traditional strategies and techniques for market
analysis need to be revisited in this context. Organizations often generate and collect large
volumes of data in their daily operations. Most of this information is usually generated
automatically by Web servers and collected in server access logs. Other sources of user
information include referrer logs, which contain information about the referring pages for each
page reference, and user registration or survey data gathered via tools such as CGI scripts.

Analyzing such data can help these organizations to determine the lifetime value of customers,
cross marketing strategies across products, and effectiveness of promotional campaigns, among
other things. Analysis of server access logs and user registration data can also provide valuable
information on how to better structure a Web site in order to create a more effective presence for
the organization. In organizations using intranet technologies, such analysis can shed light on
more effective management of workgroup communication and organizational infrastructure.
Finally, for organizations that sell advertising on the World Wide Web, analyzing user access
patterns helps in targeting ads to specific groups of users.

Most of the existing Web analysis tools provide mechanisms for reporting user activity in the
servers and various forms of data filtering. Using such tools, for example, it is possible to
determine the number of accesses to the server and the individual files within the organization's
Web space, the times or time intervals of visits, and domain names and the URLs of users of the
Web server. However, in general, these tools are designed to handle low-to-moderate-traffic
servers, and furthermore, they usually provide little or no analysis of data relationships among the
accessed files and directories within the Web space.

Page | 47
1.Pros

Web mining essentially has many advantages which make this technology attractive to
corporations, including government agencies. This technology has enabled e-commerce to do
personalized marketing, which eventually results in higher trade volumes. Government
agencies are using this technology to classify threats and fight against terrorism. The predictive
capability of mining applications can benefit society by identifying criminal activities. Companies
can establish better customer relationships by giving customers exactly what they need.
Companies can understand the needs of the customer better and react to customer
needs faster. They can find, attract and retain customers, and they can save on production
costs by utilizing the acquired insight into customer requirements. They can increase profitability by
target pricing based on the profiles created. They can even identify the customers who might defect
to a competitor and try to retain them by providing promotional offers, thus reducing the risk of
losing those customers.

2.Cons

Web mining technology itself doesn't create issues, but this technology, when used on data of a
personal nature, might cause concerns. The most criticized ethical issue involving web mining is
the invasion of privacy. Privacy is considered lost when information concerning an individual is
obtained, used, or disseminated, especially if this occurs without their knowledge or consent. The
obtained data are analyzed and clustered to form profiles; the data are made anonymous
before clustering so that no individual can be linked directly to a profile. But usually the group
profiles are used as if they were personal profiles. Thus these applications de-individualize
users by judging them by their mouse clicks. De-individualization can be defined as a tendency
to judge and treat people on the basis of group characteristics instead of on their own
individual characteristics and merits. This trend has increased the amount of data being captured
and traded, increasing the likelihood of one's privacy being invaded. The companies which buy
the data are obliged to make it anonymous, and these companies are considered authors of any
specific release of mining patterns. They are legally responsible for the contents of the release;
any inaccuracies in the release will result in serious lawsuits, but there is no law preventing them
from trading the data.
Some mining algorithms might use controversial attributes like sex, race, religion, or sexual
orientation to categorize individuals. These practices might be against the anti-discrimination
Page | 48
legislation. The applications make it hard to identify the use of such controversial attributes, and
there is no strong rule against the usage of such algorithms with such attributes. This process
could result in the denial of a service or privilege to an individual based on his race, religion or sexual
orientation; right now this situation can only be avoided by the high ethical standards maintained by
the data mining company. The collected data is made anonymous so that the obtained
data and the obtained patterns cannot be traced back to an individual. It might look as if this
poses no threat to one's privacy, but in fact much extra information can be inferred by the
application by combining separate, seemingly innocuous pieces of data about the user.

Page | 49
HARDWARE
AND
SOFTWARE
REQUIREMENTS

Page | 50
HARDWARE REQUIREMENTS

1. Processor Requirement :-
a. Minimal:- Intel Pentium IV 2.0 GHz or Similar
2. Primary Memory Requirement
a. Minimal:- 256 MB 400 MHz SDRAM
3. Secondary Memory Requirement
a. Minimal:- 15 MB

SOFTWARE REQUIREMENTS

1. Operating System

a. Preferable:- Windows XP Service Pack 2 or Higher

2. Visual Studio 2005


3. .NET Framework 2.0
4. SQL Server 2005

Page | 51
SYSTEM ANALYSIS
AND
DESIGN

Page | 52
REQUIREMENT ANALYSIS

FUNCTIONAL & NON-FUNCTIONAL REQUIREMENTS

Requirements are divided into these two groups. These requirements are: -

1. FUNCTIONAL REQUIREMENT: - A functional requirement describes an interaction between the system
and its environment. It describes the relationship between the input and the output of the system.
Functional requirements specify which output should be produced from a given input. For
each functional requirement, a detailed description of all the data inputs and their source,
the units of measure, and the range of valid inputs must be specified.
For our project, the functional requirements may be stated as follows:

1 People: In this project four people are involved in project development, and the
project guide continually monitors as well as guides us in developing the project.

2 Hardware Requirements:

Processor Requirement :-
o Minimal:- Intel Pentium IV 2.0 GHz or Similar
o Optimal:- Intel Core 2 Duo 2.0 GHz or Similar
Primary Memory Requirement
o Minimal:- 256 MB 400 MHz SDRAM
o Optimal:- 1 GB 800 MHz DDR2 RAM
Secondary Memory Requirement
o Minimal:- 15 MB
o Optimal:- 25 MB

Page | 53
3 Software Requirements:

Operating System

o Preferable:- Windows XP Service Pack 2 or Higher

Visual Studio 2005


.NET Framework 2.0
Sql server 2005

2. NON-FUNCTIONAL REQUIREMENT: - Non-functional requirements provide limits, which means a
restriction is attached to the requirement and the system has to fulfill or satisfy that
limit.

1. Time: The project has been completed within two and a half months.

2. Input: The software registers a new student's data into the database and/or updates
his/her records when the administrator enters a matching password with the Scholar Number.
3. GENERAL TYPES OF REQUIREMENTS : -

1. Physical Environment: Physical environment requirements, such as where the
equipment is to be located and whether there are any environmental restrictions like
temperature, humidity or magnetic interference, are to be gathered prior to the
development phase of the system. No special physical requirements are needed in our
project.

2. Interface: Interface requirements, such as whether the input is coming from one or more
Page | 54
systems, whether the output is going to one or more systems, and whether there is any prescribed
medium that the data must use, should be gathered. In our project the input comes from the user as
text, and it gets stored in the database.

3. User & Human Factors: User and human factors requirements cover who will use
the system, whether there will be different types of users, what the skill level of each type of user is,
what kind of training is required for each user, and how easy it will be for a user to understand
and use the system. In this project no special training is required to use the
software.

4. Functional: Functional requirements cover what the system will do, when the system
will do it, whether there are several modes of operation, how and when the system can be changed,
and whether there are constraints on speed or response time.

5. Documentation: Documentation requirements cover how much documentation is required and
whether it should be online, in book format, or both.

6. Data: Data requirements cover what the format of both the input and the output should be, how
accurate they must be, and whether any data should be retained for any period of time.

Page | 55
FEASIBILITY

1) Economic Feasibility

Economic considerations are crucial for any system. These considerations include a broad
range of cost-benefit analyses and long-term corporate income and strategies. The cost/benefit
analysis proved to be positive for our project.

2) Technical Feasibility:

a) Resource Availability:

Are the hardware and software resources required to develop the application
available?

b) Technology:

Do the system designers have the skills and guidance to achieve their goal?

3) Legal Feasibility:-

A determination of any infringement, violation or liability that could result from
development of the system. There is no law or rule which would be violated by the
development of this software.

Page | 56
SOFTWARE SELECTION
CRITERIA

Page | 57
DATA MODEL

SOFTWARE ENGINEERING

1.4 SOFTWARE ENGINEERING APPROACH


Software engineering is a layered technology. The foundation for software
engineering is the process layer. Software engineering process is the glue that holds
the technology layers together and enables rational and timely development of
computer software. Process defines a framework that must be established for
effective delivery of software engineering technology. The software process forms
the basis for management control of software projects and establishes the context in
which technical methods are applied, work products (models, documents, data,
reports, forms, etc.) are produced, milestones are established, quality is ensured
and change is properly managed.
Software engineering methods provide the technical "how to's" for building software.
Methods encompass a broad array of tasks that include communication,
requirements analysis, design modeling, program construction, testing and support.
Software engineering relies on a set of basic principles that govern each area of the
technology and include modeling activities and other descriptive techniques.
A process framework establishes the foundation for a complete software process by
identifying a small number of framework activities that are applicable to all software
projects, regardless of their size or complexity. In addition, the process framework
encompasses a set of umbrella activities that are applicable across the entire
software process. Each framework activity is populated by a set of software
engineering actions, a collection of related tasks that produce a major software
engineering work product. The following generic process framework is applicable to
our software project:-

Page | 58
Communication: This framework activity involves heavy communication and
collaboration with the customer (and other stakeholders) and encompasses
requirements gathering and other related activities.

Planning: This activity establishes a plan for the software engineering work that
follows. It describes the technical tasks to be conducted, the risks that are likely,
the resources that will be required, the work products to be produced and a work
schedule.

Modeling: This activity encompasses the creation of models that allow the
developer and customer to better understand software requirement and the
design that will achieve those requirements.

Construction: This activity combines code generation (either manual or
automated) and the testing that is required to uncover errors in the code.

Deployment: The software (as a complete entity or as a partially completed


increment) is delivered to the customer who evaluates the delivered product and
provides feedback based on the evaluation.

Page | 59
1.4.1 Process Model
Process models define a distinct set of activities, actions, tasks, milestones and
work products that are required to engineer high quality software. These process
models are not perfect, but they do provide a useful roadmap for software
engineering work. Software engineers and their managers adapt a prescriptive
process model to their needs and then follow it. In addition, the people who have
requested the software have a role to play as the process model is followed.
A process model provides stability, control and organization to an activity that can, if
left uncontrolled, become quite chaotic. Process models are also referred to as
rigorous process models because they often encompass the capability suggested
by CMMI. The terminology and detail of each process model differ, but the generic
framework activities remain reasonably consistent.

From the point of view of the software engineer, the work products are the
programs, documents and data that are produced as a consequence of the activity
and tasks defined by the process. There are a number of software process
assessment mechanisms that enable organizations to determine the maturity of
their software process. However, the quality, timeliness and long-term viability of
the product you build are the best indicators of the efficacy of the process that you
use.

Page | 60
The project follows a Simple Incremental Model, as shown below:

Fig 1.2: The SIMPLE INCREMENTAL MODEL of Software Development

It follows the following steps:

Requirement Analysis
It involves gathering information about the currently existing tools and techniques
that can be used for the development of a network-based search engine and deciding
which one is the best to follow, as well as deciding on the right
programming language to be used to implement the algorithm.

Design
It involves drawing a well established design which will involve various DFDs, i.e.,
Data Flow Diagrams. They show how the data flow takes place in the system. It
also involves various Use Case and Class Diagrams which show the interaction of
Page | 61
users with the system and behavior of various classes. Many other diagrams such
as sequence diagrams and collaboration diagrams are also drawn.

Coding
This phase involves the actual coding. It is done mostly in Java, but some
modules are designed using Visual C# and the user interface is developed using JSP
servlets. The most important part of this phase is the documentation. A well-
documented code is prepared.

Testing
After the coding part is over, various test cases are prepared to test the
deliverable web application. It includes various kinds of testing such as Integration
Testing and Alpha Testing.

Page | 62
System Data Flow Diagram

Page | 63
Page | 64
TABLES

Page | 65
1. Product entry table: It includes information about all the items available at our site.

Product entry table

Field Name Data Type Width

Prod_no Character 5

Prod_name Character 15

Prod_qty Varchar2 4

Prod_price Varchar2 7

Prod_cat Character 10

3. Password entry table: It includes information about all the members, their passwords and the authority
password at our site.

Field Name Data Type Width

Cust_Id Character 5

LoginPassword Varchar2
10

HintQuestion Varchar2
20

HintAnswer Varchar2 10

Page | 66
4. Web Access Table: It includes information about all the navigational patterns of members
at our site.
Field Name                  Data Type   Width   Key

User_Name                   Varchar     50      PK

Session_Name                Varchar     50

Web_Access_Sequence_Path    Varchar     200

Frequent_Subsequence_Path   Varchar     200

Web Access Table

5. BinaryTree Table: It includes information about all the navigational patterns of members at
our site.

Page | 67
Field Name    Data Type   Width   Key

Tree_Id       Number      10      PK

NodeName      Varchar     50

Left_Child    Number      10

Right_Child   Number      10

Node_Count    Number      10

BinaryTree Table

For Example :

User_Name   Session_Name   Web_Access_Sequence_Path   Frequent_Subsequence_Path

UserA       S1             pqspr                      pqpr

UserB       S2             tptqrp                     pqrp

UserB       S3             qpqupt                     qpqp

UserA       S4             puqrur                     pqprr
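Rows like the ones above can be written from the application with a parameterized INSERT; a sketch, assuming the table is named Web_Access and using a placeholder connection string:

using System.Data.SqlClient;

class WebAccessLog
{
    static void SaveSequence(string user, string session, string rawPath, string frequentPath)
    {
        string connStr = "Data Source=.;Initial Catalog=SampleDb;Integrated Security=true";
        using (SqlConnection con = new SqlConnection(connStr))
        {
            con.Open();
            SqlCommand cmd = new SqlCommand(
                "INSERT INTO Web_Access (User_Name, Session_Name, " +
                "Web_Access_Sequence_Path, Frequent_Subsequence_Path) " +
                "VALUES (@user, @session, @rawPath, @frequentPath)", con);
            cmd.Parameters.AddWithValue("@user", user);
            cmd.Parameters.AddWithValue("@session", session);
            cmd.Parameters.AddWithValue("@rawPath", rawPath);
            cmd.Parameters.AddWithValue("@frequentPath", frequentPath);
            cmd.ExecuteNonQuery();    // stores one browsing session's access sequence
        }
    }
}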

Page | 68
Page | 69
OUTPUT

Page | 70
HOME PAGE

This is the home page of the website which is accessed by the users and administrator.
It displays various page links to the project.

Page | 71
BOOKS

It is the page which displays information about books.

Page | 72
ELECTRONICS

It is the page which displays information about electronic items.

Page | 73
ADMIN

It is the administrator login page which displays the login ID and password tab.
The administrator can log in through this page to manage the Frequent Access Pattern.

Page | 74
CHANGE PASSWORD

It is the Change Password page which displays the admin name and password tab.
The administrator can log in to this page to add more admin user IDs and passwords.

Page | 75
WAP

The web access pattern page displays the web access patterns stored as a result of user accesses.
It allows the administrator to generate the Frequent Access Pattern.

Page | 76
TREE TABLE

It generates the TREE corresponding to the Frequent Access Pattern.


It is used to generate data for Cached Pages.

Page | 77
WAP TREE

It generates the TREE Pattern corresponding to the Frequent Access Pattern.

Page | 78
CACHED PAGES

It displays the Cached Pages from the Frequently Accessed Page Pattern.

Page | 79
LIMITATIONS
AND
FUTURE
ENHANCEMENTS

Page | 80
LIMITATIONS OF THE PROJECT: The project is implemented thoroughly and is of
immense practical importance and use. It can be of use to an establishment for better space and
cache management.

Even so, the project has some limitations:

1. The project is constrained by the availability of space on the website on which it is loaded.

2. The project requires the server to have a .NET and C# environment.

3. The database must be consistent with the web access pattern.

4. Users must log out of the website to ensure the data is saved in the database, as the project does
not support incomplete transactions.

Future Enhancements: The project can be modified to include many features for various
other applications.

1. The project can be linked to the Research and Development Department of the enterprise to
provide current access patterns of the users.

2. The project can be modified to include online shopping and sales for the company.

3. The database administrator can be linked to the online database to check for space efficiency.

Page | 81
BIBLIOGRAPHY

Page | 82
1. Xidong Wang, Yiming Ouyang, Xuegang Hu, Yan Zhang - Discovery of User Frequent Access
Patterns on Web Usage Mining (School of Computer and Information, Hefei University of
Technology, Tunxi Road 193, Hefei, Anhui, China).

2. Elmasri, Navathe, Somayajulu, Gupta - FUNDAMENTALS OF DATABASE DESIGN
(Pg. no. 5-15)

3. M. Morris Mano - COMPUTER SYSTEM ORGANISATION (Pg. no. 446, 462-469)

4. ASP.NET (NIIT), 9th Edition (Ch. No. 5, 6, 7)

Page | 83
