
Abstract:

Location recommendation plays an essential role in helping people find attractive places.
Though recent research has studied how to recommend locations with social and geographical
information, few of them addressed the cold-start problem of new users. Because mobility
records are often shared on social networks, semantic information can be leveraged to tackle this
challenge. A typical method is to feed this information into explicit-feedback-based content-aware
collaborative filtering, but such methods require drawing negative samples for better learning
performance, as users' negative preferences are not observable in human mobility. However, prior
studies have empirically shown that sampling-based methods do not perform well. To this end, we
propose a scalable Implicit-feedback-based Content-aware Collaborative Filtering (ICCF)
framework to incorporate semantic content and to steer clear of negative sampling. We then
develop an efficient optimization algorithm, scaling linearly with data size and feature size, and
quadratically with the dimension of latent space. We further establish its relationship with graph
Laplacian regularized matrix factorization. Finally, we evaluate ICCF with a large-scale LBSN
dataset in which users have profiles and textual content. In addition, we develop a novel framework,
named l-injection, to address the sparsity problem of recommender systems. By carefully injecting
low values to a selected set of unrated user-item pairs in a user-item matrix, we demonstrate that
top-N recommendation accuracies of various collaborative filtering (CF) techniques can be
significantly and consistently improved. We first adopt the notion of pre-use preferences of users
toward a vast amount of unrated items. Using this notion, we identify uninteresting items that
have not been rated yet but are likely to receive low ratings from users, and selectively impute
them as low values. As our proposed approach is method-agnostic, it can be easily applied to a
variety of CF algorithms.
CHAPTER-I
INTRODUCTION

The web search engine has long been the most important portal for ordinary people
looking for useful information on the web. However, users might experience failure when search
engines return irrelevant results that do not meet their real intentions. Such irrelevance is largely
due to the enormous variety of users' contexts and backgrounds, as well as the ambiguity of
texts. Location Based Rating Prediction (LBRP) is a general category of search techniques
aiming at providing better search results, which are tailored for individual user needs. At this
expense, user information has to be collected and analyzed to infer the user intention behind
the issued query.

The solutions to LBRP can generally be categorized into two types, namely click-log-
based methods and profile-based ones. The click-log-based methods are straightforward: they
simply impose a bias toward pages clicked in the user's query history. Although this strategy has been
demonstrated to perform consistently and considerably well, it can only work on repeated queries
from the same user, which is a strong limitation confining its applicability. In contrast, profile-
based methods improve the search experience with complicated user-interest models generated
from user profiling techniques. Profile-based methods can be potentially effective for almost all
sorts of queries, but are reported to be unstable under some circumstances.

Although there are pros and cons for both types of LBRP techniques, profile-based
LBRP has demonstrated more effectiveness in improving the quality of web search recently, with
increasing usage of personal and behavioral information to profile users, which is usually
gathered implicitly from query history, browsing history, click-through data, bookmarks, user
documents, and so forth. Unfortunately, such implicitly collected personal data can easily reveal
a gamut of the user's private life. Privacy issues arising from the lack of protection for such data, for
instance the AOL query logs scandal, not only raise panic among individual users, but also
dampen data publishers' enthusiasm for offering personalized services. In fact, privacy
concerns have become the major barrier to the wide proliferation of LBRP services.
MOTIVATION:

To protect user privacy in profile-based LBRP, researchers have to consider two
contradictory effects during the search process. On the one hand, they attempt to improve the
search quality with the personalization utility of the user profile. On the other hand, they need to
hide the private contents in the user profile to keep the privacy risk under control, while still
supplying enough of the user profile to the search engine to obtain better search quality.

SCOPE:

To protect user privacy in profile-based LBRP, researchers have to consider two
contradictory effects during the search process. On the one hand, they attempt to improve the
search quality with the personalization utility of the user profile. On the other hand, they need to
hide the private contents in the user profile to keep the privacy risk under control. A
few previous studies suggest that people are willing to compromise privacy if the personalization
obtained by supplying the user profile to the search engine yields better search quality. In an ideal
case, significant gain can be obtained by personalization at the expense of only a small (and
less sensitive) portion of the user profile, namely a generalized profile. Thus, user privacy can be
protected without compromising the personalized search quality. In general, there is a tradeoff
between the search quality and the level of privacy protection achieved from generalization.

Profile-Based Personalization

Previous works on profile-based LBRP mainly focus on improving the search utility. The
basic idea of these works is to tailor the search results by referring, often implicitly, to a user
profile that reveals an individual information goal. In the remainder of this section, we review
the previous solutions to LBRP on two aspects, namely the representation of profiles, and the
measure of the effectiveness of personalization.

Many profile representations are available in the literature to facilitate different
personalization strategies. Earlier techniques utilize term lists/vectors or bags of words to represent a
profile. However, most recent works build profiles in hierarchical structures due to their stronger
descriptive ability, better scalability, and higher access efficiency. The majority of the
hierarchical representations are constructed from existing weighted topic hierarchies/graphs.
Another line of work builds the hierarchical profile automatically via term-frequency
analysis of the user data. In our proposed UPS framework, we do not focus on the
implementation of the user profiles. Actually, our framework can potentially adopt any
hierarchical representation based on a taxonomy of knowledge.

OUR CONTRIBUTIONS:

The above problems are addressed in our UPS (literally for User customizable Privacy-
preserving Search) framework. The framework assumes that the queries do not contain any
sensitive information, and aims at protecting the privacy in individual user profiles while
retaining their usefulness for LBRP.
1. When a user issues a query q on the client, the proxy generates a user profile at
runtime in light of the query terms. The output of this step is a generalized user profile G
satisfying the privacy requirements. The generalization process is guided by two
conflicting metrics, namely the personalization utility and the privacy risk, both defined for user
profiles.
2. Subsequently, the query and the generalized user profile are sent together to the LBRP
server for personalized search.
3. The search results are personalized with the profile and delivered back to the query
proxy.
4. Finally, the proxy either presents the raw results to the user, or reranks them with the
complete user profile.
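To make the four-step flow above concrete, the sketch below shows how a client-side proxy might coordinate generalization, remote search, and local re-ranking. It is a minimal, self-contained Java illustration; the profile representation (topic weights), the sensitivity scores, and the scoring logic are hypothetical stand-ins, not part of any existing UPS implementation.

// A minimal, self-contained sketch of the four-step UPS flow (illustrative only; all
// names and the scoring logic are assumptions, not code from the UPS framework).
import java.util.*;
import java.util.stream.Collectors;

public class UpsProxySketch {

    // Step 1: keep only profile topics whose sensitivity is below the user-specified threshold,
    // producing a generalized profile G.
    static Map<String, Double> generalize(Map<String, Double> fullProfile,
                                          Map<String, Double> sensitivity,
                                          double riskThreshold) {
        return fullProfile.entrySet().stream()
                .filter(e -> sensitivity.getOrDefault(e.getKey(), 1.0) <= riskThreshold)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    // Steps 2-3: stand-in for the remote LBRP server; here it simply returns candidate results.
    static List<String> personalizedSearch(String query, Map<String, Double> generalizedProfile) {
        return Arrays.asList("result about sports", "result about travel", "result about finance");
    }

    // Step 4: re-rank the returned results locally with the complete profile.
    static List<String> reRank(List<String> results, Map<String, Double> fullProfile) {
        return results.stream()
                .sorted(Comparator.comparingDouble((String r) ->
                        fullProfile.entrySet().stream()
                                .filter(e -> r.contains(e.getKey()))
                                .mapToDouble(Map.Entry::getValue).sum()).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> profile = Map.of("sports", 0.8, "finance", 0.6, "travel", 0.3);
        Map<String, Double> sensitivity = Map.of("sports", 0.2, "finance", 0.9, "travel", 0.4);

        Map<String, Double> generalized = generalize(profile, sensitivity, 0.5); // Step 1
        List<String> raw = personalizedSearch("weekend plans", generalized);     // Steps 2-3
        System.out.println(reRank(raw, profile));                                // Step 4
    }
}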

SOLUTIONS:

Many personalization techniques require iterative user interaction when creating
personalized search results. They usually refine the search results with metrics that
require multiple user interactions, such as rank scoring, average rank, and so on. This paradigm
is, however, infeasible for runtime profiling, as it would not only pose too great a risk of privacy
breach, but also demand prohibitive processing time for profiling. Thus, we need predictive
metrics to measure the search quality and breach risk after personalization, without incurring
iterative user interaction.
• We propose a privacy-preserving Location Based Rating Prediction framework UPS,
which can generalize profiles for each query according to user-specified privacy
requirements.
• Relying on the definition of two conflicting metrics, namely personalization utility and
privacy risk, for the hierarchical user profile, we formulate the problem of privacy-preserving
personalized search as δ-Risk Profile Generalization, with its NP-hardness proved.
• We develop two simple but effective generalization algorithms, CollaborationDP and
CollaborationIL, to support runtime profiling. While the former tries to maximize the
discriminating power (DP), the latter attempts to minimize the information loss (IL). By
exploiting a number of heuristics, CollaborationIL outperforms CollaborationDP
significantly.
• We provide an inexpensive mechanism for the client to decide whether to personalize a
query in UPS. This decision can be made before each runtime profiling to enhance the
stability of the search results while avoiding unnecessary exposure of the profile.
• Our extensive experiments demonstrate the efficiency and effectiveness of our UPS
framework.

SCOPE OF THE PROJECT

To protect user privacy in profile-based PWS, researchers have to consider two
contradictory effects during the search process. On the one hand, they attempt to improve the
search quality with the personalization utility of the user profile. On the other hand, they need to
hide the privacy contents existing in the user profile to place the privacy risk under control. A
few previous studies suggest that people are willing to compromise privacy if the
personalization obtained by supplying the user profile to the search engine yields better search quality. In an
ideal case, significant gain can be obtained by personalization at the expense of only a small (and
less-sensitive) portion of the user profile, namely a generalized profile. Thus, user privacy can be
protected without compromising the personalized search quality. In general, there is a tradeoff
between the search quality and the level of privacy protection achieved from generalization.

CHAPTER–II

2.1.1 PERSONALIZING SEARCH VIA AUTOMATED ANALYSIS OF INTERESTS AND ACTIVITIES

AUTHORS: Jaime Teevan, Susan T. Dumais, Eric Horvitz

We formulate and study search algorithms that consider a user’s prior interactions with a
wide variety of content to personalize that user’s current Web search. Rather than relying on the
unrealistic assumption that people will precisely specify their intent when searching, we pursue
techniques that leverage implicit information about the user’s interests. This information is used
to re-rank Web search results within a relevance feedback framework. We explore rich models of
user interests, built from both search-related information, such as previously issued queries and
previously visited Web pages, and other information about the user such as documents and email
the user has read and created. Our research suggests that rich representations of the user and the
corpus are important for personalization, but that it is possible to approximate these
representations and provide efficient client-side algorithms for personalizing search. We show
that such personalization algorithms can significantly improve on current Web search.

2.1.2 A UTILITY-THEORETIC APPROACH TO PRIVACY IN ONLINE SERVICES

AUTHORS: Andreas Krause, Eric Horvitz

Online offerings such as web search, news portals, and e-commerce applications face the
challenge of providing high-quality service to a large, heterogeneous user base. Recent efforts have
highlighted the potential to improve performance by introducing methods to personalize services based on
special knowledge about users and their context. For example, a user’s demographics, location, and past
search and browsing may be useful in enhancing the results offered in response to web search queries.
However, reasonable concerns about privacy by both users, providers, and government agencies acting on
behalf of citizens, may limit access by services to such information.
We introduce and explore an economics of privacy in personalization, where people can opt to
share personal information, in a standing or on-demand manner, in return for expected enhancements in
the quality of an online service. We focus on the example of web search and formulate realistic objective
functions for search efficacy and privacy.

We demonstrate how we can find a provably near-optimal optimization of the utility-privacy


tradeoff in an efficient manner.

We evaluate our methodology on data drawn from a log of the search activity of volunteer
participants. We separately assess user preferences about privacy and utility via a large-scale survey,
aimed at eliciting preferences about people's willingness to trade the sharing of personal data in return for
gains in search efficiency. We show that a significant level of personalization can be achieved using a
relatively small amount of information about users.

2.1.3. IR EVALUATION METHODS FOR RETRIEVING HIGHLY RELEVANT DOCUMENTS
AUTHORS: Kalervo Järvelin & Jaana Kekäläinen
This paper proposes evaluation methods based on the use of non-dichotomous relevance
judgments in IR experiments. It is argued that evaluation methods should credit IR methods for
their ability to retrieve highly relevant documents. This is desirable from the user point of view
in modern, large IR environments.
The proposed methods are (1) a novel application of P-R curves and average precision
computations based on separate recall bases for documents of different degrees of relevance,
and (2) two novel measures computing the cumulative gain the user obtains by examining the
retrieval result up to a given ranked position. We then demonstrate the use of these evaluation
methods in a case study on the effectiveness of query types, based on combinations of query
structures and expansion, in retrieving documents of various degrees of relevance.
The test was run with a best-match retrieval system (InQuery) in a text database
consisting of newspaper articles. The results indicate that the tested strong query structures are
most effective in retrieving highly relevant documents. The differences between the query types
are practically essential and statistically significant. More generally, the novel evaluation
methods and the case demonstrate that non-dichotomous relevance assessments are applicable in
IR experiments, may reveal interesting phenomena, and allow harder testing of IR methods.
2.1.4. OVERCOMING THE BRITTLENESS BOTTLENECK USING WIKIPEDIA:
ENHANCING TEXT CATEGORIZATION WITH ENCYCLOPEDIC KNOWLEDGE
AUTHORS: Evgeniy Gabrilovich and Shaul Markovitch
When humans approach the task of text categorization, they interpret the specific
wording of the document in the much larger context of their background knowledge and
experience. On the other hand, state-of-the-art information retrieval systems are quite brittle—
they traditionally represent documents as bags of words, and are restricted to learning from
individual word occurrences in the (necessarily limited) training set. For instance, given the
sentence “Wal-Mart supply chain goes real time”, how can a text categorization system know
that Wal-Mart manages its stock with RFID technology? And having read that “Ciprofloxacin
belongs to the quinolones group”, how on earth can a machine know that the drug mentioned is
an antibiotic produced by Bayer? In this paper we present algorithms that can do just that. We
propose to enrich document representation through automatic use of a vast compendium of human
knowledge—an encyclopedia.
We apply machine learning techniques to Wikipedia, the largest encyclopedia to date,
which surpasses in scope many conventional encyclopedias and provides a cornucopia of world
knowledge. Each Wikipedia article represents a concept, and documents to be categorized are
represented in the rich feature space of words and relevant Wikipedia concepts. Empirical results
confirm that this knowledge-intensive representation brings text categorization to a qualitatively
new level of performance across a diverse collection of datasets.
2.1.5. PRIVACY-ENHANCING PERSONALIZED WEB SEARCH
AUTHORS: Yabo Xu, Benyu Zhang, Zheng Chen, Ke Wang
Personalized web search is a promising way to improve search quality by customizing
search results for people with individual information goals. However, users are uncomfortable
with exposing private preference information to search engines. On the other hand, privacy is not
absolute, and often can be compromised if there is a gain in service or profitability to the user.
Thus, a balance must be struck between search quality and privacy protection.
This paper presents a scalable way for users to automatically build rich user profiles.
These profiles summarize a user’s interests into a hierarchical organization according to specific
interests. Two parameters for specifying privacy requirements are proposed to help the user to
choose the content and degree of detail of the profile information that is exposed to the search
engine. Experiments showed that the user profile improved search quality when compared to
standard MSN rankings. More importantly, results verified our hypothesis that a significant
improvement on search quality can be achieved by only sharing some higher-level user profile
information, which is potentially less sensitive than detailed personal information.

2.1.6. A LARGE-SCALE EVALUATION AND ANALYSIS OF PERSONALIZED SEARCH STRATEGIES
AUTHORS: Zhicheng Dou, Ruihua Song, Ji-Rong Wen
Although personalized search has been proposed for many years and many
personalization strategies have been investigated, it is still unclear whether personalization is
consistently effective on different queries for different users, and under different search contexts.
In this paper, we study this problem and provide some preliminary conclusions.
We present a large-scale evaluation framework for personalized search based on query
logs, and then evaluate five personalized search strategies (including two click-based and three
profile-based ones) using 12-day MSN query logs. By analyzing the results, we reveal that
personalized search has significant improvement over common web search on some queries but
it has little effect on other queries (e.g., queries with small click entropy).
It even harms search accuracy under some situations. Furthermore, we show that
straightforward click-based personalization strategies perform consistently and considerably
well, while profile-based ones are unstable in our experiments. We also reveal that both long-term
and short-term contexts are very important in improving search performance for profile-based
personalized search strategies.
2.1.7. EMPIRICAL ANALYSIS OF PREDICTIVE ALGORITHMS FOR
COLLABORATIVE FILTERING

AUTHORS: John S. Breese, David Heckerman, and Carl Kadie

Collaborative filtering or recommender systems use a database about user preferences to


predict additional topics or products a new user might like. In this paper we describe several
algorithms designed for this task, including techniques based on correlation coefficients, vector-
based similarity calculations, and statistical Bayesian methods. We compare the predictive
accuracy of the various methods in a set of representative problem domains. We use two basic
classes of evaluation metrics. The first characterizes accuracy over a set of individual predictions
in terms of average absolute deviation. The second estimates the utility of a ranked list of
suggested items. This metric uses an estimate of the probability that a user will see a
recommendation in an ordered list. Experiments were run for datasets associated with 3
application areas, 4 experimental protocols, and the 2 evaluation metrics for the various
algorithms. Results indicate that for a wide range of conditions, Bayesian networks with decision
trees at each node and correlation methods outperform Bayesian-clustering and vector-similarity
methods. Between correlation and Bayesian networks, the preferred method depends on the
nature of the dataset, nature of the application (ranked versus one-by-one presentation), and the
availability of votes with which to make predictions. Other considerations include the size of
database, speed of predictions, and learning time.
CHAPTER–III

EXISTING SYSTEM
SYSTEM DESCRIPTION

 Among existing solutions in recommender systems (RS), collaborative filtering (CF)
methods in particular have been shown to be widely effective. Based on the past
behavior of users, such as explicit user ratings and implicit click logs, CF methods exploit
the similarities between users' behavior patterns.
 Most CF methods, despite their wide adoption in practice, suffer from low accuracy if
most users rate only a few items (thus producing a very sparse rating matrix), called the
data sparsity problem. This is because the number of unrated items is significantly more
than that of rated items.
 To address this problem, some existing work attempted to infer users' ratings on unrated
items based on additional information, such as clicks and bookmarks.

DRAWBACKS OF THE EXISTING SYSTEM

 These works require an overhead of collecting extra data, which itself may have another
data sparsity problem.
 Since 0-injection simply treats all uninteresting items as zero, it may neglect the
characteristics of users or items. In contrast, l-injection not only maximizes the impact of
filling missing ratings but also considers the characteristics of users and items, by
imputing uninteresting items with low pre-use preferences.

EXISTING ALGORITHM USED:

 CollaborationDP and CollaborationIL Algorithm.


 The Brute-Force Algorithm
PROPOSED SYSTEM

 In this work, we develop a more general l-injection approach to infer different user preferences for
uninteresting items for different users, and show that l-injection mostly outperforms 0-injection.
 The proposed l-injection approach can improve the accuracy of top-N recommendation
based on two strategies: (1) preventing uninteresting items from being included in the
top-N recommendation, and (2) exploiting both uninteresting and rated items to predict
the relative preferences of unrated items more accurately.
 With the first strategy, because users are aware of the existence of uninteresting items but
do not like them, such uninteresting items are likely to be false positives if included in
top-N recommendation. Therefore, it is effective to exclude uninteresting items from top-
N recommendation results.
 Next, the second strategy can be interpreted using the concept of typical memory-based
CF methods.

ADVANTAGES:

 We introduce a new notion of uninteresting items, and classify user preferences into pre-
use and post-use preferences to identify uninteresting items.
 We propose to identify uninteresting items via pre-use preferences by solving the OCCF
(one-class collaborative filtering) problem and show its implications and effectiveness.
 We propose low-value injection (called l-injection) to improve the accuracy of top-N
recommendation in existing CF algorithms.
 While existing CF methods only employ user preferences on rated items, the proposed
approach employs both pre-use and post-use preferences. Specifically, the proposed
approach first infers pre-use preferences of unrated items and identifies uninteresting
items (a minimal sketch of this step follows below).
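The following is a minimal, self-contained sketch of the l-injection idea under simplifying assumptions: pre-use preference is approximated here by a simple item popularity score, and unrated items whose score falls below a threshold are imputed with a low value before any CF algorithm is run. The class, threshold, and scoring rule are illustrative, not the exact formulation proposed above.

// Illustrative l-injection sketch: impute low values for likely-uninteresting unrated items.
import java.util.*;

public class LInjectionSketch {

    // ratings[u][i] : observed rating, or 0 when user u has not rated item i.
    // Returns a copy where unrated items with low estimated pre-use preference are imputed
    // with a low value (here 1.0 on a 1-5 scale); all other cells are left unchanged.
    static double[][] injectLowValues(double[][] ratings, double threshold, double lowValue) {
        int users = ratings.length, items = ratings[0].length;

        // Crude stand-in for pre-use preference: the fraction of users who rated the item.
        double[] preUse = new double[items];
        for (int i = 0; i < items; i++) {
            int count = 0;
            for (int u = 0; u < users; u++) if (ratings[u][i] > 0) count++;
            preUse[i] = (double) count / users;
        }

        double[][] filled = new double[users][items];
        for (int u = 0; u < users; u++) {
            for (int i = 0; i < items; i++) {
                if (ratings[u][i] > 0) {
                    filled[u][i] = ratings[u][i];   // keep observed ratings
                } else if (preUse[i] < threshold) {
                    filled[u][i] = lowValue;        // uninteresting: impute a low value
                }                                   // otherwise leave the cell missing (0)
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        double[][] r = {
            {5, 0, 0, 4},
            {4, 0, 0, 5},
            {0, 0, 3, 0}
        };
        System.out.println(Arrays.deepToString(injectLowValues(r, 0.34, 1.0)));
    }
}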

PROPOSED SYSTEM ALGORITHM USED:

 Collaborative filtering
 Naive Bayes Algorithm
CHAPTER–IV

SYSTEM SPECIFICATION

HARDWARE SPECIFICATIONS

Processor - Pentium IV

Speed - 1.1 GHz

RAM - 512 MB(min)

Hard Disk - 40 GB

Floppy Drive - 1.44 MB

Key Board - Standard Windows Keyboard

Mouse - Two or Three Button Mouse

Monitor - SVGA

SOFTWARE SPECIFICATIONS

Operating System : Windows 95/98/2000/XP

Application Server : Tomcat 5.0/6.x

Front End : HTML, Java, JSP

Scripts : JavaScript

Server-side Script : Java Server Pages

Database : MySQL Server 5.0

Database Connectivity : JDBC


CHAPTER–V

SOFTWARE SPECIFICATION

SOFTWARE REQUIREMENT SPECIFICATIONS

JAVA OVERVIEW

Java is a high-level language that can be characterized by all of the following buzzwords.

 Simple

 Object Oriented

 Distributed

 Multithreaded

 Dynamic

 Architecture Neutral

 Portable

 High performance

 Robust

 Secure

In the Java programming language, all the source code is first written in plain text files
ending with the .java extension. Those source files are then compiled into .class files by the Java
compiler (javac). A class file does not contain code that is native to your processor; it instead
contains bytecodes, the machine language of the Java Virtual Machine. The Java launcher tool
(java) then runs your application with an instance of the Java Virtual Machine.
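As a quick illustration of this compile-and-run cycle, the canonical example below is compiled with javac into HelloWorld.class and then executed on the JVM with java HelloWorld.

// HelloWorld.java — compile with "javac HelloWorld.java", run with "java HelloWorld".
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");   // printed by the JVM at run time
    }
}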
JAVA PLATFORM:

A platform is the hardware or software environment in which a program runs. The most
popular platforms are Microsoft Windows, Linux, Solaris OS and MacOS. Most platforms can
be described as a combination of the operating system and underlying hardware. The Java
platform differs from most other platforms in that it is a software-only platform that runs on
top of other hardware-based platforms.

The java platform has two components:

 The Java Virtual Machine.

 The Java Application Programming Interface(API)

The Java Virtual Machine is the base for the Java platform and is ported onto various
hardware-based platforms.

The API is a large collection of ready-made software components that provide many
useful capabilities, such as graphical user interface (GUI) widgets. It is grouped into libraries of
related classes and interfaces; these libraries are known as packages.

As a platform-independent environment, the Java platform can be a bit slower than native
code. However, advances in compiler and virtual machine technologies are bringing performance
close to that of native code without threatening portability.

Development Tools:

The development tools provide everything you’ll need for compiling, running,
monitoring, debugging, and documenting your applications. As a new developer, the main tools
you’ll be using are the Java compiler (javac), the Java launcher (java), and the Java
documentation (javadoc).

Application programming Interface (API):


The API provides the core functionality of the Java programming language. It offers a
wide array of useful classes ready for use in your own applications. It spans everything from
basic objects, to networking and security.

Deployment Technologies:

The JDK provides standard mechanisms such as Java Web Start and Java Plug-In, for
deploying your applications to end users.

User Interface Toolkits:

The Swing and Java 2D toolkits make it possible to create sophisticated Graphical User
Interfaces (GUIs).

Drag-and-drop support:

Drag-and-drop is one of the seemingly most difficult features to implement in user


interface development. It provides a high level of usability and intuitiveness.

Drag-and-drop is, as its name implies, a two-step operation: code must be written to facilitate
dragging and code to facilitate dropping. Sun provides two classes to help with this, namely
DragSource and DropTarget.
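As a small illustration of the dropping side, the sketch below registers a DropTarget on a Swing text area and appends any dropped string. It is a minimal example of the standard java.awt.dnd API, not code taken from this project.

// Minimal drop-side example using java.awt.dnd (illustrative).
import java.awt.datatransfer.DataFlavor;
import java.awt.dnd.*;
import javax.swing.*;

public class DropDemo {
    public static void main(String[] args) {
        JTextArea area = new JTextArea(10, 40);

        // Registering a DropTarget makes the component a valid drop destination.
        new DropTarget(area, new DropTargetAdapter() {
            @Override
            public void drop(DropTargetDropEvent event) {
                try {
                    event.acceptDrop(DnDConstants.ACTION_COPY);
                    String text = (String) event.getTransferable()
                            .getTransferData(DataFlavor.stringFlavor);
                    area.append(text + "\n");        // show the dropped text
                    event.dropComplete(true);
                } catch (Exception ex) {
                    event.dropComplete(false);
                }
            }
        });

        JFrame frame = new JFrame("Drop text here");
        frame.add(new JScrollPane(area));
        frame.pack();
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);
    }
}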

Look and Feel Support:

Swing defines an abstract LookAndFeel class that represents all the information central to a
look-and-feel implementation, such as its name, its description, whether it is a native look-and-feel,
and in particular, a hash table (known as the "Defaults Table") for storing default values for
various look-and-feel attributes, such as colors and fonts.

Each look-and-feel implementation defines a subclass of LookAndFeel (for example,
swing.plaf.motif.MotifLookAndFeel) to provide Swing with the necessary information to
manage the look-and-feel.
The UIManager is the API through which components and programs access look-and-feel
information (they should rarely, if ever, talk directly to a LookAndFeel instance). UIManager is
responsible for keeping track of which LookAndFeel classes are available, which are installed,
and which is currently the default. The UIManager also manages access to the Defaults Table for
the current look-and-feel.

Dynamically Changing the Default Look-and-Feel:

When a Swing application programmatically sets the look-and-feel, the ideal place to do
so is before any Swing components are instantiated. This is because the
UIManager.setLookAndFeel() method makes a particular LookAndFeel the current default by
loading and initializing that LookAndFeel instance, but it does not automatically cause any
existing components to change their look-and-feel.

Remember that components initialize their UI delegate at construction time; therefore, if the
current default changes after they are constructed, they will not automatically update their UIs
accordingly. It is up to the program to implement this dynamic switching by traversing the
containment hierarchy and updating the components individually.
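A minimal example of this dynamic switch is shown below: it sets a new default look-and-feel and then explicitly refreshes the existing containment hierarchy with SwingUtilities.updateComponentTreeUI, since already-constructed components do not update themselves. The chosen look-and-feel (the cross-platform Metal one) is only an example.

// Switching the look-and-feel at run time and refreshing existing components.
import javax.swing.*;

public class LafSwitchDemo {
    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Look-and-feel demo");
            frame.add(new JButton("Press me"));
            frame.pack();
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);

            try {
                // Make the cross-platform (Metal) look-and-feel the new default...
                UIManager.setLookAndFeel(UIManager.getCrossPlatformLookAndFeelClassName());
            } catch (Exception e) {
                e.printStackTrace();
            }
            // ...then explicitly update the UI delegates of components that already exist.
            SwingUtilities.updateComponentTreeUI(frame);
            frame.pack();
        });
    }
}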

Integrated Development Environment (IDE)


IDE Introduction
An Integrated Development Environment (IDE) or interactive development environment
is a software application that provides comprehensive facilities to computer programmers for
software development. An IDE normally consists of a source code editor, build automation
tools and a debugger. Most modern IDEs have intelligent code completion. Some IDEs
contain a compiler, an interpreter, or both, such as NetBeans and Eclipse. Many modern IDEs
also have a class browser, an object browser, and a class hierarchy diagram, for use in object-
oriented software development. The IDE is designed to limit coding errors and facilitate error
correction with tools such as the NetBeans FindBugs plugin, which locates and fixes common Java
coding problems, and the Debugger, which manages complex code with field watches, breakpoints and
execution monitoring.

An Integrated Development Environment (IDE) is an application that facilitates


application development. In general, an IDE is a graphical user interface (GUI)-based
workbench designed to aid a developer in building software applications with an integrated
environment combined with all the required tools at hand. Most common features, such as
debugging, version control and data structure browsing, help a developer quickly execute
actions without switching to other applications. Thus, it helps maximize productivity by
providing similar user interfaces (UI) for related components and reduces the time taken to
learn the language. An IDE supports single or multiple languages.

One aim of the IDE is to reduce the configuration necessary to piece together multiple
development utilities, instead providing the same set of capabilities as a cohesive unit.
Reducing that setup time can increase developer productivity, in cases where learning to use
the IDE is faster than manually integrating all of the individual tools. Tighter integration of
all development tasks has the potential to improve overall productivity beyond just helping
with setup tasks.

IDE Supporting Languages

Some IDEs support multiple languages, such as Eclipse, ActiveState Komodo, IntelliJ
IDEA, MyEclipse, Oracle JDeveloper, NetBeans, Codenvy and Microsoft Visual Studio; GNU
Emacs, based on C and Emacs Lisp; IntelliJ IDEA, Eclipse, MyEclipse and NetBeans, all
based on Java; and MonoDevelop, based on C#. Eclipse and NetBeans have plugins for C/C++,
Ada, GNAT (for example AdaGIDE), Perl, Python, Ruby, and PHP.

IDE Tools

There are many IDE tools available for source code editing, build automation and
debugging. Some of these tools are:

 Eclipse
 NetBeans
 Code::Blocks
 Code Lite
 Dialog Blocks
NetBeans IDE 8.0 and new features for Java 8

NetBeans IDE 8.0 is released, also providing new features for Java 8 technologies. It has code
analyzers and editors for working with Java SE 8, Java SE Embedded 8, and Java ME Embedded
8. The IDE also has new enhancements that further improve its support for Maven and Java EE
with PrimeFaces.

Most important highlights are:

The top 5 features of NetBeans IDE 8 are as follows:

1. Tools for Java 8 Technologies. Anyone interested in getting started with lambdas, method
references, streams, and profiles in Java 8 can do so immediately by downloading NetBeans IDE
8. Java hints and code analyzers help you upgrade anonymous inner classes to lambdas, right
across all your code bases, all in one go. Java hints in the Java editor let you quickly and
intuitively switch from lambdas to method references, and back again.
Moreover, Java SE Embedded support means that you are able to deploy, run, debug or profile
Java SE applications on an embedded device, such as the Raspberry Pi, directly from NetBeans IDE.
No new project type is needed for this; you can simply use the standard Java SE project type for
this purpose.
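For instance, the conversion performed by the Java hints described above can be seen in this small comparison of an anonymous inner class, the equivalent lambda, and a method reference. It is a generic Java 8 illustration, not code generated by NetBeans.

// Java 8: anonymous inner class vs. lambda vs. method reference.
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class Java8HintsDemo {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Charlie", "alice", "Bob");

        // Before: anonymous inner class.
        names.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareToIgnoreCase(b);
            }
        });

        // After the "convert to lambda" hint.
        names.sort((a, b) -> a.compareToIgnoreCase(b));

        // After the "use method reference" hint.
        names.sort(String::compareToIgnoreCase);

        names.stream().map(String::toUpperCase).forEach(System.out::println);
    }
}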

2. Tools for Java EE Developers. The code generators for which NetBeans IDE is well
known have been beefed up significantly. Where before you could create bits and pieces
of code for various popular Java EE component libraries, you can now generate complete
PrimeFaces applications, from scratch, including CRUD functionality and database
connections.

Additionally, the key specifications of the Java EE 7 Platform now have new and enhanced tools,
such as for working with JPA and CDI, as well as Facelets.

Let’s not forget to mention in this regard that Tomcat 8.0 and TomEE are now supported, too,
with a new plugin for WildFly in the NetBeans Plugin Manager.

3. Tools for Maven. A key strength of NetBeans IDE, and a reason why many developers have
started using it over the past years, is its out-of-the-box support for Maven. There is no need to install a
Maven plugin, since it is a standard part of the IDE, and no need to deal with IDE-specific files, since
the POM provides the project structure. And now, in NetBeans IDE 8.0, there are enhancements
to the graph layout, enabling you to visualize your POM in various ways, while also being
able to graphically exclude dependencies from the POM file, without touching the XML.
4. Tools for JavaScript. Thanks to powerful new JavaScript libraries and frameworks over the
years, JavaScript as a whole has become a lot more attractive for many developers. For some
releases already, NetBeans IDE has been available as a pure frontend environment, that is, without
all the Java tools for which it is best known. This lightweight IDE, including Git versioning
tools, provides a great environment for frontend developers. In particular, for users of AngularJS,
Knockout, and Backbone, the IDE comes with deep editor tools, such as code completion and
cross-artifact navigation. In NetBeans IDE 8.0, there is a very specific focus on AngularJS, since
it is such a dominant JavaScript solution at the moment. From AngularJS controllers, you can
navigate, via hyperlinks embedded in the JavaScript editor, to the related HTML views. You can
also use code completion inside the HTML editor to access controllers, and even the properties
within the controllers, to help you accurately code the related artifacts in your AngularJS applications.

Also, remember that there’s no need to download the AngularJS Seed template, since it’s built
into the NetBeans New Project wizard.

5. Tools for HTML5. JavaScript is a central component of the HTML5 Platform, a collective
term for a range of tools and technologies used in frontend development. Popular supporting
technologies are Grunt, a build tool, and Karma, a test runner framework. Both of these are now
supported out of the box in NetBeans IDE 8.0.

JDBC (JAVA DATABASE CONNECTIVITY)

In an effort to set an independent database standard API for Java, Sun Microsystems
developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database
access mechanism that provides a consistent interface to a variety of RDBMSs. This consistent
interface is achieved through the use of “plug-in” database connectivity modules, or drivers. If a
database vendor wishes to have JDBC support, he or she must provide the driver for each
platform that the database and Java run on.

To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC. As you
discovered earlier in this chapter, ODBC has widespread support on a variety of platforms.
Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market much faster than
developing a completely new connectivity solution.

JDBC was announced in March of 1996. It was released for a 90 day public review that
ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released soon
after.

The remainder of this section will cover enough information about JDBC for you to know
what it is about and how to use it effectively. This is by no means a complete overview of JDBC.
That would fill an entire book.

JDBC Goals
Few software packages are designed without goals in mind, and JDBC's many goals
drove the development of the API. These goals, in conjunction with early
reviewer feedback, have finalized the JDBC class library into a solid framework for building
database applications in Java.

The goals that were set for JDBC are important. They will give you some insight as to why
certain classes and functionalities behave the way they do. The eight design goals for JDBC are
as follows:

1. SQL Level API:

The designers felt that their main goal was to define a SQL interface for Java.
Although not the lowest database interface level possible, it is at a low enough level for higher-
level tools and APIs to be created. Conversely, it is at a high enough level for application
programmers to use it confidently. Attaining this goal allows future tool vendors to
“generate” JDBC code and to hide many of JDBC's complexities from the end user.

2. SQL Conformance:

SQL syntax varies as you move from database vendor to database vendor. In an effort
to support a wide variety of vendors, JDBC allows any query statement to be passed through
it to the underlying database driver. This allows the connectivity module to handle non-standard
functionality in a manner that is suitable for its users.

3. JDBC must be implementable on top of common database interfaces

The JDBC SQL API must “sit” on top of other common SQL level APIs. This
goal allows JDBC to use existing ODBC level drivers by the use of a software interface. This
interface would translate JDBC calls to ODBC and vice versa.

4. Provide a Java interface that is consistent with the rest of the Java system

Because of Java’s acceptance in the user community thus far, the designers feel that
they should not stray from the current design of the core Java system.
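A typical use of the JDBC API from this project's Java stack might look like the sketch below. The JDBC URL, credentials, and table are placeholders, and the example assumes a MySQL JDBC driver (Connector/J) is available on the classpath.

// Minimal JDBC usage sketch (connection details are placeholders).
import java.sql.*;

public class JdbcSketch {
    public static void main(String[] args) {
        String url = "jdbc:mysql://localhost:3306/demo";    // hypothetical database
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT id, name FROM users WHERE name LIKE ?")) {

            ps.setString(1, "A%");                           // bind the query parameter
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                }
            }
        } catch (SQLException e) {
            e.printStackTrace();                             // driver or SQL errors end up here
        }
    }
}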
SQL Server 2008

Microsoft SQL Server is a relational database management system developed by
Microsoft. As a database server, it is a software product with the primary function of storing and
retrieving data as requested by other software applications, which may run either on the same
computer or on another computer across a network (including the Internet).

SQL is Structured Query Language, which is a computer language for storing,
manipulating and retrieving data stored in a relational database. SQL is the standard language
for relational database systems. All relational database management systems like MySQL,
MS Access, Oracle, Sybase, Informix, PostgreSQL and SQL Server use SQL as the standard
database language. They also use different dialects, such as:

 MS SQL Server using T-SQL,
 Oracle using PL/SQL,
 the MS Access version of SQL, called JET SQL (native format), etc.

History

The history of Microsoft SQL Server begins with the first Microsoft SQL Server
product - SQL Server 1.0, a 16-bit server for the OS/2 operating system in 1989 - and
extends to the current day. As of December 2016 the following versions are supported by
Microsoft:

 SQL Server 2008
 SQL Server 2008 R2
 SQL Server 2012
 SQL Server 2014
 SQL Server 2016

The current version is Microsoft SQL Server 2016, released June 1, 2016. The RTM
version is 13.0.1601.5. SQL Server 2016 is supported on x64 processors only.
SQL Process

When you are executing an SQL command for any RDBMS, the system determines
the best way to carry out your request and the SQL engine figures out how to interpret the task.
There are various components included in the process, such as the Query Dispatcher,
Optimization Engines, Classic Query Engine and SQL Query Engine. The classic query
engine handles all non-SQL queries, but the SQL query engine won't handle logical files.

Data storage

Data storage is a database, which is a collection of tables with typed columns. SQL
Server supports different data types, including primary types such as Integer, Float, Decimal,
Char (including character strings), Varchar (variable-length character strings), Binary (for
unstructured blobs of data) and Text (for textual data), among others. The rounding of floats to
integers uses either Symmetric Arithmetic Rounding or Symmetric Round Down (fix) depending
on arguments: SELECT Round(2.5, 0) gives 3.

Microsoft SQL Server also allows user-defined composite types (UDTs) to be
defined and used. It also makes server statistics available as virtual tables and views (called
Dynamic Management Views or DMVs). In addition to tables, a database can also contain other
objects including views, stored procedures, indexes and constraints, along with a transaction log.
A SQL Server database can contain a maximum of 2^31 objects, and can span multiple OS-
level files with a maximum file size of 2^60 bytes (1 exabyte). The data in the database are stored
in primary data files with an .mdf extension. Secondary data files, identified with a .ndf
extension, are used to allow the data of a single database to be spread across more than one file,
and optionally across more than one file system. Log files are identified with the .ldf extension.

Storage space allocated to a database is divided into sequentially numbered pages, each 8
KB in size. A page is the basic unit of I/O for SQL Server operations. A page is marked with
a 96-byte header which stores metadata about the page, including the page number, page type,
free space on the page and the ID of the object that owns it. The page type defines the data contained
in the page: data stored in the database, an index, an allocation map which holds information about
how pages are allocated to tables and indexes, a change map which holds information about the
changes made to other pages since the last backup or logging, or large data types such as
image or text.

Buffer management

SQL Server buffers pages in RAM to minimize disk I/O. Any 8 KB page can be
buffered in memory, and the set of all pages currently buffered is called the buffer cache. The
amount of memory available to SQL Server decides how many pages will be cached in
memory. The buffer cache is managed by the Buffer Manager. Either reading from or writing to
any page copies it to the buffer cache. Subsequent reads or writes are redirected to the in-
memory copy rather than the on-disc version. The page is written back to disc by the Buffer
Manager only if the in-memory copy has not been referenced for some time. While writing
pages back to disc, asynchronous I/O is used, whereby the I/O operation is done in a background
thread so that other operations do not have to wait for it to complete. Each page
is written along with its checksum.

Concurrency and locking

SQL Server allows multiple clients to use the same database concurrently. As such, it
needs to control concurrent access to shared data, to ensure data integrity when multiple clients
update the same data, or when clients attempt to read data that is in the process of being changed by
another client. SQL Server provides two modes of concurrency control: pessimistic
concurrency and optimistic concurrency. When pessimistic concurrency control is being used,
SQL Server controls concurrent access by using locks. Locks can be either shared or
exclusive. An exclusive lock grants the user exclusive access to the data; no other user can access the
data as long as the lock is held. Shared locks are used when some data is being read; multiple
users can read from data locked with a shared lock, but cannot acquire an exclusive lock. The latter
would have to wait for all shared locks to be released.

SQLCMD

SQLCMD is a command line application that comes with Microsoft SQL Server,
and exposes the management features of SQL Server. It allows SQL queries to be written
and executed from the command prompt. It can also act as a scripting language to create and run
a set of SQL statements as a script. Such scripts are stored as a .sql file, and are used
either for management of databases or to create the database schema during the deployment of a
database.

SQLCMD was introduced with SQL Server 2005 and continues with
SQL Server 2012 and 2014. Its predecessors for earlier versions were osql and
isql, which are functionally equivalent as far as T-SQL execution is concerned, and many of the
command line parameters are identical, although SQLCMD adds extra versatility.

FEATURES OF SQL SERVER

The OLAP Services feature available in SQL Server version 7.0 is now called
SQL Server Analysis Services. The term OLAP Services has been replaced with the term
Analysis Services. Analysis Services also includes a new data mining component. The
Repository component available in SQL Server version 7.0 is now called Microsoft
SQL Server Meta Data Services. References to the component now use the term Meta Data
Services. The term repository is used only in reference to the repository engine within Meta Data
Services.

The database consists of five types of objects,


They are,

1. TABLE

2. QUERY

3. FORM

4. REPORT

5. MACRO

1) TABLE:

A table is a collection of data about a specific topic.

We can View a table in two ways,

a) Design View

b) Datasheet View

A)Design View

To build or modify the structure of a table, we work in the table design view. We can specify
what kind of data will be held.

B) Datasheet View

To add, edit or analyze the data itself, we work in the table's datasheet view mode.

2) QUERY:

A query is a question that has to be asked to get the required data. Access gathers data
that answers the question from one or more tables. The data that make up the answer is either a
dynaset (if you can edit it) or a snapshot (which cannot be edited). Each time we run a query, we get the latest
information in the dynaset. Access either displays the dynaset or snapshot for us to view, or performs
an action on it, such as deleting or updating.

3) FORMS:
A form is used to view and edit information in the database record. A form displays only the
information we want to see in the way we want to see it. Forms use familiar controls such as
textboxes and checkboxes, which makes viewing and entering data easy. We can work with forms
in several views. Primarily there are two views. They are:

a) Design View

b) Form View

To build or modify the structure of a form, we work in the form's design view. We can add controls
to the form that are bound to fields in a table or query, including textboxes, option buttons, graphs
and pictures.

4) REPORT:

A report is used to view and print information from the database. The report can
group records into many levels and compute totals and averages by checking values from many
records at once. Also, the report is attractive and distinctive because we have control over its size
and appearance.

5) MACRO:

A macro is a set of actions. Each action in a macro does something, such as opening a form or
printing a report. We write macros to automate common tasks so that they are performed easily and
save time.

FEATURES OF SQL PROCEDURES

SQL procedures are characterized by many features. SQL procedures:

 Can contain SQL Procedural Language statements and features which support the implementation of control-flow logic around traditional static and dynamic SQL statements.

 Are supported in the entire DB2 family of database products, in which many if not all of the features supported in DB2 Version 9 are supported.

 Are easy to implement, because they use a simple high-level, strongly typed language.

 Are more reliable than equivalent external procedures.

 Adhere to the SQL99 ANSI/ISO/IEC SQL standard.

 Support input, output, and input-output parameter passing modes.

 Support a simple but powerful condition and error-handling model.

 Allow you to return multiple result sets to the caller or to a client application.

 Allow you to easily access the SQLSTATE and SQLCODE values as special variables.

 Reside in the database and are automatically backed up and restored.

 Can be invoked wherever the CALL statement is supported.

 Support nested procedure calls to other SQL procedures or procedures implemented in other languages.

 Support recursion.
ALGORITHM IMPLEMENTATION
THEORETICAL MODEL
Each web server keeps a record of users' accesses to it. Usually, this information is
called the web log, including web server access logs, proxy server log records, browser log records,
users' brief introductions, users' registration information, users' dialogue or transaction
information and so on. The target of web data mining is to find the users' access patterns from
vast amounts of web log data and ultimately to extract useful information about users. In order to
obtain users' pattern information and keep it updated in real time, the system takes
two steps: establishing a user interest model and mining user interests.

Establishing the User Interest Model

As we all know, the real intent of this system is to achieve personalized information retrieval,
so a data model must be created. In this paper, the user interest model is expressed by an ordered
triple consisting of an interested word, the word weight, and the word freshness degree. Each
interest node is marked with a triple (pi, wi, xi), abbreviated Node(pi).

In the above expression, the value range of pi is P, written pi ∈ P, where P is the word set
P = {p1, p2, ..., pm}, in which p1, p2, ..., pm are the interested words and m is the number of
words. The value wi is the weight of the interested word pi, and xi is the freshness degree of the
word pi.

Because different locations of a word in a document reflect different levels of importance,
the location at which the word appears is taken into account; this is called the location weight.
When calculating the freshness degree of words, we use a freshness function f(n) together with
the term frequency of pi in document dn (dn ∈ D, where n refers to the nth document in the buffer
and D is the document collection in the buffer). The function f(n) is monotonic and non-decreasing,
which ensures that the more recently a document has been visited, the more the user is interested
in it. The weight and freshness degree of Node(pi) are then calculated from these quantities.
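Under the assumption of a simple linear freshness function, the triple (pi, wi, xi) could be maintained as in the sketch below. The update formulas are illustrative stand-ins for the omitted equations, not the paper's original definitions.

// Illustrative interest-node update; the weighting scheme is an assumed stand-in.
public class InterestNode {
    final String word;     // pi : the interested word
    double weight;         // wi : accumulated, location-weighted term frequency
    double freshness;      // xi : how recently the word was seen

    InterestNode(String word) { this.word = word; }

    // n is the index of the current document in the buffer (larger = more recent),
    // tf is the term frequency of the word, locWeight reflects where the word appeared.
    void update(int n, double tf, double locWeight, int bufferSize) {
        double f = (double) n / bufferSize;     // monotonic, non-decreasing freshness function
        weight += locWeight * tf * f;
        freshness = Math.max(freshness, f);
    }

    public static void main(String[] args) {
        InterestNode node = new InterestNode("football");
        node.update(3, 4.0, 1.5, 10);           // word appeared 4 times in the 3rd buffered document
        node.update(9, 2.0, 1.0, 10);           // and twice in a more recent one
        System.out.println(node.word + " w=" + node.weight + " x=" + node.freshness);
    }
}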
Mining Association Rules
Through correlation analysis, such as the Apriori algorithm, relationships hidden among data
are uncovered. Here is an example. When mining association rules on web site server logs,
we may find that, among users who have accessed sports news pages, 70% have accessed the
football pages and 15% have accessed the diving pages. Then the following conclusion can be
drawn: if a user likes sports, we can predict that the probability that he likes football is 0.70 and
the probability that he likes diving is 0.15. So if his query words contain the word sport, the system
will push the football pages to him but filter out the diving pages. The system will then adjust the
user's interest parameters: the interest degree for football is 0.7 and the interest degree for diving is 0.15.
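These interest degrees are simply rule confidences computed from access counts; a tiny illustration of the arithmetic, with made-up counts, follows.

// Rule confidence: P(football | sports) = count(sports AND football) / count(sports).
public class RuleConfidence {
    public static void main(String[] args) {
        int sportsVisitors = 1000;              // users who accessed sports news pages
        int alsoFootball  = 700;                // of those, users who also accessed football pages
        int alsoDiving    = 150;                // of those, users who also accessed diving pages

        System.out.println("sports -> football : " + (double) alsoFootball / sportsVisitors); // 0.7
        System.out.println("sports -> diving   : " + (double) alsoDiving  / sportsVisitors);  // 0.15
    }
}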

Classification Analysis
In web log mining, the input of classification analysis is a group of records and several types of
tags. First, each record is given a type tag. Then the system checks these tags
and describes the common features of the records sharing them. For example, among users who have
submitted MP4 orders, 50% live in large cities and their ages are between 18 and 28. After
obtaining this information, we can provide pertinent and personalized service to users aged between
18 and 28 living in large cities.

Clustering Analysis
Clustering analysis is different from classification analysis. It is the process of grouping
data items or users with similar characteristics. For example, if some users often browse the
pages about "TOEFL", "GRE", "application" or "visa", then these users will be clustered into a
group: they may be a group of users planning to go overseas. The system can therefore send e-mail about
going abroad to them and provide personalized service to them.
Sequential Pattern
Sequential pattern mining refers to finding data items which are sequential in time from time-
series data sets. In web log mining, sequential pattern recognition means finding the user's
requests for pages which are successive in time within a user session. For example, if 60% of the
users who order a baby sleeping bag online also order baby clothes within 2 months, then the system
can predict the web pages that may be requested by such users and proactively present web pages
about baby clothes to users who order baby sleeping bags.
WEB INTERFACE
WEB INTERFACE SUMMARY

Filter - An object that performs filtering tasks on either the request to a resource (a servlet or static content), on the response from a resource, or both. Filters perform filtering in the doFilter method.

Filter Chain - An object provided by the servlet container to the developer, giving a view into the invocation chain of a filtered request for a resource.

Filter Config - A filter configuration object used by a servlet container to pass information to a filter during initialization.

Request Dispatcher - Defines an object that receives requests from the client and sends them to any resource (such as a servlet, HTML file, or JSP file) on the server.

Servlet - Defines methods that all servlets must implement.

Servlet Config - A servlet configuration object used by a servlet container to pass information to a servlet during initialization.

Servlet Context - Defines a set of methods that a servlet uses to communicate with its servlet container, for example, to get the MIME type of a file, dispatch requests, or write to a log file.

Servlet Context Attribute Listener - Implementations of this interface receive notifications of changes to the attribute list on the servlet context of a web application.

Servlet Context Listener - Implementations of this interface receive notifications about changes to the servlet context of the web application they are part of.

Servlet Request - Defines an object to provide client request information to a servlet.

Servlet Response - Defines an object to assist a servlet in sending a response to the client.

Single Thread Model - Ensures that servlets handle only one request at a time.

Table 3.1 Web Interface Summary


WEB CLASS SUMMARY
Generic Servlet - Defines a generic, protocol-independent servlet.

Servlet Context Attribute Event - The event class for notifications about changes to the attributes of the servlet context of a web application.

Servlet Context Event - The event class for notifications about changes to the servlet context of a web application.

Servlet Input Stream - Provides an input stream for reading binary data from a client request, including an efficient readLine method for reading data one line at a time.

Servlet Output Stream - Provides an output stream for sending binary data to the client.

Servlet Request Wrapper - Provides a convenient implementation of the ServletRequest interface that can be subclassed by developers wishing to adapt the request to a servlet.

Servlet Response Wrapper - Provides a convenient implementation of the ServletResponse interface that can be subclassed by developers wishing to adapt the response from a servlet.

Table 3.2 Web Class Summary


WEB METHOD SUMMARY

java.lang.String getInitParameter(java.lang.String name) - Returns a String containing the value of the named initialization parameter, or null if the parameter does not exist.

java.util.Enumeration getInitParameterNames() - Returns the names of the servlet's initialization parameters as an Enumeration of String objects, or an empty Enumeration if the servlet has no initialization parameters.

ServletContext getServletContext() - Returns a reference to the ServletContext in which the caller is executing.

java.lang.String getServletName() - Returns the name of this servlet instance.

Table 3.3 Web Method
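As a small, hedged illustration of the methods listed in Table 3.3, the servlet below reads an initialization parameter and logs through its ServletContext using the standard javax.servlet API; the class name and the "searchMode" init-parameter are assumptions made for this sketch, not part of the project's actual code.

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet showing getInitParameter, getServletName and
// getServletContext from the web method summary above.
public class ConfigDemoServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // "searchMode" is an assumed init-param that would be defined in web.xml.
        String mode = getInitParameter("searchMode");
        if (mode == null) {
            mode = "normal";   // parameter not configured
        }
        // Write a line to the container log through the ServletContext.
        getServletContext().log(getServletName() + " running in mode: " + mode);

        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>Search mode: " + mode + "</body></html>");
    }
}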
TECHNICAL AND ALGORITHM IMPLEMENTATION

Implementation is the stage of the project when the theoretical design is turned into a working system. It can therefore be considered the most critical stage in achieving a successful new system and in giving the user confidence that the new system will work and be effective.

The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, designing of methods to achieve the changeover, and evaluation of the changeover methods.

3.5.1 NAIVE BAYES ALGORITHM

Given that the concepts and clickthrough data are collected from past search activities, the user's preferences can be learned. These search preferences, in the form of a set of feature vectors, are to be submitted along with future queries to the PWS server for search result re-ranking. Instead of transmitting all the detailed personal preference information to the server, PWS allows the users to control the amount of personal information exposed. In this section, we first review a preference mining algorithm, namely the SpyNB method, that we adopt in PWS, and then discuss how PWS preserves user privacy. SpyNB learns user behavior models from preferences extracted from clickthrough data. Assuming that users only click on documents that are of interest to them, SpyNB treats the clicked documents as positive samples, and predicts reliable negative documents from the unlabeled (i.e., unclicked) documents. To do the prediction, the "spy" technique incorporates a novel voting procedure into the Naive Bayes classifier to predict a negative set of documents from the unlabeled document set. The details of the SpyNB method can be found in the literature. Let P be the positive set, U the unlabeled set, and PN the predicted negative set (PN ⊂ U) obtained from the SpyNB method. SpyNB assumes that the user would always prefer the positive set over the predicted negative set.

Abstractly, the probability model for a classifier is a conditional model

$p(C \mid F_1, \dots, F_n)$

over a dependent class variable $C$ with a small number of outcomes or classes, conditional on several feature variables $F_1$ through $F_n$. The problem is that if the number of features $n$ is large or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, this can be written

$p(C \mid F_1, \dots, F_n) = \dfrac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}.$

In plain English, using Bayesian probability terminology, the above equation can be written as

$\text{posterior} = \dfrac{\text{prior} \times \text{likelihood}}{\text{evidence}}.$

In practice, there is interest only in the numerator of that fraction, because the denominator does not depend on $C$ and the values of the features $F_i$ are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model

$p(C, F_1, \dots, F_n),$

which can be rewritten as follows, using the chain rule for repeated applications of the definition of conditional probability:

$p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1) \cdots p(F_n \mid C, F_1, \dots, F_{n-1}).$

Now the "naive" conditional independence assumptions come into play: assume that each feature $F_i$ is conditionally independent of every other feature $F_j$ for $j \neq i$, given the category $C$. This means that

$p(F_i \mid C, F_j) = p(F_i \mid C),$

and so on, for $i \neq j$. Thus, the joint model can be expressed as

$p(C, F_1, \dots, F_n) = p(C) \prod_{i=1}^{n} p(F_i \mid C).$

This means that under the above independence assumptions, the conditional distribution over the class variable $C$ is

$p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C),$

where the evidence $Z = p(F_1, \dots, F_n)$ is a scaling factor dependent only on $F_1, \dots, F_n$, that is, a constant if the values of the feature variables are known.
FLOW CHART OF NAIVE BAYES

Fig 3.4: Naïve Bayes
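A minimal sketch of this classification rule follows, assuming binary word features and Laplace smoothing. It is not the project's SpyNB code; it only shows the underlying scoring $p(C)\prod_i p(F_i \mid C)$ computed in log space, with hypothetical class and method names.

import java.util.*;

// Hypothetical two-class naive Bayes scorer over binary word features,
// computing log p(C) + sum_i log p(F_i | C) with Laplace smoothing.
public class NaiveBayesSketch {

    private final Map<String, int[]> wordCounts = new HashMap<>(); // word -> count per class
    private final int[] classCounts = new int[2];                  // documents per class

    public void train(Set<String> words, int label) {              // label: 0 or 1
        classCounts[label]++;
        for (String w : words) {
            wordCounts.computeIfAbsent(w, k -> new int[2])[label]++;
        }
    }

    // Returns the more probable class for the document's word set.
    public int classify(Set<String> words) {
        double best = Double.NEGATIVE_INFINITY;
        int bestLabel = 0;
        int total = classCounts[0] + classCounts[1];
        for (int c = 0; c < 2; c++) {
            double logProb = Math.log((classCounts[c] + 1.0) / (total + 2.0)); // class prior
            for (String w : words) {
                int count = wordCounts.containsKey(w) ? wordCounts.get(w)[c] : 0;
                // Laplace-smoothed p(word present | class)
                logProb += Math.log((count + 1.0) / (classCounts[c] + 2.0));
            }
            if (logProb > best) { best = logProb; bestLabel = c; }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train(new HashSet<>(Arrays.asList("hotel", "booking")), 1);   // clicked (positive)
        nb.train(new HashSet<>(Arrays.asList("spam", "lottery")), 0);    // predicted negative
        System.out.println(nb.classify(new HashSet<>(Arrays.asList("hotel")))); // prints 1
    }
}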

COLLABORATION FILTERING ALGORITHM


Collaboration algorithms mostly (but not always) fail to find the globally optimal solution, because they usually do not operate exhaustively on all the data. They can make commitments to certain choices too early, which prevents them from finding the best overall solution later. For example, all known Collaboration coloring algorithms for the graph coloring problem, and for all other NP-complete problems, do not consistently find optimum solutions. Nevertheless, they are useful because they are quick to devise and often give good approximations to the optimum.

If a Collaboration algorithm can be proven to yield the global optimum for a given problem class, it typically becomes the method of choice because it is faster than other optimization methods such as dynamic programming. Examples of such algorithms are Kruskal's algorithm and Prim's algorithm for finding minimum spanning trees, and the algorithm for finding optimum Huffman trees. The theory of matroids, and the more general theory of greedoids, provide whole classes of such algorithms.
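For instance, Kruskal's algorithm can be sketched in a few lines: it makes the locally optimal choice (cheapest edge first) at every step and still provably yields a globally minimum spanning tree. The edge type and the small example graph below are illustrative assumptions.

import java.util.*;

// Minimal sketch of Kruskal's algorithm: repeatedly take the cheapest edge
// that does not create a cycle (checked with a simple union-find).
public class KruskalSketch {

    static class Edge {                         // illustrative edge type
        final int u, v, weight;
        Edge(int u, int v, int weight) { this.u = u; this.v = v; this.weight = weight; }
        public String toString() { return u + "-" + v + "(" + weight + ")"; }
    }

    static int find(int[] parent, int x) {
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    }

    static List<Edge> minimumSpanningTree(int vertices, List<Edge> edges) {
        edges.sort(Comparator.comparingInt(e -> e.weight));   // greedy: cheapest first
        int[] parent = new int[vertices];
        for (int i = 0; i < vertices; i++) parent[i] = i;
        List<Edge> tree = new ArrayList<>();
        for (Edge e : edges) {
            int ru = find(parent, e.u), rv = find(parent, e.v);
            if (ru != rv) {                                    // no cycle: keep edge, merge components
                parent[ru] = rv;
                tree.add(e);
            }
        }
        return tree;
    }

    public static void main(String[] args) {
        List<Edge> edges = new ArrayList<>(Arrays.asList(
                new Edge(0, 1, 4), new Edge(1, 2, 2), new Edge(0, 2, 5)));
        System.out.println(minimumSpanningTree(3, edges));    // [1-2(2), 0-1(4)]
    }
}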

Collaboration algorithms appear in network routing as well. Using Collaboration routing, a message is forwarded to the neighboring node which is "closest" to the destination. The notion of a node's location (and hence its "closeness" to the destination) may be determined by its physical position, as in geographic routing.
Fig 3.5: Collaboration Path selection


FLOW CHART OF COLLABORATION ALGORITHM
SYSTEM DESIGN

SYSTEM ARCHITECTURE
DATA FLOW DIAGRAM:

1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components: the system processes, the data used by the processes, the external entities that interact with the system, and the information flows in the system.
3. The DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction, and it may be partitioned into levels that represent increasing information flow and functional detail.
[Data flow: User Login → Check → Personalized Search Engine (yes) or Normal Search Engine (no) → Enter Query → Apply SpyNB on Click Data → View Re-ranked Result → End Process]

Fig 3.6: Data Flow Diagram


USE CASE DIAGRAM

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented as
use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor. Roles of the actors in
the system can be depicted.

[Use case diagram: the User actor performs Login, Click Data, SpyNB and RSVM Ranking]

Fig 3.7: Use Case Diagram


SEQUENCE DIAGRAM:

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction


diagram that shows how processes operate with one another and in what order. It is a construct of
a Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event
scenarios, and timing diagrams.

[Sequence: the user enters a username and password, then interacts with Personal Search, Enter Query, Click Data, SpyNB and RSVM in order]

Fig 3.8: Sequence Diagram


ACTIVITY DIAGRAM:

Activity diagrams are graphical representations of workflows of stepwise activities and


actions with support for choice, iteration and concurrency. In the Unified Modeling Language,
activity diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.

[Activity flow: Start → Click Data → Spy Generation → Voting → RSVM → End Process]

Fig 3.9: Activity Diagram


TABLE STRUCTURE

Content table

Location table

Positive content table

Positive location table


Profile table

Search table

View
CHAPTER–VI

SYSTEM IMPLEMENTATION

IMPLEMENTATION

Implementation is the stage of the project when the theoretical design is turned into a working system. It can therefore be considered the most critical stage in achieving a successful new system and in giving the user confidence that the new system will work and be effective.

The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, designing of methods to achieve the changeover, and evaluation of the changeover methods.

MODULE DESCRIPTION:

1. User Interest Profiling


2. Diversity and Concept Entropy
3. User Preferences Extraction and Privacy Preservation
4. Personalized Ranking Functions

1. User Interest Profiling

SCCFLR uses “concepts” to model the interests and preferences of a user. Since location
information is important in mobile search, the concepts are further classified into two different
types, namely, content concepts and location concepts. The concepts are modeled as ontologies,
in order to capture the relationships between the concepts. We observe that the characteristics of
the content concepts and location concepts are different. Thus, we propose two different
techniques for building the content ontology and location ontology. The ontologies indicate a
possible concept space arising from a user’s queries, which are maintained along with the click
through data for future preference adaptation. In SCCFLR, we adopt ontologies to model the concept space because they not only can represent concepts but also capture the relationships between concepts. Due to the different characteristics of the content concepts and the location concepts, separate techniques are used to build the content ontology and the location ontology.

Diversity and Concept Entropy


SCCFLR consists of a content facet and a location facet. In order to seamlessly integrate
the preferences in these two facets into one coherent personalization framework, an important
issue we have to address is how to weigh the content preference and location preference in the
integration step. To address this issue, we propose to adjust the weights of content preference
and location preference based on their effectiveness in the personalization process. For a given
query issued by a particular user, if the personalization based on preferences from the content
facet is more effective than based on the
preferences from the location facets, more weight should be put on the content-based
preferences; and vice versa.

User Preferences Extraction and Privacy Preservation

Given that the concepts and click through data are collected from past search activities,
user’s preference can be learned. These search preferences, inform of a set of feature vectors, are
to be submitted along with future queries to the SCCFLR server for search result re-ranking.
Instead of transmitting all the detailed personal preference information to the server, SCCFLR
allows the users to control the amount of personal information exposed. In this section, we first
review a preference mining
algorithms, namely SpyNB Method, that we adopt in SCCFLR, and then discuss how SCCFLR
preserves user privacy. SpyNB learns user behavior models from preferences extracted from
clickthrough data. Assuming that users only click on documents that are of interest to them,
SpyNB treats the clicked documents as positive samples, and predict reliable negative documents
from the unlabeled (i.e. unclicked) documents. To do the prediction, the “spy” technique
incorporates a novel voting procedure into Na¨ıve Bayes classifier to predict a negative set of
documents from the unlabeled document set. The details of the SpyNB method can be found in.
Let P be the positive set, U the unlabeled set and PN the predicted negative set (PN ⊂ U)
obtained from the SpyNB method. SpyNB assumes that the user would always prefer the
positive set over the predicted negative set.
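The spy-and-vote step can be pictured with the following hedged sketch. It abstracts the classifier behind a Trainer interface (for example, a naive Bayes scorer such as the one sketched earlier), marks an unlabeled document as negative in a run when it scores below the weakest spy, and keeps the documents that lose a majority of runs. The exact thresholding and voting rules of the published SpyNB method differ; this is only an approximation of the idea, and all names here are assumptions.

import java.util.*;
import java.util.function.ToDoubleFunction;

// Hedged sketch of spy-based negative prediction (not the exact SpyNB algorithm):
// documents that a majority of runs score below the weakest spy are voted into PN.
public class SpyVotingSketch {

    // A classifier trained on (positives, unlabeled) that scores how "positive"
    // a document looks; assumed to be supplied by the caller.
    interface Trainer<D> {
        ToDoubleFunction<D> train(List<D> positives, List<D> unlabeled);
    }

    static <D> Set<D> predictNegatives(List<D> positives, List<D> unlabeled,
                                       Trainer<D> trainer, int runs, long seed) {
        Map<D, Integer> votes = new HashMap<>();
        Random random = new Random(seed);
        for (int r = 0; r < runs; r++) {
            // Move a random ~10% of the positives into the unlabeled set as spies.
            List<D> spies = new ArrayList<>();
            List<D> rest = new ArrayList<>();
            for (D doc : positives) {
                (random.nextDouble() < 0.1 ? spies : rest).add(doc);
            }
            if (spies.isEmpty()) spies.add(rest.remove(rest.size() - 1));
            List<D> mixed = new ArrayList<>(unlabeled);
            mixed.addAll(spies);

            ToDoubleFunction<D> score = trainer.train(rest, mixed);

            // Threshold: the lowest score any spy receives in this run.
            double threshold = spies.stream().mapToDouble(score).min().orElse(0.0);
            for (D doc : unlabeled) {
                if (score.applyAsDouble(doc) < threshold) {
                    votes.merge(doc, 1, Integer::sum);   // this run votes "negative"
                }
            }
        }
        // Keep documents voted negative by a majority of the runs.
        Set<D> negatives = new HashSet<>();
        for (Map.Entry<D, Integer> e : votes.entrySet()) {
            if (e.getValue() * 2 > runs) negatives.add(e.getKey());
        }
        return negatives;
    }
}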

Personalized Ranking Functions


Upon reception of the user's preferences, Ranking SVM (RSVM) is employed to learn a personalized ranking function for rank adaptation of the search results according to the user's content and location preferences. For a given query, a set of content concepts and a set of location concepts are extracted from the search results as the document features. Since each document can be represented by a feature vector, it can be treated as a point in the feature space. Using the preference pairs as input, RSVM aims at finding a linear ranking function which holds for as many document preference pairs as possible. An adaptive implementation, SVMlight, is used in our experiments. In the following, we discuss two issues in the RSVM training process: 1) how to extract the feature vectors for a document; and 2) how to combine the content and location weight vectors into one integrated weight vector.
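Since RSVM ultimately learns a linear function w · x that should score the preferred document of each pair higher, the following hedged sketch illustrates the pairwise idea with simple perceptron-style updates. It is not SVMlight or the actual RSVM training used in the experiments; the feature dimension, learning rate, and example vectors are assumptions.

import java.util.*;

// Hedged sketch of pairwise linear ranking: learn w so that
// w . preferred  >  w . nonPreferred for as many pairs as possible.
// (Perceptron-style updates, not the actual SVMlight optimization.)
public class PairwiseRankingSketch {

    static double dot(double[] w, double[] x) {
        double s = 0.0;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s;
    }

    // pairs[k][0] = feature vector of the preferred document,
    // pairs[k][1] = feature vector of the less preferred document.
    static double[] learn(double[][][] pairs, int dim, int epochs, double rate) {
        double[] w = new double[dim];
        for (int e = 0; e < epochs; e++) {
            for (double[][] pair : pairs) {
                double[] preferred = pair[0], other = pair[1];
                if (dot(w, preferred) <= dot(w, other)) {         // pair still violated
                    for (int i = 0; i < dim; i++) {
                        w[i] += rate * (preferred[i] - other[i]); // move w toward the pair
                    }
                }
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Two tiny documents described by [contentConceptWeight, locationConceptWeight].
        double[][][] pairs = { { {1.0, 0.2}, {0.1, 0.9} } };      // first document is preferred
        double[] w = learn(pairs, 2, 10, 0.5);
        System.out.println(Arrays.toString(w));                   // positive content weight
    }
}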
CHAPTER

SYSTEM TESTING

The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies and/or a finished product. It is the
process of exercising software with the intent of ensuring that the Software system meets its
requirements and user expectations and does not fail in an unacceptable manner. There are
various types of test. Each test type addresses a specific testing requirement.

TYPES OF TESTING
UNIT TESTING
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly, and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application; it is done after the completion of an individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
Unit testing is usually conducted as part of a combined code and unit test phase of the software lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct phases.

Test Strategy and approach:

Field testing will be performed manually and functional tests will be written in detail.

Test objectives:

• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages, and responses must not be delayed.

Features to be tested

INTEGRATION TESTING

Integration tests are designed to test integrated software components to determine whether they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.

Software integration testing is the incremental integration testing of two or more


integrated software components on a single platform to produce failures caused by interface
defects.

The task of the integration test is to check that components or software applications, e.g.
components in a software system or – one step up – software applications at the company level –
interact without error.

FUNCTIONAL TESTING

Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation and user manuals.

Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

Systems/Procedures : interfacing systems or procedures must be invoked.


Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage pertaining to identifying business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.

SYSTEM TESTING
System testing ensures that the entire integrated software system meets requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration-oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.

WHITE BOX TESTING


White box testing is testing in which the software tester has knowledge of the inner workings, structure, and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black-box level.

BLACK BOX TESTING


Black box testing is testing the software without any knowledge of the inner workings, structure, or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. It is testing in which the software under test is treated as a black box: you cannot "see" into it. The test provides inputs and responds to outputs without considering how the software works.

ACCEPTANCE TESTING
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional requirements.
OTHER TESTING METHODOLOGIES

User Acceptance Testing

User Acceptance of a system is the key factor for the success of any system. The system
under consideration is tested for user acceptance by constantly keeping in touch with the
prospective system users at the time of developing and making changes wherever required. The
system developed provides a friendly user interface that can easily be understood even by a
person who is new to the system.

Output Testing

After performing the validation testing, the next step is output testing of the proposed system, since no system can be useful if it does not produce the required output in the specified format. The outputs generated or displayed by the system under consideration are tested by asking the users about the format they require. Hence the output format is considered in two ways: one on screen and the other in printed format.

Validation Checking

Validation checks are performed on the following fields.

Text Field:

The text field can contain only a number of characters less than or equal to its size. The text fields are alphanumeric in some tables and alphabetic in other tables. An incorrect entry always flashes an error message.

Numeric Field:

The numeric field can contain only numbers from 0 to 9. An entry of any other character flashes an error message. The individual modules are checked for accuracy and for what they have to perform. Each module is subjected to a test run along with sample data. The individually tested modules are then integrated into a single system. Testing involves executing the program with real data; the existence of any program defect is inferred from the output. The testing should be planned so that all the requirements are individually tested.
A successful test is one that brings out the defects for inappropriate data and produces output revealing the errors in the system.
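A hedged sketch of how the text-field and numeric-field checks described above might be written as helper methods for this project's forms; the method names, the size limit, and the allowed character sets are assumptions.

// Hypothetical helper methods for the validation checks described above.
public class FieldValidator {

    // Text field: at most maxSize characters, alphabetic or alphanumeric.
    public static boolean isValidTextField(String value, int maxSize, boolean allowDigits) {
        if (value == null || value.isEmpty() || value.length() > maxSize) return false;
        return allowDigits ? value.matches("[A-Za-z0-9 ]+")
                           : value.matches("[A-Za-z ]+");
    }

    // Numeric field: only the digits 0 to 9; any other character is an error.
    public static boolean isValidNumericField(String value) {
        return value != null && !value.isEmpty() && value.matches("[0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(isValidTextField("Chennai", 20, false));  // true
        System.out.println(isValidNumericField("12a"));              // false: flags an error
    }
}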

SAMPLE CODING

Admin.jsp

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"


"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<title>PMSE</title>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

<link href="style.css" rel="stylesheet" type="text/css" />

<script type="text/javascript" src="js/jquery-1.3.2.min.js"></script>

<script type="text/javascript" src="js/script.js"></script>

<script type="text/javascript" src="js/cufon-yui.js"></script>

<script type="text/javascript" src="js/arial.js"></script>

<script type="text/javascript" src="js/cuf_run.js"></script>

</head>

<body>

<div class="main">

<div class="main_resize">

<div class="header">
<div class="logo">

<h1><span>Sustaining Confidentiality Protection in Personalized Web Search For


Ontology</span><small></small></h1>

</div>

<div class="search">

<!--/searchform -->

<div class="clr"></div>

</div>

<div class="clr"></div>

<div class="menu_nav">

<ul>

<li class="active"><a href="admin.jsp">Content Data</a></li>

<li><a href="admin1.jsp">Location Data</a></li>

<li><a href="home.html">SignOut</a></li>

</ul>

<div class="clr"></div>

</div>

<div class="hbg"><img src="images/header_images.PNG" width="923" height="291" alt="" /></div>

</div>

<div class="content">

<div class="content_bg">
<div class="mainbar">

<div class="article">

<h2><span>Upload Content</span> Data</h2>

<div class="clr"></div>

<form enctype="multipart/form-data" action="insertcontent.jsp" method="post"


id="sendemail">

<ol>

<li>

<label for="name">Content Name (required)</label>

<input id="name" name="contentname" class="text" />

</li>

<li>

<label for="name">Title (required)</label>

<input id="name" name="title" class="text" />

</li>

<li>

<label for="email">Description (required)</label>

<textarea name="description" cols="" rows="4"></textarea>

</li>

<li>
<label for="website">Picture</label>

<input type="file" name="image" />

</li>

<li>

<input type="image" name="imageField" id="imageField" src="images/submit.gif"


class="send" />

<div class="clr"></div>

</li>

</ol>

</form>

</div>

</div>

<div class="sidebar">

<div class="gadget">

<h2 class="star"><span></span></h2>

<div class="clr"></div>

<ul class="sb_menu">

<li class="active"><a href="#"><img src="images/upload.png" width="245" height="320" /></a></li>

</ul>

</div>

</div>
<div class="clr"></div>

</div>

</div>

</div>

<div class="fbg">

<div class="fbg_resize">

<div class="clr"></div>

</div>

</div>

</div>

<div class="footer">

<div class="footer_resize">

<p align="center">Sustaining Confidentiality Protection in Personalized Web Search For


Ontology</p>

<div class="clr"></div>

</div>

</div>

</body>


</html>
Rating.jsp
<%@ page contentType="text/html; charset=iso-8859-1" language="java" import="java.sql.*"
errorPage="" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
<title>Turrion development</title>
<link rel="stylesheet" type="text/css" href="style.css" />
<!--[if IE 6]>
<script type="text/javascript" src="unitpngfix.js"></script>
<![endif]-->
<script language="javascript" type="text/javascript" src="datetimepicker.js">
</script>
</head>
<body>
<div class="wrap">
<div class="header">
<div class="logo">
<p><font size="+2" color="#FFFF33"><b>Privacy Preserving Neural Network Learning of
Data </b></font></p>
<p><font size="+2" color="#FFFF33"><b>Analysis using Homomorphism
Algorithm</b></font>
</p>
<p>&nbsp;</p>
<p>&nbsp; </p>
</div>
</div>
<div id="menu">
<ul>
<li class="selected"><a href="index.html"><b>Home</b></a></li>
<li><a href="index.html"><b>Back</b></a></li>
</ul>
</div>
<div class="center_content">
<form action="amazon_enter1.jsp" method="get">
<%
String email =(String)session.getAttribute("eemail");
System.out.println(email);
%>
<table width="897" height="344"><p>&nbsp;</p>
<tr><td width="347"><img src="images/a.jpg" width="329" height="338"></td>
<td width="538"><table width="392" height="243">
<tr><td width="384" height="78" colspan="2" align="center">
<font size="3" color="#CC0099"><b>E-health details</b></font></td></tr>
<tr><td height="38" align="center"><font size="2" color="#006600"><b><a
href="amazon_enter.jsp">upload health details to cloud</a></b></font></td>
</tr>
<tr><td height="46" align="center"><font size="2" color="#006600"><b><a
href="view.jsp">view the health details via mobile</a></b></font></td>
</tr>
<tr><td height="46" align="center"><font size="2" color="#006600"><b><a href="fraud.jsp">
View unauthorized access details
</a></b></font></td>
</tr>
<tr><td height="46" align="center"><font size="2" color="#006600"><b><a href="vvv.jsp">
view details of graph
</a></b></font></td>

</tr>
<tr><td height="46" align="center"><font size="2" color="#006600"><b><a
href="fraud_view.jsp">
View full unauthorized entry details
</a></b></font></td>
</tr>
</table></td>
</tr>
</table>
</form>
</div>
</div>
</body>
</html>
SCREEN SHOTS
CHAPTER–VII

CONCLUSION

This work presented a client-side privacy protection framework called UPS for personalized web search. UPS can potentially be adopted by any PWS that captures user profiles in a hierarchical taxonomy. The framework allows users to specify customized privacy requirements via the hierarchical profiles. In addition, UPS performs online generalization on user profiles to protect personal privacy without compromising the search quality. We proposed two Collaboration algorithms, namely CollaborationDP and CollaborationIL, for the online generalization. Our experimental results revealed that UPS can achieve quality search results while preserving the user's customized privacy requirements. The results also confirmed the effectiveness and efficiency of our solution.
FUTURE WORK

For future work, we will try to resist adversaries with broader background knowledge, such as richer relationships among topics (e.g., exclusiveness, sequentiality, and so on), or the capability to capture a series of queries from the victim (relaxing the second constraint on the adversary). We will also seek more sophisticated methods to build the user profile, and better metrics to predict the performance (especially the utility) of UPS.
