
Ranked File Search

CCSIT Calicut university

ABSTRACT
With the advent of cloud computing, data owners are motivated to outsource their complex data
management systems from local sites to the commercial public cloud for great flexibility and economic
savings. In this paper, for the first time, we define and solve the challenging problem of privacy-preserving
multi-keyword ranked search over encrypted data in cloud computing (MRSE). Because cloud computing
offers so many advantages, more and more data owners centralize their sensitive data in the cloud. With a
large number of data files stored on the cloud server, it is important to provide a keyword-based search
service to data users. However, to protect data privacy, sensitive data is usually encrypted before it is
outsourced to the cloud server, which makes plaintext search techniques unusable. In this paper, we propose
a semantic multi-keyword ranked search scheme over encrypted cloud data that simultaneously
meets a set of strict privacy requirements. Firstly, we use Latent Semantic Analysis (LSA) to reveal the
relationship between terms and documents. LSA takes advantage of the implicit
higher-order structure in the association of terms with documents (the semantic structure) and adopts a
reduced-dimension vector space to represent words and documents. Thus, the relationship between
terms is captured automatically. Secondly, our scheme employs secure k-nearest neighbor (k-NN) to
achieve secure search functionality. The proposed scheme can return not only the exactly matching files
but also files containing terms that are latently semantically associated with the query keyword. Finally, the
experimental results demonstrate that our method performs better than the original MRSE scheme.


LIST OF FIGURES
Fig 2.1: Framework of the semantic expansion based similar search over encrypted cloud data.
Fig 5.1: Encryption Process


LIST OF TABLES
Table 1: login
Table 2: staff
Table 3: work
Table 4: feedback
Table 5: complaint
Table 6: file
Table 7: report
Table 8: corpus
Table 9: frequency
Table 10: salary


CHAPTER 1
INTRODUCTION
Because cloud computing offers so many advantages, more and more data owners centralize their sensitive
data in the cloud. With a large number of data files stored on the cloud server, it is important to provide
a keyword-based search service to the data user. However, to protect data privacy,
sensitive data is usually encrypted before it is outsourced to the cloud server, which makes
plaintext search techniques unusable.
Due to the rapid expansion of data, data owners tend to store their data in the cloud
to relieve the burden of data storage and maintenance. However, because the cloud customers and the
cloud server are not in the same trusted domain, outsourced data may be exposed
to risk. Thus, before being sent to the cloud, sensitive data needs to be encrypted to protect
data privacy and combat unsolicited access. Unfortunately, traditional plaintext search
methods can no longer be applied directly to encrypted cloud data. Traditional
information retrieval (IR) already provides multi-keyword ranked search for the data user. In
the same way, the cloud server needs to provide the data user with a similar function while
protecting data and search privacy. Storing data on the cloud server is meaningful only when the data
can be easily searched and utilized.

CHAPTER 2
PROBLEM DEFINITION AND METHODOLOGY
2.1. Problem Formulation

System model
We consider a system model involving three entities: the data owner, the data user,
and the cloud server, as illustrated in Figure 2.1.

Figure 2.1
Framework of the semantic expansion based similar search over encrypted cloud data.
Data owner uploads a collection of n text files F = {F1, F2, F3, ..., Fn} in encrypted form C,
together with the encrypted metadata set, to the cloud server. Note that corresponding file
metadata is constructed for each file. Each file in the collection is encrypted with a common
symmetric encryption algorithm, e.g. AES.
Data user provides a search trapdoor T_w for keyword w to the cloud server. In this paper, we
assume that authorization between the data owner and the users has been appropriately handled.
Cloud server first constructs the index and the SRL using the metadata set provided by the data owner,
thus reducing the computing burden on the owner, e.g. index creation. Upon receiving the request T_w,
the cloud server automatically expands the query keyword based on the SRL. The server then searches
the index and returns the matching files to the user in ranked order. Finally, an access control mechanism,
which is outside the scope of this paper, is employed to manage the user's capability to decrypt
the received files.
Threat model
In this paper, we use the same threat model as described in previous searchable symmetric encryption
(SSE) schemes. We consider an honest-but-curious server in our model. Specifically, the cloud
server honestly follows the designated protocol specification, but is curious to infer and analyze
all data available on the server so as to learn additional information. In other words,
the cloud server has no intention of actively modifying the stored data or disrupting any other kind of
service. We therefore consider the following threat model and its attack capabilities.
Known background model: In this model, in addition to the encrypted dataset and the metadata set that the
owner uploads, the server is assumed to have additional knowledge of the dataset, e.g. the subject
and its related statistical information. For instance, the server can use keyword frequency
statistics to infer keywords.
Design goals
To enable effective and secure ranked semantic expansion search over outsourced cloud data under
the aforementioned model, our mechanism should achieve the following design goals.
1. Ranked semantic expansion search: To design a similar search scheme that supports semantic
search over encrypted cloud data by expanding the query keyword according to the semantic relationship of
terms, and that returns the retrieved files in ranked order.

2. Security guarantee: To prevent the cloud server from learning the plaintext of the data files and
keywords. Compared to the existing SSE schemes, the scheme should achieve security that is as strong
as possible.

3. Efficiency: To achieve the above goals with minimum communication and computation overhead.

Security analysis
We evaluate the security of the proposed scheme against the security guarantee stated above
(refer to Design goals). That is, neither the data files nor the keywords are leaked to the server.

Security analysis for the ranked semantic expansion search

We analyze the solution with respect to the aforementioned search privacy requirements, i.e.
keyword privacy and file confidentiality.

File confidentiality: file confidentiality depends on the inherent security strength of the
symmetric encryption scheme, so the file content is well protected.

Keyword privacy


CHAPTER 3
REQUIREMENT ANALYSIS AND SPECIFICATION
3.1 LITERATURE REVIEW
3.1.1 Ranked Keyword Search in Cloud Computing
Cloud computing is the use of computing resources (hardware and software) that are delivered as
a service over a network. It has the potential to change the IT industry. It enables cloud customers
to remotely store their data in the cloud so as to enjoy on-demand, high-quality applications
and services from a shared pool of configurable computing resources. Cloud computing is the
result of the evolution and adoption of existing technologies and paradigms. The goal of cloud
computing is to allow users to benefit from all of these technologies without the need for
deep knowledge of or expertise with each one of them. Clouds enable customers to remotely
store and access their data, lowering the cost of hardware ownership while providing robust and fast
services. As cloud computing becomes prevalent, sensitive information is being increasingly
centralized in the cloud. Data owners may share their outsourced data with a large number
of users, who might want to retrieve only the specific data files they are interested in during a
given session. One of the most popular ways to do so is through keyword-based search. This
technique allows users to selectively retrieve files of interest and has been widely
applied in plaintext search scenarios. In a public cloud, service providers offer their resources as
services to the general public. Public clouds offer several key benefits to service providers,
including no initial capital investment in infrastructure and the shifting of risk to infrastructure
providers. However, public clouds lack fine-grained control over data, network, and security
settings, which hampers their effectiveness in many business scenarios.
3.1.2 Privacy-Preserving Multi-keyword Ranked Search over Encrypted Cloud Data
With the advent of cloud computing, data owners are motivated to outsource their complex data
management systems from local sites to the commercial public cloud for great flexibility and
economic savings. However, to protect data privacy, sensitive data has to be encrypted before
outsourcing, which obsoletes traditional data utilization based on plaintext keyword search. Thus,
enabling an encrypted cloud data search service is of paramount importance. Considering the large
number of data users and documents in the cloud, it is crucial for the search service to allow
multi-keyword queries and provide result similarity ranking to meet the need for effective data retrieval.
Related works on searchable encryption focus on single keyword search or Boolean keyword
search, and rarely differentiate the search results. In this paper, for the first time, we define and
solve the challenging problem of privacy-preserving multi-keyword ranked search over encrypted
cloud data (MRSE), and establish a set of strict privacy requirements for such a secure cloud data
utilization system to become a reality. Among various multi-keyword semantics, we choose the
efficient principle of coordinate matching, i.e., as many matches as possible, to capture the
similarity between the search query and the data documents, and further use inner product similarity to
quantitatively formalize this principle for similarity measurement. We first propose a basic MRSE
scheme using secure inner product computation, and then significantly improve it to meet different
privacy requirements under two levels of threat models. A thorough analysis of the privacy and
efficiency guarantees of the proposed schemes is given, and experiments on a real-world dataset
further show that the proposed schemes introduce low overhead in computation and
communication.
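
The coordinate-matching principle and the inner product similarity described above can be illustrated with a small sketch. It is a toy illustration under stated assumptions, not the MRSE construction itself: the dictionary, the file names, and the 3x3 matrix M are hypothetical, and a real secure inner product scheme would use a randomly generated secret matrix together with further randomization. The sketch only shows that masking the index vector with M^T and the query vector with M^-1 leaves the inner product, and therefore the ranking, unchanged.

```python
# Hypothetical keyword dictionary; positions define the vector layout.
DICTIONARY = ["cloud", "encryption", "search"]

def to_vector(keywords):
    """Binary vector with 1 where a dictionary keyword is present."""
    present = set(keywords)
    return [1 if word in present else 0 for word in DICTIONARY]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(m, v):
    return [dot(row, v) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Toy secret key (assumption): an invertible integer matrix and its inverse.
M = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
M_INV = [[1, -1, 1], [0, 1, -1], [0, 0, 1]]

def encrypt_index(p):
    """Masked index vector M^T p, stored on the server."""
    return mat_vec(transpose(M), p)

def make_trapdoor(q):
    """Masked query vector M^-1 q, sent to the server."""
    return mat_vec(M_INV, q)

documents = {
    "F1": ["cloud", "search"],
    "F2": ["encryption"],
    "F3": ["cloud", "encryption"],
}
query = ["cloud", "search"]

encrypted_index = {name: encrypt_index(to_vector(kw)) for name, kw in documents.items()}
trapdoor = make_trapdoor(to_vector(query))

# (M^T p) . (M^-1 q) = p . q, so the server ranks files by the number of
# matching keywords without seeing the plain binary vectors.
scores = {name: dot(vec, trapdoor) for name, vec in encrypted_index.items()}
ranking = sorted(scores, key=lambda name: (-scores[name], name))
```

Here each score equals the number of query keywords the file contains, which is exactly the "as many matches as possible" criterion.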

3.2 Existing System


In the existing system, searchable encryption techniques provide secure search over
encrypted data for users. They build a searchable inverted index that stores a mapping from
each keyword to the set of files that contain it. When a data user inputs a
keyword, a trapdoor is generated for this keyword and submitted to the cloud server. Upon
receiving the trapdoor, the cloud server compares the trapdoor with the index and
returns to the data user all files that contain this keyword. However, these methods only allow
exact single keyword search.
Given the large number of data users and documents in the cloud, it is crucial for the search
service to allow multi-keyword queries and provide result similarity ranking to meet the need for
effective data retrieval. Existing searchable encryption focuses on single keyword search or Boolean
keyword search, and rarely differentiates the search results.
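
The inverted index and trapdoor lookup described above can be sketched as follows. This is a simplified illustration, not a specific published SSE scheme: the trapdoor is assumed to be an HMAC-SHA256 of the keyword under a key shared by the owner and authorized users, and the key, file names, and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret shared by the data owner and authorized users.
SECRET_KEY = b"demo-key-shared-by-owner-and-users"

def trapdoor(keyword, key=SECRET_KEY):
    """Keyed hash of the keyword; the server never sees the keyword itself."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def build_index(files, key=SECRET_KEY):
    """Inverted index mapping trapdoor -> set of file identifiers."""
    index = {}
    for file_id, keywords in files.items():
        for kw in keywords:
            index.setdefault(trapdoor(kw, key), set()).add(file_id)
    return index

def search(index, token):
    """Server-side lookup: exact match on the submitted trapdoor."""
    return sorted(index.get(token, set()))

files = {"F1": ["cloud", "privacy"], "F2": ["cloud", "ranking"], "F3": ["index"]}
index = build_index(files)
results = search(index, trapdoor("cloud"))
```

Because the lookup is an exact match on a single token, a query for "clouds" or for a semantically related term returns nothing, and the results carry no ranking, which is the limitation noted above.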

3.2.1 Limitations of the Existing System

No Latent Semantic Analysis (LSA)
Less security
Single keyword search without ranking

3.3 Proposed System


The proposed system is a semantic multi-keyword ranked search scheme over encrypted cloud data
that simultaneously meets a set of strict privacy requirements. Firstly, it uses Latent
Semantic Analysis (LSA) to reveal the relationship between terms and documents. LSA
takes advantage of the implicit higher-order structure in the association of terms with
documents (the semantic structure) and adopts a reduced-dimension vector space to represent words
and documents. Thus, the relationship between terms is captured automatically. Secondly, the
scheme employs secure k-nearest neighbor (k-NN) to achieve secure search functionality. The
proposed scheme can return not only the exactly matching files but also files containing
terms that are latently semantically associated with the query keyword.
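
The way a reduced-dimension space captures relationships between terms can be shown with a small sketch. It assumes a tiny hypothetical corpus and, to stay dependency-free, replaces a library SVD routine with power iteration on the term-term matrix A A^T (whose eigenvectors are the left singular vectors of the term-document matrix A). It is an illustration of the LSA idea, not the implementation used in this project.

```python
import math

# Hypothetical corpus: "car" and "automobile" never co-occur,
# but both appear together with "engine".
TERMS = ["car", "automobile", "engine", "flower", "petal"]
DOCS = [
    ["car", "engine"],
    ["automobile", "engine"],
    ["flower", "petal"],
    ["flower", "petal"],
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def cosine(u, v):
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0

def top_eigenpairs(sym, k, iters=300):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = normalize([float(i + 1) for i in range(n)])
        for _ in range(iters):
            v = normalize([dot(row, v) for row in work])
        lam = dot(v, [dot(row, v) for row in work])
        pairs.append((lam, v))
        for i in range(n):          # remove the component just found
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

# Term-document matrix A (rows = terms) and term-term matrix A A^T.
A = [[float(doc.count(t)) for doc in DOCS] for t in TERMS]
gram = [[dot(r1, r2) for r2 in A] for r1 in A]

# Keep k = 2 dimensions: term i is represented by (sigma_1*u_1[i], sigma_2*u_2[i]).
pairs = top_eigenpairs(gram, k=2)
term_vecs = {
    t: [math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
    for i, t in enumerate(TERMS)
}

raw_sim = cosine(A[0], A[1])                                   # original space
lsa_sim = cosine(term_vecs["car"], term_vecs["automobile"])    # reduced space
unrelated_sim = cosine(term_vecs["car"], term_vecs["flower"])
```

In the original space "car" and "automobile" are orthogonal because they share no document. In the two-dimensional space they nearly coincide because both co-occur with "engine", while "flower" stays unrelated. This is the kind of latent association that lets a query return files that do not contain the exact keyword.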

ADVANTAGE:
Multi-keyword ranked search over encrypted cloud data (MRSE)

3.3.1 Features of Proposed System


3.3.2 Advantages of Proposed System

Higher accuracy.

The proposed system secures the data in the system by using AES.

User friendly.

3.4 REQUIREMENT SPECIFICATION

Requirement analysis is a software engineering task that bridges the gap between system-level
software allocation and software design. It is carried out after the feasibility study. Requirement
analysis enables the system engineer to specify software functions and performance, indicate
the software's interfaces with other system elements, and establish the constraints that the software
must meet. It provides the software designer with models that can be translated into data, architectural,
interface, and procedural design. Requirements are stated in terms of user needs. Communication
for analysis must be established so that the basic elements, as perceived by the
user/customer, are recognized and understood. Software requirement analysis may be divided into several
areas of effort:
Problem Recognition
Models of the System
Specification of the Software


3.4.1 Planning Phase


This stage includes the initiation phase. The Initiation stage establishes a firm foundation for
the project before development work commences. This is achieved through the creation of
the Project Initiation Document (PID), which provides a comprehensive view of the project as seen
at the outset, and the appointment of a dedicated project manager. The Project Manager has overall
control of the project and is responsible for ensuring that agreed standards for the timeliness and
quality of all project deliverables are met. The Project Initiation Document enables the following
fundamental questions about the project to be answered:
What is the project aiming to achieve?
Why are these achievements important?
Who will be involved in managing the processes, and what are their responsibilities?
The PID also embraces three areas of information that can optionally be contained within
sections of the PID itself or presented as free-standing documents. These are:
Project Plan
Project Quality Plan
Communications Plan
Activities:
Solution Options - to decide on a specific solution.
IT Estimates - to formalize the project-level estimates for costs and resources.
Testing - to create the Project Test Strategy document.
Requirements Management - to gather the complete set of Detailed Business Requirements
and obtain their formal sign-off by the business.
Security: The Project Manager is responsible for ensuring that adequate security measures are
implemented in respect of the new or enhanced system, in line with the value of the
information assets and the potential risks involved, and that the requirements of all Information
Security policies are met.
An Information Security Officer is assigned to the project and owns the approval of all
security activities performed throughout the project's life cycle. This is triggered by a direct email
request from Plan View life cycles to the Information Security Mailbox. Responsibility for
assigning the correct security resources remains with the Project Manager, in consultation with the
Information Security Officer.


3.4.2 Goals

A web-based application used to automate the Purchase Order Process.
Keep the approval with the desired level of management.
Include a Fraud Detection Mechanism for checking the existing Vendor.
Avoid data duplication and provide data for auditing.
Proper authentication, i.e. single sign-on, is needed to access the tool; this secures privacy.
3.4.3 Functional Requirement
A functional requirement defines a function of a system or its component. A function is described
as a set of inputs, the behavior, and outputs.
Avoid data duplication
Automated and customized system with single sign on
Provide data for auditing
Improve performance
Reduce costs
Provide management information
Data Accuracy

3.4.4 Non Functional Requirement


The non-functional requirement involves those functions that are performed by the system
independent of the user. It deals with the characteristics of the system that cannot be expressed
functionally. The non-functional requirements of the software are:
Database security and system security
Response time, Accessibility
Strong database management system
Flexible user interface
Provide smooth and efficient administration
Understandable, useful and reliable software.


3.5 Feasibility Study


Feasibility is defined as the practical extent to which a project can be performed
successfully. The objective of a feasibility study is to establish the reasons for developing
software that is acceptable to users, adaptable to change, and conformable to established
standards.
A feasibility study lets the developer foresee the future of the project and its usefulness.
It is used for:

Finding out whether a new system is required or not.

Determining the potentials and drawbacks of the existing system.


Finding out the various alternatives available.

Knowing what should be incorporated in the new system.

Defining the ingredients and objectives involved in the project.

Identifying whether the proposed system could meet the end needs of the users.

Providing technical, economic, operational feasibility of the proposed system.


Identification of user requirements and the benefits expected by the user from the
resulting system.

Various types of feasibility that are commonly considered include:

(1) Operational Feasibility
(2) Technical Feasibility
(3) Economic Feasibility
(4) Behavioural Feasibility
(5) Software Feasibility
(6) Hardware Feasibility

3.5.1 Operational Feasibility


Operational feasibility assesses the extent to which the required software system
performs a series of steps to solve business problems and meet user requirements. This feasibility
depends on human resources and involves judging whether the software will operate after
it is developed and remain operative once it is installed. It also analyses whether users will adapt to
the new software. The system is developed by giving prime importance to the ease with which
end users can operate it. Any user who installs the application on the
phone is able to use it and its services conveniently without the help of another person.


3.5.2 Technical Feasibility


Technical feasibility assesses the current resources (hardware and software) and the
technology required to accomplish the user requirements within the
allocated time and budget. It is concerned with the existing computer system (hardware and
software) and the extent to which it can support the proposed system. The proposed system requires
a mobile device running the Android operating system (any version above 2.2). A user
who owns a mobile device with the appropriate technology can access the application anywhere.
3.5.3 Economic Feasibility
Economic feasibility determines whether the proposed system is capable of generating
financial gains for an organization. It involves cost incurred on the software development team,
estimated cost of hardware, and cost of performing feasibility study and so on. The proposed
system is economically feasible since the cost incurred for the development of the system
produces long term gains.
It is also necessary to consider the benefits that can be achieved once the final software
product is installed: it will reduce the operational cost of the user as
well as blood banks to a large extent. Hence the proposed system is economically feasible.
3.6 Software Requirement Specifications
3.6.1. Software Specification
Selecting software is one of the most difficult tasks in building a system. Once the system requirements
are known, we have to determine whether a particular software package fits those
requirements. The application requirements are:

Front end        : ASP.NET
Back end         : SQL Server 2008
Operating system : Windows 7 and above
IDE              : Visual Studio 2010


3.6.2. Hardware specification


The selection of hardware is very important for the existence and proper working of any
software. When selecting hardware, the size and capacity requirements are also important.

Processor      : Intel Pentium Core i3 and above
Primary Memory : 256 MB RAM and above
Storage        : 40 GB hard disk and above
Display        : VGA Colour Monitor
Keyboard       : Windows compatible
Mouse          : Windows compatible

3.6.3 Development Environment


Ever since Microsoft announced .NET for the first time almost 10 years ago, there has been a lot
of noise in the developer community about the direction of the changes. .NET set out to
modernize coding with more sophisticated techniques by adopting a more object-oriented
paradigm and changing the style of coding altogether. Microsoft's
forerunner, VB, was modernized for the new environment and redesigned as
VB.NET, and other languages with quite different syntax, such as C#,
J#, and C++, were also announced. All of these languages are built on top of the .NET Runtime
(known as the Common Language Runtime or CLR) and produce the same intermediate output in
Microsoft Intermediate Language (MSIL). Microsoft announced the .NET runtime as a separate entity
by defining standardized rules and specifications that every language must follow to take advantage
of the CLR. The entirely new set of libraries, classes, syntaxes, and even the way of coding in
Microsoft technologies created a significant hurdle for the developer community. Many developers
switched jobs, while a few switched gears to understand how to work with the new technology, which is
totally different from its predecessors. The community soon realized that the
existing set of Microsoft tools might not satisfy the needs of the new, evolving technology. Microsoft
had to provide a strong toolset to help developers work more easily and effectively with the new
technology. Visual Studio is the answer to some of these needs. Microsoft Visual Studio is an Integrated
Development Environment (IDE) for working with Microsoft languages. It is the premier tool that
developers can possess to work easily with Microsoft technologies. Note, however, that Visual
Studio is not a new product from Microsoft. It has been around for quite some time.


ASP.NET
ASP.NET is an open source server-side application framework designed for web development
to produce dynamic web pages. It was developed by Microsoft to allow programmers to build
dynamic websites, web applications, and web services.
ASP.NET is a unified web development model integrated with the .NET framework, designed
to provide the services needed to create dynamic web applications and web services. It is built on the
common language runtime of the .NET framework and inherits benefits such as multi-language
interoperability, type safety, garbage collection, and inheritance.
MySQL
MySQL has become the world's most popular open source database because of its
consistency, fast performance, and high reliability. It has become the database of choice for a new
generation of applications built on the LAMP stack (Linux, Apache, MySQL,
PHP/Perl/Python). MySQL runs on more than 20 platforms, including Linux, Windows, OS X,
HP-UX, AIX, and NetWare, giving you the kind of flexibility that puts you in control. MySQL offers
a comprehensive range of certified software, support, training, and consulting.
MySQL is a multithreaded, multi-user SQL Database Management System. MySQL's
implementation of a relational database is an abstraction on top of a computer's file system. The
relational database abstraction allows a collection of data items to be organized as a set of formally
described tables. Data can be accessed or reassembled from the tables in many different ways
without requiring any reorganization of the database tables themselves. Relational databases
speak SQL (Structured Query Language). SQL is a standard interactive programming language for
getting information from and updating a relational database.


CHAPTER 4
SYSTEM DESIGN
System design is the process or art of defining the architecture, components, modules, interfaces,
and data for a system to satisfy specified requirements. System design is the most creative and
challenging phase of the system life cycle. The term design describes a final system and the
process by which it is developed. In system design, we move from the logical to the physical aspect
of the life cycle.
4.1 Users of the System
The system consists of two modules:
1. Admin
2. Staff

4.2 Modularity criteria


4.2.1 Admin:

Staff management
Work creation
Work allocation
Download work report
View feedback
View complaints and send replies
File search and download
Salary management

4.2.2 User

View profile
View allocated work
Upload work report
Upload file
Send complaint
View reply
File search and download
View feedback


4.3 ARCHITECTURE DIAGRAMS


4.3.1 Data Flow Diagrams
A Data Flow Diagram (DFD) is a diagrammatic representation of the flow of data in
the system. In the system development environment, a system analyst interviews the client to obtain the
entire project specification. If the diagram matches the requirements and fulfills all the client's needs,
the creation of the database and tables follows.
Data flow diagrams are among the most ingenious tools used for structured
analysis. A DFD has the purpose of clarifying system requirements and identifying the major
transformations that will become programs in system design. It is the major starting point in the
design phase and functionally decomposes the requirement specifications down to the lowest level
of detail.
The DFD is also known as a bubble chart. It consists of a series of bubbles joined by lines. The
bubbles represent data transformations and the lines represent data flows in the system.
Rules Used In Constructing DFD
Processes should be named and numbered
The direction of flow is from top to bottom and from left to right
When a process is exploded into lower-level details, the lower-level processes are also numbered
The names of the data stores, sources, and destinations are written in uppercase

4.3.2 Physical DFD
A physical DFD shows how the system is actually implemented, either at the moment (current
physical DFD) or as the designer intends it to be in the future (required physical DFD). Thus a
physical DFD may be used to describe the set of data items that appear on each piece of paper that
moves around an office, and the fact that a particular set of pieces of paper is stored together in a
filing cabinet. It is quite possible that a physical DFD will include references to data that are
duplicated, and that the data stores, if implemented as a set of database tables, would constitute an
un-normalized relational database. In contrast, a logical DFD attempts to capture the data flow
aspects of a system in a form that has neither redundancy nor duplication.


4.3.3 Basic Symbols

An arrow identifies a data flow in motion. It is a pipeline through which
information flows. A circle stands for a process that
converts data into information. An open-ended box represents a data store: data at rest or a
temporary repository of data. A square defines a source or destination of system data.
The merit of a DFD is that it provides an overview of what data a system
processes, what transformations of data are performed, what files are used, and where the results flow.


LEVEL 0
[Data flow diagram. External entities: Admin, Staff. Process: Ranked search. Data store: Database.]

LEVEL-1.1-Admin
[Data flow diagram. External entity: Admin. Processes: login, staff registration, staff updation and deletion, work allocation for staff, download work report, staff feedback, view complaint (resolve and reply), salary management, file search. Data stores: login, staff, work, report, feedback, complaint, salary, file.]

LEVEL-1.2
[Data flow diagram. External entity: Staff. Processes: login, staff profile updation, view allocated work details, upload file, complaints and reply, view feedback, file search, upload work report. Data stores: login, work, file, complaint, feedback, report.]


LEVEL-2
[Data flow diagram. External entity: User. Upload path: login, file upload, encryption, stemming. Search path: login, file searching, stemming, similarity check, view search result, decrypt and download. Data stores: login, file, corpus, frequency.]
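
The stemming, frequency, and similarity-check steps named in the Level 2 diagram can be sketched as follows. This is only an illustration of the flow: the suffix-stripping stemmer is a toy stand-in for a real stemming algorithm, cosine similarity over term frequencies is an assumed choice of similarity measure, the file names and texts are hypothetical, and the encryption and decryption steps are omitted.

```python
import math
import re
from collections import Counter

SUFFIXES = ("ing", "ed", "s")  # toy rules, not a real stemming algorithm

def stem(word):
    """Strip one known suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def term_frequencies(text):
    """Lower-case, tokenize, stem, and count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(t) for t in tokens)

def cosine(a, b):
    shared = set(a) & set(b)
    num = sum(a[t] * b[t] for t in shared)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

# Hypothetical uploaded files; "frequency" plays the role of the frequency store.
corpus = {
    "report.txt": "Searching encrypted files in the cloud",
    "notes.txt": "Salary details for staff",
}
frequency = {name: term_frequencies(text) for name, text in corpus.items()}

def ranked_search(query):
    """Return matching file names, most similar first."""
    q = term_frequencies(query)
    scored = [(cosine(q, tf), name) for name, tf in frequency.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]
```

Stemming both the stored text and the query means that a query such as "searched file" still matches a file containing "Searching" and "files".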

4.4 DATABASE DESIGN


A database is a collection of logically related records. The main objective of database
design is to provide effective auxiliary storage without any applications and to contribute to the
overall efficiency of the computer program components of the whole system.
The organization of data in the database aims to achieve the following objectives.

Controlled redundancy
Ease of learning and use
Data independence
More information at low cost
Accuracy and integrity
Recovery from failures
Privacy and security
Performance

The design should be done in such a way that the information stored in the database can be retrieved
quickly whenever necessary. The general theme behind a database is to handle information as an
integrated whole. A database is a collection of interrelated data stored with minimum redundancy
to serve users quickly and efficiently. Database design runs in parallel with application design. As
we collect information about what is to be done, we also collect information about the data
that needs to be entered and stored, messages, and printed reports. The database was designed with
utmost care and attention to security during the design phase of the system. Special care was taken to
use the minimum number of databases for the maximum efficiency of the system.
4.4.1 NORMALIZATION
Normalization is the process of simplifying the relationships between data elements in a record. It is
a transformation of complex data stores into a set of smaller data structures. Normalized data
structures are more stable, simpler, and easier to maintain. Normalization provides numerous benefits
to a database. Some of the major benefits include the following:
Greater overall database organization.
Reduction of redundant data.
Data consistency within the database.
A much more flexible database design.
A better handle on database security.
1 NF
Practical rule: "Eliminate Repeating Groups". Make a separate table for each set of related
attributes and give each table a primary key. The main objectives are:
Eliminate duplicate columns from the same table.
Create a separate table for each group of related data and identify
each row with a unique column or set of columns (primary key).
2 NF
Practical rule: "Eliminate Redundant Data". If an attribute depends on only part of a multi-valued
(composite) key, remove it to a separate table. The main objectives of second normal form are:
Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate
tables.
Create relationships between these new tables and their predecessors through the use of
foreign keys.


3 NF
Practical rule: "Eliminate columns not dependent on the key". If attributes do not contribute to the
description of the key, move them to a separate table. The main objectives are:
Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.
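
The first rule can be illustrated with a minimal sketch. The sample rows and the helper name to_first_normal_form are assumptions made for illustration and are not taken from the project database. The snippet splits a repeating group of files out of a flat staff record into its own table, linked back through staff_id.

```python
# Hypothetical flat records: each staff row carries a repeating group of files.
flat_rows = [
    {"staff_id": 1, "staff_name": "Anu", "files": ["a.txt", "b.txt"]},
    {"staff_id": 2, "staff_name": "Binu", "files": ["c.txt"]},
]


def to_first_normal_form(rows):
    """Split the repeating 'files' group into its own table (1NF)."""
    staff_table = []  # one row per staff member, keyed by staff_id
    file_table = []   # one row per file, with staff_id acting as a foreign key
    next_file_id = 1
    for row in rows:
        staff_table.append({"staff_id": row["staff_id"],
                            "staff_name": row["staff_name"]})
        for name in row["files"]:
            file_table.append({"file_id": next_file_id,
                               "staff_id": row["staff_id"],
                               "file_name": name})
            next_file_id += 1
    return staff_table, file_table


staff_table, file_table = to_first_normal_form(flat_rows)
```

After the split, each table has its own primary key and no row holds a list of values, which is the shape the tables in the next section follow.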
TABLES
Table Name: login

NAME         DATATYPE       CONSTRAINTS
login_id     int            Primary key
user_name    varchar(50)    Not null
password     varchar(50)    Not null
user_type    varchar(50)    Not null

Table Name: staff

NAME           DATATYPE        CONSTRAINTS
staff_id       int             Primary key
staff_name     varchar(50)     Not null
house_name     varchar(50)     Not null
place          varchar(50)     Not null
city           varchar(50)     Not null
district       varchar(50)     Not null
pin_code       int             Not null
email          varchar(50)     Not null
contact_no     bigint          Not null
designation    varchar(50)     Not null
photo          varchar(MAX)    Not null


Table Name: work

NAME        DATATYPE        CONSTRAINTS
work_id     int             Primary key
work        varchar(MAX)    Not null
details     varchar(MAX)    Not null
deadline    varchar(50)     Not null
staff_id    int             Not null
status      varchar(50)     Null

Table Name: feedback

NAME           DATATYPE        CONSTRAINTS
feedback_id    int             Primary key
staff_id       int             Foreign key
feedback       varchar(MAX)    Not null
date           varchar(50)     Not null

Table Name: complaint

NAME            DATATYPE        CONSTRAINTS
complaint_id    int             Primary key
staff_id        int             Foreign key
complaint       varchar(MAX)    Not null
date            varchar(50)     Not null
reply           varchar(MAX)    Null

Table Name: file

NAME         DATATYPE        CONSTRAINTS
file_id      int             Primary key
file_path    varchar(MAX)    Not null
staff_id     int             Foreign key
date         varchar(50)     Not null
file_name    varchar(50)     Not null
key1         varchar(MAX)    Not null
rmv_count    int             Null


Table Name: report

NAME         DATATYPE       CONSTRAINTS
report_id    int            Primary key
path         varchar(50)    Not null
work_id      int            Foreign key
date         varchar(50)    Not null

Table Name: corpus

NAME       DATATYPE       CONSTRAINTS
file_id    int            Foreign key
word_id    int            Primary key
word       varchar(50)    Not null

Table Name: frequency

NAME         DATATYPE    CONSTRAINTS
file_id      int         Foreign key
word_id      int         Foreign key
frequency    float       Not null

Table Name: salary

NAME                 DATATYPE       CONSTRAINTS
staff_id             int            Primary key
basic_salary         int            Not null
pf                   int            Not null
ba                   int            Not null
bonus                int            Not null
accommodation_fee    int            Not null
total_salary         int            Not null
date                 varchar(50)    Not null
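
The relationship between the file, corpus and frequency tables can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module with an in-memory database, replaces the listed datatypes with SQLite's INTEGER, TEXT and REAL, keeps just two columns of the file table, and inserts made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE file (
    file_id   INTEGER PRIMARY KEY,
    file_name TEXT NOT NULL
);
CREATE TABLE corpus (
    word_id INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES file(file_id),
    word    TEXT NOT NULL
);
CREATE TABLE frequency (
    file_id   INTEGER NOT NULL REFERENCES file(file_id),
    word_id   INTEGER NOT NULL REFERENCES corpus(word_id),
    frequency REAL NOT NULL
);
""")

# Made-up sample data: one file containing the stem "connect" three times.
conn.execute("INSERT INTO file VALUES (1, 'report.txt')")
conn.execute("INSERT INTO corpus VALUES (1, 1, 'connect')")
conn.execute("INSERT INTO frequency VALUES (1, 1, 3.0)")

# Join the three tables to list each word with its file and frequency.
rows = conn.execute("""
    SELECT f.file_name, c.word, q.frequency
    FROM frequency AS q
    JOIN corpus AS c ON c.word_id = q.word_id
    JOIN file   AS f ON f.file_id = q.file_id
""").fetchall()

# A corpus row that points at a missing file violates the foreign key.
try:
    conn.execute("INSERT INTO corpus VALUES (2, 99, 'orphan')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The foreign keys keep every stored word and frequency tied to an existing file, which is what the Foreign key constraints in the tables above express.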


CHAPTER 5
IMPLEMENTATION
5.1 Technologies Used in the System
5.1.1 Advanced Encryption Standard (AES)
The most popular and widely adopted symmetric encryption algorithm likely to be encountered
nowadays is the Advanced Encryption Standard (AES) [2]. It is found to be at least six times faster than
Triple DES. A replacement for DES was needed because its key size was too small: with increasing
computing power, it was considered vulnerable to exhaustive key search attacks. Triple DES
was designed to overcome this drawback, but it was found to be slow. The features of AES are as
follows:
Symmetric-key symmetric block cipher
128-bit data, 128/192/256-bit keys
Stronger and faster than Triple DES
Full specification and design details are provided
Software implementable in C and Java
AES is an iterative cipher rather than a Feistel cipher. It is based on a substitution-permutation
network. It comprises a series of linked operations, some of which involve replacing inputs by
specific outputs (substitutions) and others involve shuffling bits around (permutations).
AES performs all its computations on bytes rather than bits. Hence, AES treats the
128 bits of a plaintext block as 16 bytes. These 16 bytes are arranged in four columns and four
rows for processing as a matrix. Unlike DES, the number of rounds in AES is variable and depends
on the length of the key. AES uses 10 rounds for 128-bit keys, 12 rounds for 192-bit keys and 14
rounds for 256-bit keys. Each of these rounds uses a different 128-bit round key, which is
calculated from the original AES key.
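
The byte matrix and the key-dependent round count can be sketched in a few lines. The function names rounds_for_key and block_to_state are illustrative and do not come from the project code. The column-by-column fill order follows the AES standard.

```python
# Rounds per key length, as stated in the text above.
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def rounds_for_key(key: bytes) -> int:
    """Return the number of AES rounds for a 16-, 24-, or 32-byte key."""
    bits = len(key) * 8
    if bits not in ROUNDS_BY_KEY_BITS:
        raise ValueError("AES keys must be 128, 192, or 256 bits long")
    return ROUNDS_BY_KEY_BITS[bits]


def block_to_state(block: bytes) -> list:
    """Arrange a 16-byte block as a 4x4 matrix, filled column by column."""
    if len(block) != 16:
        raise ValueError("AES blocks are exactly 16 bytes")
    # state[row][col] holds byte number row + 4 * col of the block.
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


state = block_to_state(bytes(range(16)))
```

With the bytes 0 to 15 as input, the first row of the state holds bytes 0, 4, 8 and 12, because each group of four consecutive bytes fills one column.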


Fig 5.1: Encryption Process


Decryption Process
The process of decryption of an AES cipher text is similar to the encryption process in the reverse
order. Each round consists of the four processes conducted in the reverse order.

Add round key
Mix columns
Shift rows
Byte substitution
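
The idea of undoing each step in reverse order can be shown with two of the four steps. This is a partial sketch under stated assumptions: byte substitution and mix columns are left out, and the round key is a toy value rather than the output of a real key schedule. Adding the round key is an XOR, so applying it twice restores the input. Shift rows is undone by rotating each row in the opposite direction.

```python
def add_round_key(state, round_key):
    """XOR each state byte with the matching round-key byte."""
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]


def shift_rows(state):
    """Rotate row r of the state left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r of the state right by r positions."""
    return [row[-r:] + row[:-r] if r else row[:] for r, row in enumerate(state)]


state = [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]]
round_key = [[0x1F] * 4 for _ in range(4)]  # toy key, not a real key schedule

# Encryption direction: shift rows, then add the round key.
forward = add_round_key(shift_rows(state), round_key)
# Decryption direction: add the round key first, then undo the row shift.
restored = inv_shift_rows(add_round_key(forward, round_key))
```

The restored state equals the original one, which is the property the full decryption process relies on for all four steps.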

5.1.2 Stemming Algorithm

A stemming algorithm is a process of linguistic normalization, in which the variant forms of a
word are reduced to a common form, for example:
connection, connections, connective, connected, connecting ---> connect
It is important to appreciate that we use stemming with the intention of improving the
performance of information retrieval (IR) systems. It is not an exercise in etymology or grammar. In fact, from an
etymological or grammatical viewpoint, a stemming algorithm is liable to make many mistakes.
In addition, stemming algorithms - at least the ones presented here - are applicable to the written,
not the spoken, form of the language.
For some of the world's languages, Chinese for example, the concept of stemming is not
applicable, but it is certainly meaningful for the many languages of the Indo-European group. In
these languages words tend to be constant at the front, and to vary at the end:
connect + -ion, -ions, -ive, -ed, -ing
The variable part is the ending, or suffix. Taking these endings off is called suffix
stripping or stemming, and the residual part is called the stem.
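
Suffix stripping can be sketched with a deliberately naive function. It handles only the five endings from the example above and is not the stemmer used in the project, which the report does not name. The suffix list and the minimum stem length of three letters are assumptions chosen for illustration.

```python
# Suffixes from the example above, longest first so "-ions" wins over "-ion".
SUFFIXES = ("ions", "ion", "ive", "ing", "ed")


def strip_suffix(word: str) -> str:
    """Remove the first matching suffix; leave very short stems untouched."""
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= 3:
            return stem
    return word


words = ["connection", "connections", "connective", "connected", "connecting"]
stems = [strip_suffix(w) for w in words]
```

All five variants reduce to the stem connect, while a short word such as red is left alone because stripping -ed would leave a single letter. A production stemmer applies many more rules than this.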
This system provides access to the following types of users:
Admin
Staff
Admin Module:
The admin can view the details of registered users and allocate work to specific staff.
After the responsible staff member completes the work, he can send the work report to the admin,
who can download and verify it. The admin can send feedback about staff, and staff can send
complaints. The admin can search the files uploaded for ranked file search. He also generates
salaries for staff.
Staff Module:
A staff member can update his profile. He is responsible for doing the work allocated by the admin and
sending the report for that work. He can view his feedback and send complaints. He uploads the files for
ranked file searching, and he can search them and get the ranked result.


CHAPTER 6
TESTING & IMPLEMENTATION
6.1 Test Documentation
Software testing determines the correctness, completeness, and quality of the software being
developed. Validation refers to the process of checking that the developed software meets the
requirements specified by the user. The activities involved in the testing phase evaluate
whether the system meets its requirements. The main objective of software testing is to
detect errors in the software. Errors occur if some part of the developed system is found to be
incorrect, incomplete, or inconsistent. Test techniques include, but are not limited to, the process
of executing a program or application with the intent of finding software bugs (errors or other
defects). Testing involves the execution of a software component or system to evaluate one or more
properties of interest. In general, these properties indicate the extent to which the component or
system under test:

meets the requirements that guided its design and development,
responds correctly to all kinds of inputs,
performs its functions within an acceptable time,
is sufficiently usable,
can be installed and run in its intended environments, and
achieves the general result its stakeholders desire.

As the number of possible tests for even simple software components is practically infinite, all
software testing uses some strategy to select tests that are feasible for the available time and
resources. As a result, software testing typically (but not exclusively) attempts to execute a
program or application with the intent of finding software bugs (errors or other defects). Software
testing can provide objective, independent information about the quality of software and the risk of
its failure to users and/or sponsors. Software testing can be conducted as soon as executable
software (even if partially complete) exists. The overall approach to software development often
determines when and how testing is conducted. For example, in a phased process, most testing
occurs after system requirements have been defined and then implemented in testable programs.
In contrast, under an Agile approach, requirements, programming, and testing are often done
concurrently.
6.1.2 Testing Methodology
There are two approaches to testing:
1. White box testing
2. Black box testing


White Box Testing
This is testing software using information about the internal structure of the software. It tests
how the program works internally rather than only what it does. The test was carried out to check
the internal structure of the software, and it showed that the internal structure meets the required
criteria.
Black Box Testing
Tests are performed to ensure that each function is working properly; this is referred to as black
box testing. Black-box testing is a method of software testing that examines the functionality of
an application (i.e. what the software does) without peering into its internal structures or
workings. This method of testing can be applied to virtually every level of software testing: unit,
integration, system and acceptance.
Unit Testing
This is the first level of testing. Here, the different modules are tested against the specifications
produced during the design of the modules. Unit testing is done during the coding phase to test
the internal logic of the modules. It refers to the verification of a single
program module in an isolated environment. Unit testing first focuses on the modules
independently of one another to locate errors. After coding, each dialogue was tested and run
individually. All unnecessary code was removed, and it was ensured that all modules worked
as the programmer would expect. The logical errors found were corrected. By running all the
modules independently and verifying the outputs of each module in the presence of staff, I observed that
the program was functioning as expected.

In unit testing:

The module is tested to ensure that information properly flows into and out of the
program under test.
Local data structures are examined to ensure that data stored temporarily maintains
its integrity during all steps in algorithm execution.
Boundary conditions are tested to ensure that the module operates properly at boundaries
established to limit or restrict processing.
All independent paths through the control structures are executed to ensure that all
statements in the module have been executed at least once.
Error handling paths are also tested.
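
The checks listed above can be sketched with Python's built-in unittest module. The function top_k and its sample scores are hypothetical and stand in for a real module of the system. The three test methods cover normal information flow, boundary conditions, and an error handling path.

```python
import unittest


def top_k(scores: dict, k: int) -> list:
    """Return the names of the k highest-scoring files, best first."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by descending score; break ties by file name for a stable order.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]


class TopKTest(unittest.TestCase):
    scores = {"a.txt": 0.2, "b.txt": 0.9, "c.txt": 0.5}

    def test_normal_flow(self):
        self.assertEqual(top_k(self.scores, 2), ["b.txt", "c.txt"])

    def test_boundaries(self):
        # k = 0 and k larger than the number of files are the edge cases.
        self.assertEqual(top_k(self.scores, 0), [])
        self.assertEqual(len(top_k(self.scores, 10)), 3)

    def test_error_path(self):
        with self.assertRaises(ValueError):
            top_k(self.scores, -1)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TopKTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method exercises the module in isolation, so a failure points directly at the unit that needs correction.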


Loop Testing
This testing is used to check the various loops present in the program. The working
of loops such as while, for and do-while is checked for proper execution. The statements
inside the loop body are executed line by line for every condition that satisfies the loop.
Unit Testing
This testing is performed to test the individual units in the system. Each module in the system is
tested individually and executed line by line to confirm accurate functioning of the system. The admin
module has been tested for its proper functioning, since all services are provided and data is stored
and controlled by this module.
Integration Testing
The objective of integration testing is to take all the tested individual modules, integrate them, test
them again and develop the system. The admin module, the registered user module as well as the
public user module should be integrated together for the proper functioning of the whole system.
Testing is conducted at this stage to check whether the requested services reach the
admin through the mobile device and whether, when the user requests the nearest blood bank, it
is made available to the user appropriately.
Alpha Testing
A series of acceptance tests was conducted to enable the users to validate requirements. The
suggestions, along with the additional requirements of the end users, were included in the project.
Beta Testing
Beta testing is conducted by the end user without the presence of the developer. It can be conducted
over a period of weeks or months. Since it is a time-consuming activity, its result is outside the scope
of this project report, but the result will help to enhance the product at a later time.
Validation Testing
This provides final assurance that the software meets all functional, behavioural and
performance requirements. The software is completely assembled as a package. Validation
succeeds when the software functions in the manner the user wishes. Validation refers to the
process of using the software in a live environment in order to find errors. During the course of
validation, system failures may occur, and sometimes the code has to be changed according to
the requirements. Thus the feedback from the validation phase generally produces changes in
the software.


Output Testing
After performing validation testing, the next step is output testing of the proposed system,
since no system is useful if it does not produce the required output in the specified format.
The users were asked about the format they require, and the outputs generated were considered
in two ways: on screen and in printed format. The output format on the screen
was found to be correct, as the format was designed in the system design phase according to the user
needs. For the hard copy also, the output matches the requirements specified by the user.
Hence output testing did not result in any correction to the system.


CHAPTER 7
CONCLUSION AND FUTURE WORK
7.1 Conclusion
In this paper, a multi-keyword ranked search scheme over encrypted cloud data is proposed, which
also supports latent semantic search. We use vectors consisting of term frequency (TF) values as indexes
to documents. These vectors constitute a matrix, from which we analyze the latent semantic
association between terms and documents by latent semantic analysis (LSA). Taking security and privacy into consideration,
we employ a secure splitting k-NN technique to encrypt the index and the queried vector, so that
we can obtain accurate ranked results and protect the confidentiality of the data well. The proposed
scheme could return not only the exact matching files, but also the files including the terms latent
semantically associated to the query keyword.
As future work, we will concentrate on semantic keyword search over encrypted data so
that we can handle more sophisticated searches.
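
The splitting idea behind the secure k-NN technique can be sketched on a toy scale. Everything concrete here is an assumption for illustration and none of it is taken from the project code: the three-dimensional key (split vector S and the matrices M1 and M2 with hand-computed inverses), the splitting convention, and the sample vectors. The sketch follows the commonly described construction, in which an index vector and a query vector are each split into two shares and multiplied by secret matrices, so that the inner product of the encrypted vectors equals the inner product of the plain ones.

```python
import random

# Toy secret key (assumed): a split vector and two invertible matrices.
# Real keys are random and much larger.
S = [0, 1, 0]
M1 = [[2, 1, 0], [1, 1, 0], [0, 0, 1]]
M1_INV = [[1, -1, 0], [-1, 2, 0], [0, 0, 1]]
M2 = [[1, 0, 2], [0, 1, 0], [0, 0, 1]]
M2_INV = [[1, 0, -2], [0, 1, 0], [0, 0, 1]]


def mat_vec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def split(vec, split_where):
    """Split vec into two shares. Positions where S equals split_where are
    split at random (the shares sum to the value); others are copied."""
    first, second = [], []
    for bit, value in zip(S, vec):
        if bit == split_where:
            r = random.randint(-9, 9)
            first.append(r)
            second.append(value - r)
        else:
            first.append(value)
            second.append(value)
    return first, second


def encrypt_index(p):
    p1, p2 = split(p, split_where=1)
    return mat_vec(transpose(M1), p1), mat_vec(transpose(M2), p2)


def encrypt_query(q):
    q1, q2 = split(q, split_where=0)
    return mat_vec(M1_INV, q1), mat_vec(M2_INV, q2)


def score(enc_index, enc_query):
    """Inner product of the encrypted pairs; equals the plain p . q."""
    return sum(a * b for pair in zip(enc_index, enc_query)
               for a, b in zip(*pair))


docs = {"d1": [3, 0, 1], "d2": [1, 3, 0], "d3": [0, 1, 4]}  # made-up TF vectors
enc_q = encrypt_query([1, 1, 0])
scores = {name: score(encrypt_index(tf), enc_q) for name, tf in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

The server in this sketch sees only the encrypted vectors, yet the scores it computes match the plain inner products, so the ranking is unchanged. Integer arithmetic keeps the toy example exact.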

7.2 Future Work


We posed the problem of ranked search over encrypted cloud data and established a variety of
security requirements. Among various multi-keyword concepts, we chose the efficient principle of
coordinate matching. We first proposed secure inner product computation, and we achieved an effective ranking
result using the k-nearest neighbour technique. The system currently works on a single cloud; in future it will be
extended to sky computing and will provide better security in multi-user systems. The proposed scheme could
return not only the exactly matched files, but also the files including the terms semantically related to the
query keyword. The encrypted files and metadata set are outsourced to the server by the owner.
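
Ranking by inner product can be sketched on plain, unencrypted data. The keyword dictionary, the sample documents and the function names are assumptions made for illustration. Each document becomes a vector of term frequencies, the query becomes a binary vector, and files are ranked by the inner product of the two.

```python
from collections import Counter

VOCABULARY = ["cloud", "search", "rank", "encrypt"]  # assumed keyword dictionary


def tf_vector(words):
    """Count how often each dictionary keyword occurs in a stemmed word list."""
    counts = Counter(words)
    return [counts[term] for term in VOCABULARY]


def query_vector(keywords):
    """Binary vector: 1 where a dictionary keyword appears in the query."""
    wanted = set(keywords)
    return [1 if term in wanted else 0 for term in VOCABULARY]


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


documents = {
    "f1": ["cloud", "search", "search", "rank"],
    "f2": ["encrypt", "cloud"],
    "f3": ["rank", "rank"],
}
q = query_vector(["search", "rank"])
scores = {name: inner_product(tf_vector(words), q)
          for name, words in documents.items()}
# Highest score first; ties are broken by file name.
ranked = sorted(scores, key=lambda name: (-scores[name], name))
```

For the query search, rank, file f1 scores highest because it contains both keywords, and f2 scores zero because it contains neither.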


BIBLIOGRAPHY
[1] Rohit Khurana, Software Engineering, second edition, ISBN 978-81259-3946-7, 2007.
[2] D. Boneh, "Public key encryption with keyword search", Advances in Cryptology - Eurocrypt
2004, Springer, (2004).
[3] R. Curtmola, "Searchable symmetric encryption: improved definitions and efficient
constructions", Proceedings of the 13th ACM Conference on Computer and Communications
Security, ACM, (2006).
[4] D. X. Song, D. Wagner and A. Perrig, "Practical techniques for searches on encrypted data",
Security and Privacy, 2000 (S&P 2000), Proceedings of the 2000 IEEE Symposium, IEEE, (2000).
[5] C. Wang, "Secure ranked keyword search over encrypted cloud data", Distributed Computing
Systems (ICDCS), 2010 IEEE 30th International Conference, IEEE, (2010).


APPENDIX
SCREEN LAYOUTS
Home
Login
Admin Home
Staff Registration
Admin File Search
Admin Notification
Admin Work Allocation
Admin Download Work Report
Admin Send Feedback
Admin Complaint
Admin Salary Management
Staff Home
Staff Profile
Staff Upload File
Staff File Search
Staff Work Details
Staff Work Report
Staff View Feedback
Staff Send Complaint
Staff Feedback Notification