Abstract:
Top-k query is an important operation that returns a set of interesting points from a potentially huge data space. This paper shows that the existing algorithms cannot process top-k queries on massive data efficiently, and proposes a novel table-scan-based algorithm, T2S, to compute top-k results on massive data efficiently. T2S first constructs the presorted table, whose tuples are arranged in the order of the round-robin retrieval on the sorted lists. T2S maintains only a fixed number of tuples to compute results. The early-termination check for T2S is presented in this paper, along with an analysis of the scan depth. Selective retrieval is devised to skip the tuples in the presorted table which are not top-k results. The theoretical analysis proves that selective retrieval can significantly reduce the number of retrieved tuples. Construction and incremental-update/batch-processing methods for the structures used are also proposed.
Introduction:
Top-k query is an important operation that returns a set of interesting points from a potentially huge data space. In a top-k query, a ranking function F is provided to determine the score of each tuple, and the k tuples with the largest scores are returned. Due to its practical importance, top-k query has attracted extensive attention. This paper proposes a novel table-scan-based algorithm, T2S (Top-k by Table Scan), to compute top-k results on massive data efficiently.
The analysis of the scan depth in T2S is also developed. Because the result size k is usually small and the vast majority of the tuples retrieved in PT are not top-k results, this paper devises selective retrieval to skip the tuples in PT which are not query results. The theoretical analysis proves that selective retrieval can significantly reduce the number of retrieved tuples.
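The top-k operation described above can be sketched in a few lines. This is a minimal illustration with made-up data and a hypothetical linear ranking function F; it is not the T2S algorithm itself, just the problem it solves:

```python
import heapq

def top_k(tuples, score, k):
    """Return the k tuples with the largest scores under ranking function `score`
    (playing the role of F in the text)."""
    return heapq.nlargest(k, tuples, key=score)

# Hypothetical two-attribute data; F is a monotone linear combination.
data = [(3, 9), (8, 2), (5, 6), (1, 1)]
F = lambda t: 0.5 * t[0] + 0.5 * t[1]
print(top_k(data, F, 2))  # → [(3, 9), (5, 6)]
```

On massive data the cost of scoring every tuple like this is exactly what T2S avoids.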
Existing System:
Due to its practical importance, top-k query has attracted extensive attention. The existing top-k algorithms can be classified into three types: index-based methods, view-based methods, and sorted-list-based methods. Index-based methods (or view-based methods) make use of pre-constructed indexes (or views) to process top-k queries.
Disadvantages:
High computational overhead.
High data redundancy.
Time-consuming processing.
Problem Definition:
Ranking is a central part of many information retrieval problems, such
as document retrieval, collaborative filtering, sentiment analysis, computational
advertising (online ad placement).
Training data consists of queries and documents matched to them, together with the relevance degree of each match. It may be prepared manually by human assessors (or raters, as Google calls them), who check results for some queries and determine the relevance of each result. It is not feasible to check the relevance of all documents, so typically a technique called pooling is used: only the top few documents, retrieved by some existing ranking models, are checked.
Typically, users expect a search query to complete in a short time (such as a few
hundred milliseconds for web search), which makes it impossible to evaluate a
complex ranking model on each document in the corpus, and so a two-phase
scheme is used.
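The two-phase scheme above can be sketched as follows. The scoring functions here are toy stand-ins for real ranking models (any real system would use a learned model in phase 2); everything in this snippet is illustrative:

```python
def two_phase_search(corpus, query, cheap_score, costly_score, pool=2, k=1):
    """Phase 1: a cheap model shortlists `pool` candidate documents from the corpus.
    Phase 2: a more expensive model re-ranks only that shortlist."""
    shortlist = sorted(corpus, key=lambda d: cheap_score(d, query), reverse=True)[:pool]
    return sorted(shortlist, key=lambda d: costly_score(d, query), reverse=True)[:k]

# Toy scoring functions (illustrative stand-ins for real ranking models):
cheap = lambda d, q: sum(w in d for w in q.split())   # raw term overlap
costly = lambda d, q: cheap(d, q) - 0.01 * len(d)     # overlap, shorter docs preferred

docs = ["top-k query on massive data", "sorted lists", "scan for top-k"]
print(two_phase_search(docs, "top-k massive", cheap, costly))
# → ['top-k query on massive data']
```

Because the expensive model only sees the small pool, the per-query latency stays within the few-hundred-millisecond budget mentioned above.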
Literature Survey
DISADVANTAGES:
1. The computation overhead is greatly affected by the size of the dictionary and the number of documents, and has almost no relation to the number of query keywords.
This paper analyzes the execution behavior of No Random Accesses (NRA) and determines the depths to which each sorted file is scanned in the growing phase and the shrinking phase of NRA, respectively. The analysis shows that NRA needs to maintain a large quantity of candidate tuples in the growing phase on massive data. Based on the analysis, this paper proposes a novel top-k algorithm, Top-K with Early Pruning (TKEP), which performs early pruning in the growing phase. The general rule and a mathematical analysis for early pruning are presented in this paper. The theoretical analysis shows that early pruning can prune most of the candidate tuples. Although TKEP is an approximate method to obtain the top-k result, the probability of correctness is extremely high. Extensive experiments show that TKEP has a significant advantage over NRA.
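The bookkeeping that NRA (and TKEP's pruning) relies on can be sketched as follows. This is a simplified illustration of NRA's worst/best score bounds and bound-based pruning under a summation scoring function, not the actual TKEP rule:

```python
def nra_bounds(seen, last_seen):
    """seen: {tuple_id: {list_index: score}} — partial scores read so far.
    last_seen: the most recent score read from each sorted list.
    NRA's bounds: worst assumes 0 on unseen lists, best assumes each
    unseen list still holds its last seen value."""
    bounds = {}
    for tid, parts in seen.items():
        worst = sum(parts.values())
        best = worst + sum(v for j, v in enumerate(last_seen) if j not in parts)
        bounds[tid] = (worst, best)
    return bounds

def prune(bounds, k):
    """Drop candidates whose best score cannot reach the k-th largest worst score."""
    if len(bounds) < k:
        return bounds
    kth_worst = sorted((w for w, _ in bounds.values()), reverse=True)[k - 1]
    return {t: wb for t, wb in bounds.items() if wb[1] >= kth_worst}
```

TKEP's contribution is applying pruning of this flavor already in the growing phase, where plain NRA would keep every partially seen candidate.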
DISADVANTAGES:
1. It significantly limits the usability of outsourced data due to the difficulty of
searching over the encrypted data.
Skyline query returns the tuples that are not dominated by any other tuples. It is found that the existing algorithms cannot process skyline queries on big data efficiently. This paper presents a novel skyline algorithm, SSPL, for big data. SSPL utilizes sorted positional index lists, which require low space overhead, to reduce I/O cost significantly. The sorted positional index list Lj is constructed for each attribute Aj and is arranged in ascending order of Aj. SSPL consists of two phases. In phase 1, SSPL computes the scan depth of the involved sorted positional index lists. While retrieving the lists in a round-robin fashion, SSPL prunes any candidate positional index whose corresponding tuple is not a skyline result. Phase 1 ends when there is a candidate positional index seen in all of the involved lists. In phase 2, SSPL exploits the obtained candidate positional indexes to get the skyline results by a selective and sequential scan on the table.
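The dominance relation underlying the skyline can be sketched directly. This reference implementation is the naive O(n²) scan that SSPL is designed to avoid; smaller attribute values are assumed preferable, matching the ascending-order lists above:

```python
def dominates(a, b):
    """a dominates b if a is no worse on every attribute and strictly
    better on at least one (smaller = better here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(tuples):
    """Naive skyline for reference: keep tuples dominated by no other tuple.
    SSPL reaches the same result with far fewer tuple retrievals."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]

print(skyline([(1, 3), (2, 2), (3, 1), (2, 3), (3, 3)]))
# → [(1, 3), (2, 2), (3, 1)]
```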
DISADVANTAGES:
1) Its efficiency is limited on massive data.
Proposed System:
Our proposed system uses layered indexing to organize the tuples into multiple consecutive layers. The top-k results can be computed from at most k layers of tuples. We also propose the layer-based Pareto-based Dominant Graph to express the dominance relationship between records, so that a top-k query is implemented as a graph-traversal problem.
We further propose the Hybrid-Layer Index, which integrates layer-level filtering and list-level filtering to significantly reduce the number of tuples retrieved in query processing, and view-based algorithms that pre-construct specified materialized views according to some ranking functions.
Given a top-k query, one or more optimal materialized views are selected to return the top-k results efficiently. LPTA+ is proposed to significantly improve the efficiency of the state-of-the-art LPTA algorithm. The materialized views are cached in memory, so LPTA+ can reduce the iterative calling of the linear-programming sub-procedure, greatly improving efficiency over the LPTA algorithm. In practical applications, a concrete index (or view) is built on a specific subset of attributes. Due to the prohibitively expensive overhead of covering all attribute combinations, indexes (or views) can only be built on a small and selective set of attribute combinations.
Correspondingly, T2S only builds the presorted table, on which a top-k query on any attribute combination can be dealt with. This reduces the space overhead.
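The construction of the presorted table described above can be sketched as follows. This is an illustrative reading of the text (descending per-attribute sorted lists, round-robin over them, first appearance wins), not the authors' code:

```python
def presort(table):
    """Build the presorted table PT: for each attribute, sort row ids by that
    attribute (descending), then take row ids in round-robin order across the
    sorted lists, keeping only the first appearance of each row."""
    m = len(table[0])
    sorted_lists = [sorted(range(len(table)), key=lambda i, j=j: -table[i][j])
                    for j in range(m)]
    order, seen = [], set()
    for depth in range(len(table)):       # depth in the round-robin retrieval
        for lst in sorted_lists:
            rid = lst[depth]
            if rid not in seen:
                seen.add(rid)
                order.append(rid)
    return [table[i] for i in order]

print(presort([(1, 9), (5, 5), (9, 1)]))  # → [(9, 1), (1, 9), (5, 5)]
```

Because tuples with large attribute values land near the front of PT, a sequential scan with early termination under a monotone ranking function only has to examine a prefix of the table.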
Advantages:
OVERVIEW OF MICROSOFT.NET
.NET represents Microsoft's vision of the future of applications in the Internet
age. .NET provides enhanced interoperability features based upon open Internet
standards. Microsoft .NET represents a great improvement.
The .NET Framework consists of the CLR, the .NET Framework Class Library, the
Common Language Specification (CLS), a number of .NET languages, and Visual
Studio .NET.
Not all languages expose all the features of the CLR. The language with the best mapping to the CLR is the new language C#. VB.NET, however, does an admirable job of exposing the functionality.
The .NET Framework class library is huge, comprising more than 2,500 classes.
All this functionality is available to all the .NET languages. The library consists of
four main parts:
1. Base class library (which includes networking, security, diagnostics, I/O, and other types of operating-system services)
2. Data and XML classes
3. Windows UI
4. Web services and Web UI
The CLS is an agreement among language designers and class library designers
about those features and usage conventions that can be relied upon. CLS rules
apply to public features that are visible outside the assembly where they are
defined.
Languages in .NET
The runtime supplies common services, such as security management, network communications, and threading, to every .NET language.
ASP.NET
Enhanced Performance
Because ASP.NET is based on the common language runtime, the power and
flexibility of that entire platform is available to Web application developers. The
.NET Framework class library, Messaging, and Data Access solutions are all
seamlessly accessible from the Web. ASP.NET is also language-independent, so
you can choose the language that best applies to your application or partition your
application across many languages.
Simplicity
ASP.NET makes it easy to perform common tasks, from simple form submission
and client authentication to deployment and site configuration. For example, the
ASP.NET page framework allows you to build user interfaces that cleanly separate
application logic from presentation code and to handle events in a simple, Visual
Basic - like forms processing model. Additionally, the common language runtime
simplifies development, with managed code services such as automatic reference
counting and garbage collection
Manageability
This philosophy extends to deploying ASP.NET Framework applications as well.
ASP.NET has been designed with scalability in mind, with features specifically
tailored to improve performance in clustered and multiprocessor environments.
Further, processes are closely monitored and managed by the ASP.NET runtime, so
that if one misbehaves (leaks, deadlocks), a new process can be created in its place,
which helps keep your applications constantly available to handle requests.
Security
Language Support
The Microsoft .NET Platform currently offers built-in support for three languages: C#, Visual Basic, and JScript.
Language Compatibility
The differences between the VBScript used in ASP and the Visual Basic .NET
language used in ASP.NET are by far the most extensive of all the potential
migration issues. Not only has ASP.NET departed from the VBScript language to
"true" Visual Basic, but the Visual Basic language itself has undergone significant
changes in this release.
Visual Basic .NET is designed to be a fast and easy way to create .NET applications, including Web services and ASP.NET Web applications. Applications written in Visual Basic are built on the services of the common language runtime and take full advantage of the .NET Framework.
It is fully integrated with the .NET Framework and the common language runtime, which together provide language interoperability, garbage collection, enhanced security, and improved versioning support.
Through rich data-analysis and data-mining capabilities that integrate with familiar applications such as Microsoft Office, SQL Server 2005 enables you to provide all of your employees with critical, timely business information tailored to their specific needs. Every copy of SQL Server 2005 ships with a suite of BI services.
Unlike its competitors, SQL Server 2005 provides a powerful and comprehensive
data management platform. Every software license includes extensive management
and development tools, a powerful extraction, transformation, and loading (ETL)
tool, business intelligence and analysis services, and new capabilities such as
Notification Services. The result is the best overall business value available.
Enterprise Edition includes the complete set of SQL Server data management and
analysis features and is uniquely characterized by several features that make it the
most scalable and available edition of SQL Server 2005. It scales to the
performance levels required to support the largest Web sites, Enterprise Online
Transaction Processing (OLTP) systems, and Data Warehousing systems. Its support for failover clustering also makes it ideal for any mission-critical line-of-business application.
The next major enhancement in SQL Server 2005 is the integration of a .NET-compliant language such as C# or VB.NET to build objects (stored
procedures, triggers, functions, etc.). This enables you to execute .NET code in the
DBMS to take advantage of the .NET functionality. It is expected to replace
extended stored procedures in the SQL Server 2000 environment as well as expand
the traditional relational engine capabilities.
3. Service Broker
The Service Broker handles messaging between a sender and receiver in a loosely
coupled manner. A message is sent, processed and responded to, completing the
transaction. This greatly expands the capabilities of data-driven applications to
meet workflow or custom business needs.
4. Data encryption
SQL Server 2000 had no documented or publicly supported functions to encrypt
data in a table natively. Organizations had to rely on third-party products to address
this need. SQL Server 2005 has native capabilities to support encryption of data
stored in user-defined databases.
5. SMTP mail
Sending mail directly from SQL Server 2000 is possible, but challenging. With
SQL Server 2005, Microsoft incorporates SMTP mail to improve the native mail
capabilities. Say "see-ya" to Outlook on SQL Server!
6. HTTP endpoints
You can easily create HTTP endpoints via a simple T-SQL statement exposing an
object that can be accessed over the Internet. This allows a simple object to be
called across the Internet for the needed data.
7. Multiple Active Result Sets (MARS)
MARS allow a persistent database connection from a single client to have more
than one active request per connection. This should be a major performance
improvement, allowing developers to give users new capabilities when working
with SQL Server. For example, it allows multiple searches, or a search and data
entry. The bottom line is that one client connection can have multiple active
processes simultaneously.
8. Dedicated administrator connection
If all else fails, stop the SQL Server service or push the power button. That
mentality is finished with the dedicated administrator connection. This
functionality will allow a DBA to make a single diagnostic connection to SQL
Server even if the server is having an issue.
9. SQL Server Integration Services (SSIS)
SSIS has replaced DTS (Data Transformation Services) as the primary ETL
(Extraction, Transformation and Loading) tool and ships with SQL Server free of
charge. This tool, completely rewritten since SQL Server 2000, now has a great
deal of flexibility to address complex data movement.
10. Database mirroring
It's not expected to be released with SQL Server 2005 at the RTM in November,
but I think this feature has great potential. Database mirroring is an extension of
the native high-availability capabilities. So, stay tuned for more details.
The Internet was originally established to meet the research needs of the U.S. defense industry, but it has grown into a huge global network serving universities, academic researchers, commercial interests, and government agencies, both in the U.S. and overseas. The Internet uses TCP/IP protocols, and many Internet hosts run the Unix operating system.
HTML
HTML (Hyper Text Markup Language) is the language used to prepare documents for online publication. HTML documents are also called Web documents, and each HTML document is known as a Web page.
A page is what is seen in the browser at any time. Each Web site, whether on the
Internet or Intranet, is composed of multiple pages. And it is possible to switch
among them by following hyperlinks. The collection of HTML pages makes up the
World Wide Web.
A Web page is basically a text file that contains the text to be displayed and references to elements such as images, sounds and, of course, hyperlinks to other documents. HTML pages can be created using a simple text editor such as Notepad or a WYSIWYG application such as Microsoft FrontPage.
In either case the result is a plain text file that computers can easily exchange. The
browser displays this text file on the client computer.
"Hypertext" is the jumping portion: a hyperlink can jump to any place within your own page(s), or literally to any place in the world with a net address (URL, or Uniform Resource Locator). It is a small part of the HTML language.
Along with administration tools and functionality, IIS also has built-in capabilities to help administer secure Web sites and to develop server-intensive Web applications.
FEATURES OF IIS:
IIS provides integrated security and access to a wide range of content, works seamlessly with COM components, and has a graphical interface, the Microsoft Management Console (MMC), that you can use to create and manage your ASP application.
You can control many parts of IIS using COM. IIS exposes many of the server's configuration settings via the IIS Admin objects. These objects are accessible from ASP and other languages, which means you can adjust the server configuration and create virtual directories and webs programmatically. IIS 4 and higher store settings and Web-site information in a special database called the Metabase. You can use the IIS Admin objects to create new sites and virtual directories, or to alter the properties of existing sites and virtual directories.
IIS ARCHITECTURES OVERVIEW:
IIS is a core product, which means that it is designed to work closely with many
other products, including all products in the Windows NT Server 4.0 Option pack.
The following figure shows the relationship between IIS and other products
installed as part of the Windows NT Server 4.0 Option pack.
SECURITY FOR IIS APPLICATION
IIS provides three authentication schemes to control access to IIS resources: Anonymous, Basic, and Windows NT Challenge/Response. Each of these schemes has a different effect on the security context of an application launched by IIS. This includes ISAPI extension agents, COM applications, IDC scripts, and future scripting capabilities.
ACCESS PRIVILEGES
IIS provides several new access levels. The following values can set the type of
access allowed to specific directories:
Read
Write
Script
Execute
Log Access
Directory Browsing.
Administering Web sites can be time-consuming and costly, especially for people who manage large Internet Service Provider (ISP) installations. To save time and money, ISPs support only large-company Web sites at the expense of personal Web sites. But is there a cost-effective way to support both? The answer is yes, if you can automate administrative tasks and let users administer their own sites from remote computers. This solution reduces the amount of time and money it takes to manually administer a large installation, without reducing the number of Web sites supported.
2) Requirement Specification:
Here, the focus is on specifying what has been found during analysis; issues such as representation, specification languages and tools, and checking of the specifications are addressed during this activity.
The requirements phase terminates with the production of the validated SRS document. Producing the SRS document is the basic goal of this phase.
Role of SRS:
The purpose of the Software Requirement Specification is to reduce the
communication gap between the clients and the developers. Software Requirement
Specification is the medium through which the client and user needs are accurately
specified. It forms the basis of software development. A good SRS should satisfy
all the parties involved in the system.
Data Flow Diagrams
Overall flow: user registration, user details, key generation, encryption.
Level 1 – User login (entities: User, Cloud Service Provider).
Level 2 – User uploads data into the CSP (User → CSP).
Level 3 – User retrieves data from the CSP (CSP → User), with key generation and key verification.
Class Diagram
Data owner: CSP name, user name, company name, address, contact details; operations: Upload files(), Download files().
Server: maintains file details and key details; operations: Verify key(), Encrypt files(), Decrypt files().
Activity diagram
Modules:
Multi-keyword ranked search:
To design search schemes which allow multi-keyword queries and provide result-similarity ranking for effective data retrieval, instead of returning undifferentiated results.
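A plaintext sketch of the ranking idea: score each document by the number of matched query keywords (often called coordinate matching in this line of work) and return the best matches. Document ids and keyword sets here are invented for illustration; the real scheme performs this over encrypted indexes:

```python
def rank_documents(doc_keywords, query, k=2):
    """doc_keywords: {doc_id: set of keywords}; query: set of keywords.
    Score = number of query keywords the document contains; return top-k ids."""
    scores = {doc_id: len(query & kws) for doc_id, kws in doc_keywords.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

docs = {"d1": {"cloud", "search"}, "d2": {"cloud", "rank", "search"}, "d3": {"index"}}
print(rank_documents(docs, {"cloud", "search", "rank"}))  # → ['d2', 'd1']
```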
Privacy-preserving:
To prevent the cloud server from learning additional information from the data set and the index, and to meet privacy requirements. If the cloud server deduces any association between keywords and encrypted documents from the index, it may learn the major subject of a document, or even the content of a short document. Therefore, the searchable index should be constructed to prevent the cloud server from performing this kind of association attack.
Efficiency:
The above goals on functionality and privacy should be achieved with low communication and computation overhead. Let xi denote the number of query keywords appearing in a document; the final similarity score is a linear function of xi, in which the coefficient r is set as a positive random number. However, because a random factor εi is introduced as a part of the similarity score, the final search result, obtained by sorting the similarity scores, may not be as accurate as that of the original scheme. For search accuracy, we can let εi follow a normal distribution, where the standard deviation σ functions as a flexible tradeoff parameter between search accuracy and security.
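Numerically, the accuracy/security tradeoff looks like this. The function below is an illustrative stand-in for the obscured score, not the paper's exact construction:

```python
import random

def obscured_score(x, r, sigma, rng=random.Random(0)):
    """Final similarity score: linear in the keyword-match count x with a
    positive random coefficient r, plus Gaussian noise. sigma = 0 gives
    exact ranking; larger sigma hides the true scores better but may
    perturb the result order (more security, less accuracy)."""
    return r * x + rng.gauss(0.0, sigma)

# With sigma = 0 the ranking is exact; with sigma > 0 close scores may swap.
print(obscured_score(3, 2.0, 0.0))  # → 6.0
```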
SYSTEM TESTING
TYPES OF TESTS
Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application; it is done after the completion of an individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at the component level and exercise a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
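As a concrete sketch, a unit test for one unit of this system might look like the following. The `verify_key` helper is hypothetical, invented here only to have a unit to test:

```python
import unittest

def verify_key(stored, supplied):
    """Hypothetical unit under test: a key must be non-empty and match exactly."""
    return bool(stored) and stored == supplied

class VerifyKeyTest(unittest.TestCase):
    def test_valid_key_is_accepted(self):
        self.assertTrue(verify_key("abc123", "abc123"))

    def test_invalid_and_empty_keys_are_rejected(self):
        self.assertFalse(verify_key("abc123", "wrong"))
        self.assertFalse(verify_key("", ""))

# Run with: python -m unittest <this module>
```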
Integration testing
Functional test
Functional tests check that invalid input is rejected, identified functions are exercised, and application output is correct.
System Test
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An
example of system testing is the configuration-oriented system integration test.
System testing is based on process descriptions and flows, emphasizing pre-driven
process links and integration points.
Unit Testing:
Unit testing is usually conducted as part of a combined code and unit test
phase of the software lifecycle, although it is not uncommon for coding and unit
testing to be conducted as two distinct phases.
Test objectives
All field entries must work properly.
Pages must be activated from the identified link.
The entry screen, messages and responses must not be delayed.
Features to be tested
Verify that the entries are of the correct format
No duplicate entries should be allowed
Integration Testing
Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
Acceptance Testing
Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
Conclusion:
The proposed novel T2S algorithm was successfully implemented to efficiently return top-k results on massive data by sequentially scanning the presorted table, in which the tuples are arranged in the order of round-robin retrieval on the sorted lists. Only a fixed number of candidates needs to be maintained in T2S. This paper proposes the early-termination check and the analysis of the scan depth. Selective retrieval is devised in T2S, and it is shown that most of the candidates in the presorted table can be skipped. The experimental results show that T2S significantly outperforms the existing algorithms.
Future Enhancement:
In future development, the multi-keyword ranked search scheme should explore checking the integrity of the rank order in the search results returned from the untrusted network server infrastructure.
Feature Enhancement:
A novel table-scan-based algorithm, T2S, was implemented successfully to compute top-k results on massive data efficiently. Given a table T, T2S first presorts T to obtain the table PT (Presorted Table), whose tuples are arranged in the order of the round-robin retrieval on the sorted lists. During its execution, T2S only maintains a fixed and small number of tuples to compute results. It is proved that T2S has the characteristic of early termination: it does not need to examine all tuples in PT to return results.