1. INTRODUCTION
1.1
This project develops a system that retrieves information and documents very efficiently and limits the number of returned documents by performing an intelligent search procedure. The purpose is to design a system that displays only relevant information to the user by suppressing unnecessary and irrelevant information.
Suppose there is a store of documents and a person (a user of the store) formulates a question (a request or query) whose answer is the set of documents satisfying the information need expressed by the question. One solution is for the user to read all the documents in the store, retain the relevant documents, and discard all the others. This is perfect retrieval, though it is clearly impractical for a large store.
Document and query indexing: how can their contents best be represented?
Query evaluation (or retrieval process):
Restrained Netting 08M21AO5B5
To what extent does a document correspond to a query?
System evaluation: how good is a system? Are the retrieved documents relevant (precision)? Are all the relevant documents retrieved (recall)?
Input: The main problem here is to obtain a representation of each document and query that is suitable for a computer to use. Most computer-based retrieval systems store only a representation of the document (or query), which implies that the actual text is lost and an artificial language is used instead. The user then needs to be taught to express his information need in that language.
The document representative consists of a list of class names, each name representing a class of words occurring in the total input text. A document is indexed by a name if one of its significant words occurs as a member of that class.
Text processing system: such a system consists of three parts:
Removal of high-frequency words
Suffix stripping
Detecting equivalent stems
Removal of high-frequency words: this is one way of implementing Luhn's upper cut-off. Maintain a stop list of such words, compare each input word against it, and remove the matches.
Removing these words reduces the document size by 30 to 50%.
Suffix stripping is more involved. A complete list of suffixes is kept, and the longest possible matching suffix is removed. Context-free removal leads to errors, for example removing UAL from both FACTUAL and EQUAL when only the first removal is wanted. The solution is to add rules that govern when a suffix may be removed.
Equivalent stems: many words map to the same morphological form once their suffixes are removed. Others do not match on mere removal of suffixes (for example, ABSORB- and ABSORPT-). For these, a list of equivalent stem endings is maintained (for example, B and PT are equivalent stem endings).
The final output from a conflation algorithm is a set of classes, one for each stem detected. A class name is assigned to a document if and only if one of its members occurs as a significant word in the text of the document. A document representative then becomes a list of class names. These are often referred to as the document's index terms or keywords.
Queries: queries are handled in the same way.
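The three text-processing steps above can be sketched in Java as follows. This is only an illustration under stated assumptions: the stop list, the suffix list, the minimum stem length used as the context rule, and the class name Conflator are all hypothetical and far smaller than anything a real system would use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class Conflator {
    // Hypothetical stop list of high-frequency words (a real list is much longer).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "of", "and", "a", "is", "to", "in"));

    // Hypothetical suffix list; the longest matching suffix wins.
    static final List<String> SUFFIXES =
            Arrays.asList("ational", "ation", "ual", "ing", "ion", "ed", "s");

    // Assumed context rule: never strip a suffix if fewer than 3 letters would remain.
    static final int MIN_STEM_LENGTH = 3;

    // Removes the longest suffix that the context rule allows.
    static String stripSuffix(String word) {
        String best = "";
        for (String suffix : SUFFIXES) {
            boolean longer = suffix.length() > best.length();
            boolean stemLongEnough = word.length() - suffix.length() >= MIN_STEM_LENGTH;
            if (word.endsWith(suffix) && longer && stemLongEnough) {
                best = suffix;
            }
        }
        return word.substring(0, word.length() - best.length());
    }

    // Treats the stem endings PT and B as equivalent (ABSORPT- becomes ABSORB-).
    // A real system would apply this only to stems found in its equivalence list.
    static String mergeEquivalentEnding(String stem) {
        return stem.endsWith("pt") ? stem.substring(0, stem.length() - 2) + "b" : stem;
    }

    // Builds the document representative: the sorted set of class names (stems).
    static Set<String> indexTerms(String text) {
        Set<String> terms = new TreeSet<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // skip empty tokens and high-frequency words
            }
            terms.add(mergeEquivalentEnding(stripSuffix(token)));
        }
        return terms;
    }
}
```

Calling Conflator.indexTerms("The absorption of factual and equal absorbing") yields the index terms absorb, equal, and fact: UAL is stripped from FACTUAL but kept on EQUAL, and ABSORPTION and ABSORBING fall into the same class.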
1.2. EXISTING SYSTEM
Traditional Restrained Netting systems consult databases of the most frequently used words in documents, such as words drawn from a document's title and first few sentences. They therefore fail to retrieve documents in which the search keywords are buried somewhere deeper in the text, and they are useful only for finding specific information on the World Wide Web (WWW).
Many page authors submit to the Confined Web Spider numerous web pages that use tricks to boost their ratings, such as an irrelevant title tag or certain words repeated in the first few lines that have nothing to do with the actual content of the page. This can lead to a situation in which not even one of the top ten sites listed is on the expected subject. Anyone can put up a web page, so a search can return academic results or internet gossip alike.
HTML does not provide any standard method for identifying the contents of a document, so it is extremely difficult for the Confined Web Spider to identify the contents of a web page in order to index it. The World Wide Web seems to be ever expanding, and with it grows the threat to the quality of information available on the web.
1.3. PROPOSED SYSTEM
XML (Extensible Markup Language) is a simplified form of SGML (Standard Generalized Markup Language), the mother of all document-defining languages. XML is not as powerful as SGML but is much easier to use. Developing web pages in XML is similar to developing them in HTML, but XML gives authors the ability to invent their own tags: the tag names and their meanings are left to the author to define, depending on the subject matter. Most importantly, XML allows more detail to be included in a document, so searching for specific topics should become more accurate and avoid many mismatches.
This application automates the process of sending queries to the websites and presents the search results from all the sites to the user. It is a Confined Web Spider developed for easy search, built with current technologies and practices. In addition, the user interface provided by the application puts all of its complex tools at the easy disposal of the user or administrator. Deploying the Confined Web Spider on any organization's website is practical because it does not demand any external resources or components.
1. Administrator module
2. User module
3. Products module
4. Jobs module
5. Yellow pages module
6. Resume module
2.1.1 Administrator Module:
This module concerns the Administrator, who maintains the application. It allows the Administrator to add all objects to the application, and the entire application is under the Administrator's control. The Administrator has the authority to add details of the data held in the database.
Java Version: J2SDK1.5
Initially the language was called Oak, but it was renamed Java in 1995. The primary motivation for the language was the need for a platform-independent (i.e., architecture-neutral) language that could be used to create software to be embedded in various consumer electronic devices.
Java is a programmer's language.
Java is cohesive and consistent.
Except for those constraints imposed by the Internet environment, Java gives the programmer full control.
Finally, Java is to Internet programming what C was to systems programming.
concerns and by doing so, has opened the door to an exciting new form of program called the Applet.
[Figure labels: Java interpreter, SPARC compiler]
At run time the Java interpreter tricks the byte code file into thinking that it is running on a Java Virtual Machine. In reality the machine could be an Intel Pentium running Windows 95, a Sun SPARCstation running Solaris, or an Apple Macintosh running its own system software, and all of them could receive code from any computer through the Internet and run the applets.
A servlet is a generic server extension: a Java class that can be loaded dynamically to expand the functionality of a server. Servlets are commonly used with web servers, where they can take the place of CGI scripts. A servlet is similar to a proprietary server extension, except that it runs inside a Java Virtual Machine (JVM) on the server, so it is safe and portable. Servlets operate solely within the domain of the server.
Unlike CGI and FastCGI, which use multiple processes to handle separate programs or separate requests, all servlets are handled by separate threads within the web server process. This makes servlets efficient and scalable. Servlets are also portable, both across operating systems and across web servers. Together these properties make servlets a strong platform for web application development.
Besides replacing CGI scripts on a web server, servlets can extend any sort of server. For example, a mail server that allows servlets to extend its functionality could use them to perform a virus scan on all attached documents or to handle mail-filtering tasks.
Servlets provide a Java-based solution to the problems currently associated with server-side programming, including inextensible scripting solutions, platform-specific APIs, and incomplete interfaces:
They are faster and cleaner than CGI scripts.
They use a standard API (the Servlet API).
They provide all the advantages of Java (they run on a variety of servers without needing to be rewritten).
The server environment it will be running in.
These qualities are important because they allow the Servlet API to be embedded in many different kinds of servers. There are other advantages to the Servlet API as well. These include:
It is extensible: you can inherit all your functionality from the base classes made available to you.
code base like http://nine.eng/classes/foo/ is required in addition to the servlet's class name. Refer to the servlet section of the admin GUI docs to see how to set this up.
Loading Remote Servlets
Remote servlets can be loaded by:
Configuring the admin tool to set up automatic loading of remote servlets.
Setting up server-side include tags in .html files.
The servlet life cycle is highly flexible. Servers have significant leeway in how they choose to support servlets. The only hard and fast rule is that a servlet engine must conform to the following life cycle contract:
Create and initialize the servlet.
Handle zero or more service calls from clients.
Destroy the servlet and then garbage collect it.
It is perfectly legal for a servlet to be loaded, created, and initialized in its own JVM, only to be destroyed and garbage collected without handling any client request, or after handling just one request.
The most common and most sensible life cycle implementations for HTTP servlets are:
Single Java virtual machine and instance persistence.
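The life cycle contract described above can be illustrated with a small plain-Java sketch. It deliberately does not use the real Servlet API: MiniServlet, CountingServlet, and MiniEngine are hypothetical names invented for this example, and the engine is reduced to a single method that walks one servlet instance through the three phases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a servlet; this is not the real Servlet API.
interface MiniServlet {
    void init();
    String service(String request);
    void destroy();
}

class CountingServlet implements MiniServlet {
    // Records the order of life cycle calls so it can be inspected afterwards.
    final List<String> events = new ArrayList<>();

    // Instance persistence: the counter survives between requests
    // because the same instance handles all of them.
    private int hits;

    public void init() {
        events.add("init");
    }

    public String service(String request) {
        hits++;
        events.add("service");
        return request + " handled, hit " + hits;
    }

    public void destroy() {
        events.add("destroy");
    }
}

class MiniEngine {
    // Runs one full life cycle: initialize once, serve zero or more
    // requests, then destroy once.
    static List<String> run(MiniServlet servlet, List<String> requests) {
        List<String> responses = new ArrayList<>();
        servlet.init();
        for (String request : requests) {
            responses.add(servlet.service(request));
        }
        servlet.destroy();
        return responses;
    }
}
```

Running MiniEngine.run with an empty request list still calls init and destroy, which mirrors the point that a servlet may legally be created and destroyed without ever serving a request.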
registering and logging in as a necessary evil when they are accessing sensitive information, but it is overkill for simple session tracking. Another problem with user authorization is that a user cannot simultaneously maintain more than one session at the same site.
3.3 JDBC
o Native-API partly-Java driver
o JDBC-Net pure Java driver
o Native-protocol pure Java driver
An individual database system is accessed via a specific JDBC driver that implements the java.sql.Driver interface. Drivers exist for nearly all popular RDBMS systems, though few are available for free. Sun bundles a free JDBC-ODBC bridge driver with the JDK to allow access to standard ODBC data sources, such as a Microsoft Access database. Sun advises against using the bridge driver for anything other than development and very limited deployment. JDBC drivers are available for most database platforms, from a number of vendors and in a number of different flavors.
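As a small illustration of how a driver is selected, the sketch below asks java.sql.DriverManager which registered driver accepts a given JDBC URL. The class name DriverLookup and the example URL are hypothetical; the sketch assumes nothing about which drivers are actually installed, and it does not open a connection.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;

class DriverLookup {
    // Returns the class name of the registered java.sql.Driver that accepts
    // the URL, or null when no registered driver understands it.
    static String driverFor(String jdbcUrl) {
        try {
            Driver driver = DriverManager.getDriver(jdbcUrl);
            return driver.getClass().getName();
        } catch (SQLException noSuitableDriver) {
            // DriverManager signals "no suitable driver" with an SQLException.
            return null;
        }
    }
}
```

For a URL that no registered driver accepts, such as the made-up jdbc:hypothetical:demo, the method returns null. Once a vendor's driver is on the classpath, the same call reports which driver class would handle that vendor's URLs.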
3.4 XML
How Can XML be Used?
XML is used in many aspects of web development, often to simplify data storage and sharing.
XHTML
WSDL for describing available web services
WAP and WML as markup languages for handheld devices
RSS languages for news feeds
RDF and OWL for describing resources and ontology
SMIL for describing multimedia for the web
The first line is the XML declaration. It defines the XML version (1.0) and the encoding used (ISO-8859-1 = Latin-1/West European character set). The next line describes the root element of the document (like saying: "this document is a note"). The last line defines the end of the root element.
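The document that this description refers to is not reproduced here. The sketch below therefore assumes a small hypothetical note document: only the XML declaration and the note root element come from the text, while the body element, its content, and the class name NoteReader are invented for illustration. It parses the document with the JDK's built-in DOM parser and reads back the name of the root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class NoteReader {
    // Hypothetical sample: declaration first, then the root element,
    // with the closing tag of the root element on the last line.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
          + "<note>\n"
          + "  <body>Hypothetical content</body>\n"
          + "</note>";

    // Parses the XML text and returns the tag name of its root element.
    static String rootElementName(String xml) {
        try {
            // Encode with the same character set that the declaration names.
            byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            // Parser configuration, I/O, and well-formedness errors end up here.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

NoteReader.rootElementName(NoteReader.SAMPLE) returns note, the root element named in the description above.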
3.5 HTML
Hypertext Markup Language (HTML), the language of the World Wide Web (WWW), allows users to produce web pages that include text, graphics, and pointers to other web pages (hyperlinks).
HTML is not a programming language; it is an application of ISO Standard 8879, SGML (Standard Generalized Markup Language), specialized to hypertext and adapted to the Web. The idea behind hypertext is that we can jump from one point to another, navigating through the information according to our interests and preferences. A markup language is simply a series of elements, each delimited by special characters, that define how the text or other items enclosed within the elements should be displayed. Hyperlinks are underlined or emphasized words that lead to other documents or to other portions of the same document.
HTML can be used to display any type of document on the host computer, which can be at a geographically different location. It is a versatile language and can be used on any platform or desktop.
HTML provides tags (special codes) to make the document look attractive. HTML tags are not case-sensitive. Using graphics, fonts, different sizes, colors, and so on can enhance the presentation of the document. Anything that is not a tag is part of the document itself.
3.5.2 ADVANTAGES
An HTML document is small and hence easy to send over the net. It is small because it does not include formatting information.
HTML is platform independent.
HTML tags are not case-sensitive.
In a client application for Navigator, JavaScript statements embedded in an HTML page can recognize and respond to user events such as mouse clicks, form input, and page navigation. For example, you can write a JavaScript function to verify that users enter valid information into a form requesting a telephone number or zip code. Without any network transmission, an HTML page with embedded JavaScript can interpret the entered text and alert the user with a message dialog if the input is invalid. You can also use JavaScript to perform an action (such as playing an audio file, executing an applet, or communicating with a plug-in) in response to the user opening or exiting a page.
4.1.1 INTRODUCTION
System design is the process or art of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements. One could see it as the application of systems theory to product development. There is some overlap and synergy with the disciplines of systems analysis, systems architecture, and systems engineering.
If the broader topic of product development "blends the perspective of marketing, design, and manufacturing into a single approach to product development," then design is the act of taking the marketing information and creating the design of the product to be manufactured. Systems design is therefore the process of defining and developing systems to satisfy the specified requirements of the user.
Until the 1990s systems design had a crucial and respected role in the data processing industry. In the 1990s, standardization of hardware and software resulted in the ability to build modular systems. The increasing importance of software running on generic platforms has enhanced the discipline of software engineering.
Object-oriented analysis and design methods are becoming the most widely used methods for computer systems design. The UML has become the standard language in object-oriented analysis and design. It is widely used for modeling software systems and is increasingly used for designing non-software systems and organizations.
4.1.3 Physical design
The physical design relates to the actual input and output processes of the system. It is laid down in terms of how data is input into a system, how it is verified or authenticated, how it is processed, and how it is displayed as output. Physical design, in this context, does not refer to the tangible physical design of an information system. To use an analogy, a personal computer's physical design involves input via a keyboard, processing within the CPU, and output via a monitor, printer, etc. It would not concern the actual layout of the tangible hardware, which for a PC would be a monitor, CPU, motherboard, hard drive, modems, video/graphics cards, USB slots, etc. It involves a detailed design of a user and product database structure processor and a control processor. The H/S personal specification is developed for the proposed system.
Context Diagram:
[Context diagram labels: End User, Search, websites database, Password info, Admin, Search Process, Password modify, Search Result]
[Diagram labels: Keyword, User, Search process, websites database, Search Result]
[Diagram labels: Administrator, Password Info, Validation, Valid, Websites]
i. This view represents the system from the user's perspective.
ii. The analysis representation describes a usage scenario from the end user's perspective.
Structural Model View
i. In this model the data and functionality are arrived at from inside the system.
ii. This model view models the static structures.
Behavioral Model View
It represents the dynamic or behavioral aspects of the system, depicting the interactions or collaborations between the various structural elements described in the user model and structural model views.
Implementation Model View
In this view the structural and behavioral aspects of the system are represented as they are to be built.
Environmental Model View
In this view the structural and behavioral aspects of the environment in which the system is to be implemented are represented.
UML is specifically constructed through two different domains:
UML analysis modeling, which focuses on the user model and structural model views of the system.
UML design modeling, which focuses on the behavioral modeling, implementation modeling, and environmental model views.
In this diagram there is one actor, the admin, and fourteen use cases: login, address, product, job, yellow pages, resumes, logout, add address, delete address, add product, modify product, delete product, add jobs, and delete jobs. The admin has an association relationship with all the use cases.
In this diagram there is one actor, the admin, and thirteen use cases: registration, login, change password, advanced searching, select topic, search, mailing, banking, component, careers, product, website URL, and search keyword. The admin has an association relationship with all the use cases.
4.3.3 Class Diagram:
In this class diagram there are twenty-four classes: GUI component, menu, input screen, option screen, report, data store, URLs information, global info, administrator information, add reports, user information, keyword information, address, data manipulation, registration, change pwd, option, product, yellow pages, login, validation, address, products, jobs, and resumes. GUI has an association relationship with input screen, option screen, and report. GUI has a dependency relationship with data manipulation. Data store has a dependency relationship with data manipulation.
In this activity diagram, the action states lie between the initial state and the final state: register, login, customer options, options, post resumes, view jobs, view product, advanced searching, topic-wise search, and yellow pages. A rhombus indicates the decision on whether the customer is valid. If the customer is valid, the operations that follow take place; if not, the customer is not validated.
In this sequence diagram, eight objects are used: login, menu, address, products, jobs, yellow pages, resume, and sign out. The Administrator logs in with a user name and performs operations such as validation, adding URLs, adding new product info, adding a new job category, going back to the menu, adding a new city address, going back home, verifying resumes and adding them to the application, and returning to the home page.
In this collaboration diagram, eight objects are used: login, menu, address, products, jobs, yellow pages, resume, and sign out. The Administrator logs in with a user name and performs operations such as validation, adding URLs, adding new product info, adding a new job category, going back to the menu, adding a new city address, going back home, verifying resumes and adding them to the application, and returning to the home page.
In this diagram there are three action states: unauthenticated, validation, and authenticated. It shows the flow of events between the action states, driven by the user id and password: if the input is valid the process proceeds; otherwise it returns to validation.
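The three states and their transitions can be sketched as a small state machine. This is an illustration only: the class name LoginFlow, the enum constants, and the hard-coded credential check are hypothetical, and a real system would validate against its stored user records.

```java
class LoginFlow {
    enum State { UNAUTHENTICATED, VALIDATION, AUTHENTICATED }

    private State state = State.UNAUTHENTICATED;

    State state() {
        return state;
    }

    // Hypothetical credential check standing in for a lookup in the user store.
    private static boolean isValid(String userId, String password) {
        return "admin".equals(userId) && "secret".equals(password);
    }

    // Submitting a user id and password moves the flow into VALIDATION.
    // Valid input proceeds to AUTHENTICATED; invalid input stays in
    // VALIDATION so that the user can try again.
    State submit(String userId, String password) {
        if (state == State.AUTHENTICATED) {
            return state; // already logged in, nothing to validate
        }
        state = State.VALIDATION;
        if (isValid(userId, password)) {
            state = State.AUTHENTICATED;
        }
        return state;
    }
}
```

A failed attempt leaves the flow in the validation state, and a later successful attempt on the same LoginFlow object moves it to the authenticated state.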
measure employed during software development. During testing, the program is executed with a set of test cases, and the output of the program for the test cases is evaluated to determine whether the program is performing as expected.
5.2.1 Unit Testing
Unit Testing is done on individual modules as they are completed and become executable. It is confined only to the designer's requirements. Each module can be tested using the following two Strategies:
Initialization and termination errors. In this testing only the output is checked for correctness. The logical flow of the data is not checked.
5.2.10 Validation
The system has been tested and implemented successfully, ensuring that all the requirements listed in the software requirements specification are fulfilled. In case of erroneous input, the corresponding error messages are displayed.
CHAPTER 6
6. RESULTS
SCREENS:
Introduction page
Advanced panel
Product panel
Job panel
Yellow pages
Placement panel
Post Resume
Login of applicant
Registration completed
Administrator login
Retrieve password
Attaching Resume
Post of Resume
Adding up of categories
Adding of Resumes
CHAPTER 7
It has been a great pleasure for me to work on this exciting and challenging project. The project gave me practical knowledge of developing a system that retrieves information and documents very efficiently and limits the number of returned documents by performing an intelligent search procedure. The purpose is to design a system that displays only relevant information to the user by suppressing unnecessary and irrelevant information. The project also provided knowledge of the latest technology used in developing web-enabled applications and of client-server technology, which will be in great demand in the future. This will provide better opportunities and guidance for developing projects independently in the future.
BENEFITS:
The project is identified by the merits of the system offered to the user. The merits of this project are as follows:
It is a web-enabled project.
The project lets the user enter data through simple and interactive forms, which makes it easy for the client to enter the desired information.
The user is mainly concerned about the validity of the data he is entering. There are checks at every stage of any new creation, data entry, or update, so that the user cannot enter invalid data that could create problems at a later date.
The decision-making process is greatly enhanced by faster processing of information, since collecting data from information available on a computer takes much less time than in a manual system.