
A COMMUNICATION PERSPECTIVE ON AUTOMATIC TEXT CATEGORIZATION

CHAPTER 1 INTRODUCTION

1.1 PROBLEM DEFINITION

The basic concern of a communication system is to transfer information from its source to a destination some distance away. Textual documents also deal with the transmission of information; in particular, from a text categorization point of view, the information encoded by a document is the topic or category it belongs to. Following this initial intuition, a theoretical framework is developed in which Automatic Text Categorization (ATC) is studied from a Communication System perspective. Under this approach, the problematic dimensionality of the indexing feature space is tackled by a two-level supervised scheme, implemented as a noisy-term filter followed by a redundant-term compressor. Gaussian probabilistic categorizers are revisited and adapted to the sparsity inherent in ATC. Experimental results on the 20 Newsgroups and Reuters-21578 collections validate the theoretical approach. The noise filter and redundancy compressor allow an aggressive reduction of the term vocabulary (reduction factor greater than 0.99) with a minimal loss (lower than 3 percent), and in some cases a gain (greater than 4 percent), in final classification accuracy. The adapted Gaussian Naive Bayes classifier reaches classification results similar to those obtained by state-of-the-art Multinomial Naive Bayes (MNB) and Support Vector Machine (SVM) classifiers.

1.2 INTRODUCTION

A deep parallelism may be established between a Communication System and an Automatic Text Categorization (ATC) scheme, since both disciplines deal with the transmission of information and its reliable recovery. Establishing this novel simile allows us to tackle, from a well-founded communication-theoretic point of view, the overdimensioned document representation space that is heavily redundant with respect to the classification task and typically proves problematic for many categorizers [1] in ATC. The main objective of our research has been to investigate how, and to what extent, the document representation space can be compressed, and what the effects of this compression are on final classification. The idea is to take a first step toward an optimal encoding of the category, carried by the document's vector representation, in view of both limiting the greedy use of resources caused by the high-dimensional feature space and reducing the effects of overfitting. Additionally, our research aims to show how document decoding (the classification task) can take advantage of Gaussian assumptions that are common in the Communication System discipline but largely ignored in ATC.
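Although the filter and compressor themselves are specified in the underlying study rather than in this report, the kind of vocabulary reduction quoted above (a reduction factor greater than 0.99) can be sketched abstractly: score every term with a supervised criterion and keep only the top fraction. The sketch below is purely illustrative; the scoring function is left abstract, and all names are invented for the example.

using System;
using System.Collections.Generic;

class VocabularyFilter
{
    // Keep only the highest-scoring terms; keepFraction = 0.01 discards
    // 99% of the vocabulary, mirroring the reduction factor above.
    public static List<string> Filter(Dictionary<string, double> termScores, double keepFraction)
    {
        List<KeyValuePair<string, double>> ranked =
            new List<KeyValuePair<string, double>>(termScores);
        ranked.Sort(delegate(KeyValuePair<string, double> a, KeyValuePair<string, double> b)
        {
            return b.Value.CompareTo(a.Value);   // descending by score
        });

        int keep = Math.Max(1, (int)(ranked.Count * keepFraction));
        List<string> kept = new List<string>();
        for (int i = 0; i < keep; i++)
            kept.Add(ranked[i].Key);
        return kept;
    }
}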

CHAPTER 2 SYSTEM ANALYSIS

Systems analysis is an explicit formal inquiry carried out to help someone (referred to as the decision maker) identify a better course of action and make a better decision than he might otherwise have made. The characteristic attributes of a problem situation where systems analysis is called upon are complexity of the issue and uncertainty of the outcome of any course of action that might reasonably be taken. Systems analysis usually involves some combination of the following: identification (and re-identification) of objectives, constraints, and alternative courses of action; examination of the probable consequences of the alternatives in terms of costs, benefits, and risks; and presentation of the results in a comparative framework so that the decision maker can make an informed choice from among the alternatives. In system analysis, emphasis is given to understanding the details of an existing or proposed system, and then deciding whether the proposed system is desirable and whether the existing system needs improvement.

2.1 EXISTING SYSTEM

The high-dimensional feature space cannot be dealt with in an efficient manner: the indexing vocabulary typically reaches tens or hundreds of thousands of terms.

2.2 PROPOSED SYSTEM

Term space dimensionality reduction is used to shrink the indexing vocabulary and thus confront the high-dimensionality problem. Term extraction and selection are very effective in reducing the document index.

2.3 FEASIBILITY STUDY

A feasibility study is a preliminary study undertaken to determine and document a project's viability. The results of this study are used to decide whether to proceed with the project. If it indeed leads to the project being approved, it will, before the real work of the proposed project starts, be used to ascertain the likelihood of the project's success. It is an analysis of possible alternative solutions to a problem and a recommendation on the best alternative.

2.4 TECHNICAL FEASIBILITY

This involves questions such as whether the technology needed for the system exists, how difficult it will be to build, and whether the firm has enough experience using that technology. The assessment is based on an outline design of system requirements in terms of input, output, fields, programs, and procedures. This can be quantified in terms of volumes of data, trends, frequency of updating, etc. in order to introduce the technical system.

2.5 ECONOMIC FEASIBILITY

This involves questions such as whether the firm can afford to build the system, whether its benefits will substantially exceed its costs, and whether the project has higher priority and profits than other projects that might use the same resources. It also includes whether the project is in a condition to fulfill all the eligibility criteria and the responsibilities of both sides in case two parties are involved in performing the project.

2.6 OPERATIONAL FEASIBILITY

This involves questions such as whether the firm and its users will be able to operate and navigate the system easily, and whether any special training or technical knowledge is needed to operate the system with ease.

2.7 PROJECT PLANNING

The purpose of software project planning is to establish reasonable plans for performing the software engineering and for managing the software project. Software project planning involves developing estimates for the work to be performed, establishing the necessary commitments, and defining the plan to perform the work. Software planning begins with a statement of the work to be performed and the other constraints and goals that define and bound the software project.

This plan is developed at the beginning of the software project and is continually refined and improved as the work progresses.

2.8 PROJECT SCHEDULING

The project schedule is the core of the project plan. It is used by the project manager to commit people to the project and show the organization how the work will be performed. Schedules are used to communicate final deadlines and, in some cases, to determine resource needs. They are also used as a kind of checklist to make sure that every necessary task is performed. If a task is on the schedule, the team is committed to doing it. In other words, the project schedule is the means by which the project manager brings the team and the project under control.

CHAPTER 3 DEVELOPMENT ENVIRONMENT

3.1 HARDWARE CONFIGURATION

S.NO   HARDWARE                  CONFIGURATIONS
1      Operating System          Windows 2000 & XP
2      RAM                       1 GB
3      Processor (with Speed)    Intel Pentium IV (3.0 GHz) and upwards
4      Hard Disk Size            40 GB and above
5      Monitor                   15" CRT

3.2 SOFTWARE CONFIGURATION

S.NO   SOFTWARE     CONFIGURATIONS
1      Platform     Microsoft Visual Studio
2      Framework    .Net Framework 2.0
3      Language     C#.Net
4      Front End    Windows application
5      Back End     SQL Server 2005

3.3 SOFTWARE SPECIFICATION

3.3.1 Introducing the .NET Framework

The .NET Framework is such a comprehensive platform that it can be a little difficult to describe. I have heard it described as a development platform, an execution environment, and an operating system, among other things. In fact, in some ways each of these descriptions is accurate, if not sufficiently precise.

The software industry has become much more complex since the introduction of the Internet. Users have become both more sophisticated and less sophisticated at the same time. (I suspect not many individual users have undergone both metamorphoses, but as a body of users this has certainly happened.) Folks who had never touched a computer five years ago now comfortably include the Internet in their daily lives. Meanwhile, the technophile or professional computer user has become much more advanced, as have their expectations from software. It is this collective expectation from software that drives our industry. Each time a software developer creates a successful new idea, user expectations rise for the next new feature. In a way this has been true for years, but now software developers face the added challenge of addressing the Internet and Internet users in many applications that in the past were largely unconnected. It is this new challenge that the .NET Framework directly addresses.

3.3.2 Code in a Highly Distributed World

Software that addresses the Internet must be able to communicate. However, the Internet is not just about communication. This assumption has led the software industry down the wrong path in the past. Communication is simply the base requirement for software in an inter-networked world. In addition to communication, other features must be established: security, binary composability and modularity (which I will discuss shortly), scalability and performance, and flexibility. Even these just scratch the surface, but they are a good start.

Here are some features that users will expect in the near future. Users will begin to expect to run code served by a server that is not limited to the abilities (or physical display window) of a browser. Users will begin to expect websites and server-side code to compose themselves of data and functionality from various vendors, giving the end user flexible one-stop shopping. Users will expect their data and information both to be secured and to roam from site to site so that they don't have to type it in over and over again. These are tall orders, and these are the types of requirements that are addressed by the .NET Framework.

It is not possible for the requirements of the future to be addressed by a new programming language, or a new library of tools and reusable code. It is also not practical to require everyone to buy a new operating system that addresses the Internet directly. This is why the .NET Framework is a development environment, an execution environment, and an operating system.

One challenge for software in a highly distributed environment (like the Internet) is the fact that many components are involved, with different needs in terms of technology. For example, client software such as a browser or custom client has different needs than a server object or database element. Developers creating large systems often have to learn a variety of programming environments and languages just to create a single product.

3.3.3 Automatic Memory Management

The Common Language Runtime does more for your C# and .NET managed executables than just JIT compile them. The CLR offers automatic thread management, security management, and, perhaps most importantly, memory management. Memory management is an unavoidable part of software development. Commonly, memory management is, to one degree or another, implemented by the application. It is its sheer commonality combined with its potential complexity, however, that makes memory management better suited as a system service.

Here are some simple things that can go wrong in software:

- Your code can reference a data block that has not been initialized. This can cause instability and erratic behavior in your software.
- Software may fail to free up a memory block after it is finished with the data. Memory leaks can cause an application or an entire system to fail.
- Software may reference a memory block after it has been freed up.

There may be other memory-management related bugs, but the great majority fall under one of these main categories. Developers are increasingly taxed with complex requirements, and the mundane task of managing the memory for objects and data types can be tedious. Furthermore, when executing component code from an untrusted source (perhaps across the Internet) in the same process as your main application code, you want to be absolutely certain that the untrusted code cannot obtain access to the memory for your data. These things create the necessity for automatic memory management for managed code.

All programs running under the .NET Framework or Common Language Runtime allocate memory from a managed heap. The managed heap is maintained by the CLR. It is used for all memory resources, including the space required to create instances of objects, as well as the memory required for data buffers, strings, collections, stacks, and caches. The managed heap knows when a block of data is referenced by your application (or by another object in the heap), in which case that object will be left alone. But as soon as a block of memory becomes unreferenced, it is subject to garbage collection. Garbage collection is an automatic part of the processing of the managed heap and happens as needed. Your code never explicitly cleans up, deletes, or frees a block of memory, so it is impossible to leak memory. Memory is considered garbage only when it is no longer referenced by your code, so it is impossible for your code to reference a block of memory that has already been freed or garbage collected. Finally, because the managed heap is a pointer-less environment (at least from your managed code's point of view), the code verifier can make it impossible for managed code to read a block of memory that has not been written to first.
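To make the reachability rule concrete, here is a minimal sketch; the Node class is invented for illustration, and the explicit GC.Collect() call is shown only to make the point (real managed code rarely needs it).

using System;

class Node { public Node Next; }

class GcDemo
{
    static void Main()
    {
        Node head = new Node();
        head.Next = new Node();   // both nodes reachable through 'head'

        head.Next = null;         // second node unreferenced: now garbage
        head = null;              // first node unreferenced: now garbage

        GC.Collect();             // both blocks are eligible for collection
        Console.WriteLine("Unreferenced objects are reclaimed automatically.");
    }
}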

The managed heap makes all three of the major memory management bugs an impossibility.

3.3.4 Language Concepts and the CLR

Managed code runs with the constant maintenance of the Common Language Runtime. The CLR provides memory management, type management, security, and threading. In this respect, the CLR is a runtime environment. However, unlike typical runtime environments, managed code is not tied to any particular programming language.

You have most likely heard of C# (pronounced "See-Sharp"). C# is a new programming language built specifically to write managed software targeting the .NET Framework. However, C# is by no means the only language that you can use to write managed code. In fact, any compiler developer can choose to make their compiler generate managed code. The only requirement is that their compiler emits an executable comprised of valid IL and metadata. At this time Microsoft is shipping five language compilers/assemblers with the .NET Framework: C#, Visual Basic, C++, JScript, and IL. (Yes, you can write managed code directly in IL; however, this will be as uncommon as writing assembly language programs is today.) In addition to the five languages shipping with the framework, Microsoft will release a Java compiler that generates managed applications that run on the CLR. Beyond Microsoft's language compilers, third parties are producing language compilers for over 20 computer languages, all targeting the .NET Framework. You will be able to write managed applications in your favorite languages, including Eiffel, Perl, COBOL, and Java, amongst others.

Language agnosticism is really cool. Your Perl scripts will now be able to take advantage of the same object libraries that you use in your C# applications. Meanwhile, your friends and coworkers will be able to use your reusable components whether or not they are using the same programming language as you. This division of runtime engine, API (Application Programmer Interface), and language syntax is a real win for developers.

The CLR does not need to know (nor will it ever know) anything about any computer language other than IL. All managed software is compiled down to IL instructions and metadata.

These are the only things that the CLR deals with. This is important because it makes every computer language an equal citizen from the point of view of the CLR. By the time JIT compilation occurs, your program is nothing but logic and metadata. IL itself is geared towards object oriented languages. However, compilers for procedural or scripted languages can easily produce IL to represent their logic.

3.3.5 Advanced Topics for the Interested

If you are one of those who just must know some of the details, then this section is for you. But if you are looking for a practical but brief overview of the .NET Framework, you can skip ahead to the next section right now and come back to this one when you have more time. Specifically, I am going to explain in more detail JIT compilation and garbage collection.

The first time that a managed executable references a class or type (such as a structure, interface, enumerated type, or primitive type), the system must load the code module, or managed module, that implements the type. At the point of loading, the JIT compiler creates method stubs in native machine language for every member method in the newly loaded class. These stubs include nothing but a jump into a special function in the JIT compiler. Once the stub functions are created, the system fixes up any method calls in the referencing code to point to the new stub functions. At this time no JIT compilation of the type's code has occurred. However, if a managed application references a managed type, it is likely to call methods on this type (in fact it is almost inevitable). When one of the stub functions is called, the JIT compiler looks up the source code (IL and metadata) in the associated managed module and builds native machine code for the function on the fly. Then it replaces the stub function with a jump to the newly JIT compiled function. The next time this same method is called, it will execute at full speed without any need for compilation or any extra steps.

The good thing about this approach is that the system never wastes time JIT compiling methods that won't be called in this run of your application. Finally, when a method is JIT compiled, any types that it references are checked by the CLR to see if they are new to this run of the application. If this is indeed the first time a type has been referenced, then the whole process starts over again for that type. This is how JIT compilation progresses throughout the execution of a managed application.

Take a deep breath, and exhale slowly, because now I am going to switch gears and discuss the garbage collector. Garbage collection is a process that takes time. The CLR must halt all or most of the threads in your managed application when garbage buffers and garbage objects are cleaned out of the managed heap. Performance is important, so it can help to understand the garbage collection process. Garbage collection is not an active process. Garbage collection is passive and will only happen when there is not enough free memory to fulfill an instruction to new up an instance of an object or memory buffer. If there is not enough free memory, a garbage collection occurs in an attempt to find enough. When garbage collection occurs, the system finds all objects referenced by local (stack) variables and global variables. These objects are not garbage, because they are referenced by your running threads. After this, the system searches the referenced objects for more object references. These objects are also not garbage, because they are referenced. This continues until the last referenced object is found. All other objects are garbage and are released.

Object Oriented Code Reuse

Code reuse has been a goal for computer scientists for decades now. Part of the promise of object oriented programming is flexible and advanced code reuse. The CLR is a platform designed from the ground up to be object oriented, and therefore to promote all of the goals of object oriented programming.

Today, most software is written nearly from scratch. The unique logic of most applications can usually be described in several brief statements, and yet most applications include many thousands or millions of lines of custom code to achieve their goals. This cannot continue forever. In the long run the software industry will simply have too much software to write to be writing every application from scratch. Therefore systematic code reuse is a necessity. Rather than go into a lengthy explanation about why OO and code reuse are difficult but necessary, I would like to mention some of the rich features of the CLR that promote object oriented programming:

- The CLR is an object oriented platform from IL up. IL itself includes many instructions for dealing with memory and code as objects.
- The CLR promotes a homogeneous view of types, where every data type in the system, including primitive types, is an object derived from a base object type called System.Object. In this respect literally every data element in your program is an object and has certain consistent properties.
- Managed code has rich support for object oriented constructs such as interfaces, properties, enumerated types, and of course classes. All of these code elements are collectively referred to as types when referring to managed code.
- Managed code introduces new object oriented constructs, including custom attributes, advanced accessibility, and static constructors (which allow you to initialize types, rather than instances of types), to help fill in the places where other object oriented environments fall short.
- Managed code can make use of pre-built libraries of reusable components. These libraries of components are called managed assemblies and are the basic building block of binary composability. (Reusable components are packaged in files called assemblies; however, technically even a managed executable is a managed assembly.)
- Binary composability allows your code to use other objects seamlessly without the necessity to have or compile source code from the third-party code. (This is largely possible due to the rich descriptions of code maintained in the metadata.)
- The CLR has very strong versioning ability. Even though your applications will be composed of many objects published in many different assemblies (files), they will not suffer from versioning problems as new versions of the various pieces are installed on a system. The CLR knows enough about an object to know exactly which version of an object is needed by a particular application.

These features and more build upon and extend previous object oriented platforms. In the long run, object oriented platforms like the .NET Framework will change the way applications are built. Moving forward, a larger and larger percentage of the new code that you write will directly relate to the unique aspects of your application. Meanwhile, the standard bits that show up in many applications will be published as reusable and extendible types.

3.3.6 Class Library

Now that you have a taste of the goals and groundwork laid by the CLR and managed code, let's taste the fruits that it bears. The Framework Class Library is the first step toward the end solution of component based applications. If you like, you can use it like any other library or API. That is to say, you can write applications that make use of the objects in the FCL to read files, display windows, and do various tasks. But to exploit the true possibilities, you can extend the FCL toward your application's needs, and then write a very thin layer that is just application code. The rest is reusable types and extensions of reusable types.

The FCL is a class library; however, it has been designed for extensibility and composability. This is advanced reuse. Take, for example, the stream classes in the FCL. The designers of the FCL could have defined file streams and network streams and been done with it. Instead, all stream classes are derived from a base class called System.IO.Stream. The FCL defines two main kinds of streams: streams that communicate with devices (such as files, networks, and memory), and streams whose devices are other instances of stream-derived classes. These abstracted streams can be used for I/O formatting, buffering, encryption, data compression, Base-64 encoding, or just about any other kind of data manipulation. The result of this kind of design is a simple set of classes with a simple set of rules that can be combined in a nearly infinite number of ways to produce the desired effect. Meanwhile, you can derive your own stream classes, which can be composed along with the classes that ship with the Framework Class Library. Sample applications built on these classes demonstrate streams and FCL composability in general.

3.4 ADO.NET

ADO.NET provides consistent access to data sources such as Microsoft SQL Server, as well as data sources exposed via OLE DB and XML. Data-sharing consumer applications can use ADO.NET to connect to these data sources and retrieve, manipulate, and update data. ADO.NET cleanly factors data access from data manipulation into discrete components that can be used separately or in tandem. ADO.NET includes .NET data providers for connecting to a database, executing commands, and retrieving results. Those results are either processed directly or placed in an ADO.NET DataSet object in order to be exposed to the user in an ad hoc manner, combined with data from multiple sources, or remoted between tiers. The ADO.NET DataSet object can also be used independently of a .NET data provider to manage data local to the application or sourced from XML.

3.4.1 Need for ADO.NET

As application development has evolved, new applications have become loosely coupled, based on the Web application model. More and more of today's applications use XML to encode data to be passed over network connections. Web applications use HTTP as the fabric for communication between tiers, and therefore must explicitly handle maintaining state between requests. This new model is very different from the connected, tightly coupled style of programming that characterized the client/server era, where a connection was held open for the duration of the program's lifetime and no special handling of state was required.

In designing tools and technologies to meet the needs of today's developer, Microsoft recognized that an entirely new programming model for data access was needed, one that is built upon the .NET Framework. Building on the .NET Framework ensured that the data access technology would be uniform: components would share a common type system, design patterns, and naming conventions. ADO.NET was designed to meet the needs of this new programming model: disconnected data architecture, tight integration with XML, common data representation with the ability to combine data from multiple and varied data sources, and optimized facilities for interacting with a database, all native to the .NET Framework.

3.4.2 Leverage Current ADO Knowledge

Microsoft's design for ADO.NET addresses many of the requirements of today's application development model. At the same time, the programming model stays as similar as possible to ADO, so current ADO developers do not have to start from scratch in learning a brand new data access technology. ADO.NET is an intrinsic part of the .NET Framework without seeming completely foreign to the ADO programmer. ADO.NET coexists with ADO. While most new .NET applications will be written using ADO.NET, ADO remains available to the .NET programmer through .NET COM interoperability services.

ADO.NET provides first-class support for the disconnected, n-tier programming environment for which many new applications are written. The concept of working with a disconnected set of data has become a focal point in the programming model. The ADO.NET solution for n-tier programming is the DataSet.
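To make the disconnected model concrete, here is a minimal sketch using SqlDataAdapter to fill a DataSet and then work with the rows offline. The connection string is a placeholder, and the USERDETAIL table name is borrowed from the sample code in Chapter 9; adjust both to your environment.

using System;
using System.Data;
using System.Data.SqlClient;

class DataSetDemo
{
    static void Main()
    {
        string connStr = "Data Source=.;Initial Catalog=TextCat;Integrated Security=True";
        DataSet ds = new DataSet();

        // The adapter opens the connection, fills the DataSet, and closes
        // the connection again; no link to the database stays open.
        using (SqlDataAdapter da = new SqlDataAdapter("SELECT UN FROM USERDETAIL", connStr))
        {
            da.Fill(ds, "Users");
        }

        // The rows are now manipulated entirely offline.
        foreach (DataRow row in ds.Tables["Users"].Rows)
            Console.WriteLine(row["UN"]);
    }
}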

3.4.3 XML Support

XML and data access are intimately tied: XML is all about encoding data, and data access is increasingly becoming all about XML. The .NET Framework does not just support Web standards; it is built entirely on top of them.

3.5 C SHARP (C#)

That's a lot to assimilate, but all that was just the runtime engine, the foundation. Unfortunately, there are thousands of classes in the C# framework classes, so I can't even begin to introduce you to what is in the framework; the best I can do is give you an idea of why you should take the trouble to learn it. The framework classes constitute the runtime library that all .Net languages and applications share. For portability between Delphi for Windows and Delphi for .Net you can just stick to the Delphi RTL wrappings for various framework features. However, to really take advantage of .Net, you should make an effort to learn the framework classes. Beyond what learning the framework classes can do for today's projects, learning the framework classes is what will make you a .Net programmer who can find work in any .Net shop on the planet. ("Learn once, work anywhere.")

You've probably all seen the dog and pony shows where .Net turns all the complexity of XML, SOAP, and WSDL into straightforward remote calls that pass objects between systems. This is great stuff, but there's a lot more to the framework classes than web services. .Net includes cryptography classes, Perl-compatible regex classes, and a great suite of collection classes that goes light years beyond TList. One thing to note is that even though C# is easy for Delphi programmers to read, you don't have to learn C# to learn the framework classes. Microsoft does not currently provide source to the library code, so you can't Ctrl+Click on TObject.ToString and see the implementation, any more than you can Ctrl+Click on CreateCompatibleDC() in Delphi for Windows.

This Is the Future

Historically, the Windows API has been a set of 'flat' function calls. If you were feeling particularly charitable, you could say it was "object like", in that you created an object (like a window or a font) and then kept passing the "handle" to various routines that manipulated it. Of course, few people have ever been particularly willing to be quite so charitable. Learning the Windows API was always a slow and frustrating exercise, and almost all Windows code manipulates the flat API from behind various layers of incompatible object-oriented wrappers. Knowing MFC didn't help much with Delphi, and vice versa. Moreover, if you weren't working in C or C++, you were always working at a disadvantage. When a new API came out, you'd either have to take the time to translate the headers and maybe write some wrapper classes yourself, or you'd have to wait for someone else to do it. Either way, there was always the danger that a translation might be wrong in some way: the pad bytes are off, an optional parameter might be required, a routine might be declared with the wrong calling convention, and so on.

All these problems disappear with .Net and the framework classes. The framework is object-oriented from top to bottom. There are no more "handles" to pass to an endless set of flat functions; you work with a window or a font by setting properties and calling methods. Just like Delphi, of course, but now this is the native API, not a wrapper. The framework classes are organized into hierarchical namespaces, which reduce the endless searching through alphabetical lists of function names. Looking for file functions? System.IO is a pretty logical place to look. Want a hash table like in Perl? System.Collections has a pretty nice one. Finally, Microsoft promises that all future APIs will be released as CLS-compliant parts of the framework class library. This means that your Delphi for .Net programs can use a new API the day it's released, without having to do any header translation, and without any danger that the header translation might be wrong.

You might be skeptical about that promise. Perhaps you remember that COM was once touted as Windows' object-oriented future. This is a sensible attitude, but .Net is a lot better than COM ever was. Most people's first COM experiments produced a sort of stunned disbelief at just how complex Microsoft had managed to make something as simple as object orientation. Most people's first .Net experiments leave them pleasantly surprised that something this good could have come from the same company that gave us COM and the Windows API.

3.6 VISUAL C# .NET OVERVIEW

- Strong C++ heritage: immediately familiar to C++ and Java developers; allows C-style memory management and pointers.
- First component-oriented language in the C family: properties, methods, indexers, delegates, events; design-time and runtime attributes.
- Enables one-stop programming: no header files or IDL; embeddable in ASP .NET.

Component-Oriented

What defines a component? Properties, methods, and events; design-time and runtime information; integrated help and documentation. C# offers first class support for these concepts: not naming patterns, adapters, etc., and not external files. Components are easy to build and consume.

Comparison to Visual Basic

Syntactic Differences

- Visual Basic is NOT case sensitive.
- In C# but not in Visual Basic: pointers, shift operators, inline documentation, overloaded operators, unsigned integers.
- In Visual Basic but not in C#: Select Case, Interface implementation, dynamic arrays, modules, optional parameters.

3.7 Need for C#

Existing languages are powerful. Why do we need another language?

- Important features are spread out over multiple languages. Example: must one choose between pointers (C++) or garbage collection (Java)?
- Old languages + new features = poor syntax. Garbage collection in C++? Event-driven GUIs in Java?

3.8 Goals of C#

- Give developers a single language with a full suite of powerful features and a consistent and simple syntax.
- Increase developer productivity: type safety, garbage collection, exceptions, and leverage of existing skills.
- Support component-oriented programming: first class support for component concepts such as properties, events, and attributes (see the sketch below).
- Provide a unified and extensible type system, where everything can be treated as an object.
- Build a foundation for future innovation: a concise language specification and standardization.

3.9 Design of C#

- Derived from the features and syntaxes of other languages: the safety of Java, the ease of Visual Basic, and the power of C++.
- Uses the .NET Framework.
- Plus several unique features.
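As promised above, here is a minimal sketch of the component concepts named in Section 3.8: a property, an event, and an attribute, all first-class language elements. Class and member names are invented for illustration.

using System;

class Thermometer
{
    private double celsius;                    // backing field for the property
    public event EventHandler ReadingChanged;  // event declared directly in the language

    public double Celsius                      // property with get/set accessors
    {
        get { return celsius; }
        set
        {
            celsius = value;
            if (ReadingChanged != null)        // raise the event on change
                ReadingChanged(this, EventArgs.Empty);
        }
    }

    [Obsolete("Attributes attach design-time metadata to the member.")]
    public void Reset() { Celsius = 0; }
}

class ComponentDemo
{
    static void Main()
    {
        Thermometer t = new Thermometer();
        t.ReadingChanged += delegate { Console.WriteLine("Now {0} C", t.Celsius); };
        t.Celsius = 21.5;                      // setter fires the event
    }
}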

CHAPTER 4 SYSTEM DESIGN

4.1 DESIGN CONCEPTS

The design of an information system produces the details that state how a system will meet the requirements identified during analysis. The emphasis is on translating the performance requirements into design specifications. This phase is known as the logical system design phase, which includes the details of output, the data to be input, file structures, data structures, controls, and calculation procedures. The next phase, physical design, produces a working system. The various steps in designing the text categorization system are given below. The following steps are involved in design: first, decide how the output is to be produced and in what format; second, design the input data and the master files to meet the system requirements; finally, present details related to the justification of the system.

4.2 INPUT DESIGN

Input design is the process of converting input data to computer-based data. The goal of designing input data is to make data entry as easy and error-free as possible. Input design determines the format and validation criteria for data entering the system. Personal computers and terminals can place data at users' fingertips, allowing them to call up specific data and make timely decisions based on that data. This system contains data collection screens that display headings defining their purposes. By employing flashing error messages and providing necessary alerts on the screen, mis-entry of data into the system is avoided. The rule that each screen should have a single purpose and restrict itself to logically related data, together with verification and validation, controls and reduces errors.
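The on-screen alert idea above can be sketched with the WinForms ErrorProvider component, which the sample code in Chapter 9 also wraps. The helper method and message below are illustrative, not taken from the project source.

using System.Windows.Forms;

static class InputValidation
{
    // Returns true when the box is non-empty; otherwise flags the control
    // with a blinking error icon and tooltip beside it.
    public static bool ValidateRequired(TextBox box, ErrorProvider errors)
    {
        if (box.Text.Trim().Length == 0)
        {
            errors.SetError(box, "This field is required.");
            return false;
        }
        errors.SetError(box, "");   // clear any previous error
        return true;
    }
}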

The accuracy of output depends on the accuracy of input and its processing, so input design must be carried out very carefully. The key factors to be considered while designing input are: produce a cost-effective method of input; achieve the highest possible level of accuracy; and ensure the input is acceptable to and understood by the user, using meaningful words. The application has been designed in a user-friendly manner. The forms are designed in such a way that the cursor is placed in the position where data must be entered.

4.2.1 Effectiveness

The input screen forms have been designed such that they are very effective, i.e., they serve a specific purpose. The login and registration forms get the details of the member and store them, thus performing the registration.

4.2.2 Accuracy

The forms have been designed such that they assure proper completion. Validation has been included, and thus all the required fields are checked.

4.2.3 Ease of Use

Forms are straightforward and require no extra time to understand.

4.2.4 Simplicity

The forms are simple and uncluttered.

4.2.5 Attractiveness

Input forms have been designed with an appealing layout that pleases the user.

4.3 OUTPUT DESIGN

Computer output is the most important and the most direct source of information to the user. Efficient and intelligible output design improves the system's relationship with the user and helps in decision making. The major forms of output are hard copy from the printer and soft copy on the CRT display. Output is the key tool for evaluating the performance of software, so output design should be done with great care. It should be able to satisfy the user's requirements.

4.4 CODE DESIGN

A group of characters used to identify an item of data is a code. A major problem encountered in working with a large amount of data is the retrieval of specific data when it is required. Codes are used to aid the user in information identification and retrieval. Handling a large volume of data makes individual identification difficult; codes facilitate easier identification, simplification in handling, and retrieval of items while consuming less storage space. The codes are designed in such a manner that the user can easily understand them. In the developed system a suitable coding scheme is adopted which can identify each user exactly: each user is identified by a unique ID which is automatically generated. The need to communicate with and by means of computers has made increasing demands on users to work with and understand computer codes instead of natural language. It must always be remembered that human beings, including people who do not have much familiarity with data processing, will use these codes, so codes should be designed with user-friendly features.

4.5 DATABASE DESIGN

Databases are designed using the latest software available, and the development process follows the specific requirements of the client. We provide total flexibility in terms of database design; the development process is essentially "client driven". It is important to remember that a well-designed database should provide an end product (database) that has been tailored to meet both your professional and practical business needs and therefore serve its intended purpose. The process involves comprehensive and detailed analysis of the business needs, preparation of a design specification, an initial design concept, database programming, database testing/validation, client support, client site installation, and of course extensive communication between the database developer and the client.

CHAPTER 5

ARCHITECTURAL DETAILS

5.1 MODULES

- Login and Register
- Modify Length and History View
- Encoder and Forwarder
- Decoder and Receiver

5.2 MODULE DESCRIPTION

Login and Register

The Login and Registration module has been developed to authenticate the admin, who modifies the length of the text, and the user, who categorizes the text. This module contains the login and registration forms. The register form holds the details of a new user. New users have to fill in their details in the register form to be authenticated and authorized for text categorization. The details of the user are stored in the database. The login form contains username and password fields for authenticating the different users.

Modify Length and History View

This module has been developed for modifying the length of the text by the admin. In the previous module we developed login authentication for the admin. The purpose of the admin login is to modify the length of the text by choosing among three sizes: long, medium, and short. This module also contains the history view form. This form has been developed to show the details of the different users' text categorizations. The history form contains user id, source, source path, encoded text, and encoded path. It shows the full text categorization details of all the users.

Encoder or Transmitter

This module has been designed for encoding the text. Using the clustering algorithm, the system encodes the text typed by the user. Whatever content is entered by the user is encoded and displayed in the encoded-content area. This module contains two text areas, one to type the text and one to view the encoded text. In the demonstrations, encoding is done only on small texts to reduce execution time, but the system is able to encode text of any size.

Decoder or Receiver

This module has been developed for receiving the encoded files on the local system. In this module the user can save both the encoded and decoded text under the same name. The receiver module contains two specific paths to save the source text and the encoded text. In text categorization we can view the encoded text at runtime. The transmitter module shows the encoded text in the text area. In this module both files can be saved in text format.

5.3 DATA FLOW DIAGRAM

[The data flow diagrams are figures in the original document; the recoverable flow labels for each level are listed below.]

LEVEL 1: User → Registration → Fill mandatory fields

LEVEL 2: User → Registration → Fill mandatory fields

LEVEL 3: User → Registration → Fill mandatory fields → User and Admin Login → Admin modifies the length / User enters the text

LEVEL 4: User → Registration → Fill mandatory fields → User and Admin Login → Admin modifies the length / User enters the text

LEVEL 5: User → Registration → Fill mandatory fields → User and Admin Login → Admin modifies the length / User enters the text → User gets the encoded text in a text box → Save source file and encoded file in different paths

LEVEL 6: User → Registration → Fill mandatory fields → User and Admin Login → Admin modifies the length / User enters the text → User gets the encoded text in a text box → Save source file and encoded file in different paths → Text categorized and user can view the history

5.4 DATABASE DIAGRAM

CHAPTER 6 TESTING

Testing is an empirical investigation conducted to provide stakeholders with information about the quality of the product or service under test, with respect to the context in which it is intended to operate. Software testing also provides an objective, independent view of the software to allow the business to appreciate and understand the risks of implementing the software. Test techniques include, but are not limited to, the process of executing a program or application with the intent of finding software bugs. Testing can also be stated as the process of validating and verifying that a software program/application/product meets the business and technical requirements that guided its design and development, so that it works as expected and can be implemented with the same characteristics. Software testing, depending on the testing method employed, can be implemented at any time in the development process; however, most of the test effort occurs after the requirements have been defined and the coding process has been completed.

6.1 SOFTWARE TESTING

Testing is a process of technical investigation, performed on behalf of stakeholders, that is intended to reveal quality-related information about the product with respect to the context in which it is intended to operate. This includes, but is not limited to, the process of executing a program with the intent of finding errors. Quality is not an absolute; it is value to some person. With that in mind, testing can never completely establish the correctness of arbitrary computer software; testing furnishes a criticism or comparison that compares the state and behavior of the product against a specification. An important point is that software testing should be distinguished from the separate discipline of Software Quality Assurance (SQA), which encompasses all business process areas, not just testing.

White box and black box testing are terms used to describe the point of view a test engineer takes when designing test cases: black box being an external view of the test object and white box being an internal view. Software testing is partly intuitive, but largely systematic. Good testing involves much more than just running the program a few times to see whether it works.

Thorough analysis of the program under test, backed by a broad knowledge of testing techniques and tools, is a prerequisite to systematic testing. Software testing is the process of executing software in a controlled manner in order to answer the question "Does this software behave as specified?" Software testing is used in association with verification and validation. Verification is the checking or testing of items, including software, for conformance and consistency with an associated specification. Software testing is just one kind of verification, which also uses techniques such as reviews, inspections, and walkthroughs. Validation is the process of checking that what has been specified is what the user actually wanted.

6.2 FUNCTIONAL TESTING

In this type of testing, the software is tested against the functional requirements. The tests are written in order to check whether the application behaves as expected. Although functional testing is often done toward the end of the development cycle, it can, and should, be started much earlier. Individual components and processes can be tested early on, even before it is possible to do functional testing on the entire system. Functional testing covers how well the system executes the functions it is supposed to execute, including user commands, data manipulation, searches and business processes, user screens, and integrations. Functional testing covers the obvious surface types of functions, as well as the back-end operations (such as security and how upgrades affect the system).

6.3 UNIT TESTING

The developer carries out unit testing in order to check whether a particular module or unit of code is working correctly. Unit testing comes at the very basic level, as it is carried out as and when a unit of code is developed or a particular functionality is built. Unit testing deals with testing a unit as a whole. This may test the interaction of many functions, but it confines the test within one unit. The exact scope of a unit is left to interpretation. Supporting test code, sometimes called scaffolding, may be necessary to support an individual test. This type of testing is driven by the architecture and implementation teams.

This focus is also called black-box testing, because only the details of the interface are visible to the test. Limits that are global to a unit are tested here. In the construction industry, scaffolding is a temporary, easy-to-assemble-and-disassemble frame placed around a building to facilitate its construction. The construction workers first build the scaffolding and then the building; later the scaffolding is removed, exposing the completed building. Similarly, in software testing, one particular test may need some supporting software. This software establishes an environment around the test; only when this environment is established can a correct evaluation of the test take place. The scaffolding software may establish state and values for data structures as well as provide dummy external functions for the test. Different scaffolding software may be needed from one test to another. Scaffolding software is rarely considered part of the system. Sometimes the scaffolding software becomes larger than the system software being tested, and usually the scaffolding software is not of the same quality as the system software; frequently it is quite fragile. A small change in the test may lead to much larger changes in the scaffolding (a minimal sketch of a scaffolded test appears at the end of this chapter).

Internal and unit testing can be automated with the help of coverage tools. A coverage tool analyzes the source code and generates a test that will execute every alternative thread of execution. It is still up to the programmer to combine these tests into meaningful cases that validate the result of each thread of execution. Typically, the coverage tool is used in a slightly different way: first the coverage tool is used to augment the source by placing informational prints after each line of code; then the testing suite is executed, generating an audit trail; this audit trail is analyzed, and the percentage of the total system code executed during the test suite is reported. If the coverage is high and the untested source lines are of low impact to the system's overall quality, then no additional tests are required.

6.4 SECURITY TESTING

Security testing is carried out in order to find out how well the system can protect itself from unauthorized access, hacking, cracking, and code damage. It deals with the code of the application and needs sophisticated testing techniques.
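Returning to the scaffolding idea of Section 6.3, the sketch below shows a self-contained scaffolded unit test. The Encoder class here is a dummy stand-in (the project's real encoder is not reproduced in this report), so all names are illustrative.

using System;
using System.Text;

class Encoder   // stand-in for the unit under test
{
    public string Encode(string s)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(s));
    }
    public string Decode(string s)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(s));
    }
}

static class EncoderTests
{
    static void Main()
    {
        AssertRoundTrip("hello world");
        AssertRoundTrip("");   // boundary case: empty input
        Console.WriteLine("All tests passed.");
    }

    // Scaffolding: sets up the environment, runs the unit, checks the result.
    static void AssertRoundTrip(string text)
    {
        Encoder enc = new Encoder();
        string decoded = enc.Decode(enc.Encode(text));
        if (decoded != text)
            throw new Exception("Round-trip failed for: \"" + text + "\"");
    }
}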

CHAPTER 7 SCREENSHOTS

7.1 Login Page

7.2 Registration Page

7.3 Length Page

7.4 N-Code and D-Code Page

7.5 Save Encoded File

7.6 History page

CHAPTER 8 SYSTEM DIAGRAMS

8.1 FLOWCHART

8.2 SEQUENCE DIAGRAM

8.3 ER DIAGRAM

[ER diagram: the original figure shows the User entity linked to Registration, Login, N-code and D-code the text, Save N-code and D-code file, and View history; the Admin entity is linked to Login and Modify the length of text.]

8.4 USE CASE DIAGRAM

CHAPTER 9 SAMPLE CODING

// Login form code-behind (partial class; the designer code appears later
// in this chapter).
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Data.SqlClient;
using System.Configuration;

namespace Text_Categorization
{
    public partial class Login : Form
    {
        string ConnStr = ConfigurationSettings.AppSettings["ConnectionString"].ToString();
        Functions f = new Functions();
        SqlConnection con = new SqlConnection();
        DataTable dt = new DataTable();
        string PassQuery, Result;
        int ResultInt;

        public Login()
        {
            InitializeComponent();
        }

        private void btnLogin_Click(object sender, EventArgs e)
        {
            dt.Clear();
            // Single quotes are doubled to escape them inside the SQL literal.
            if (checkBox1.Checked != true)
                PassQuery = "SELECT * FROM USERDETAIL WHERE UN='" + txtun.Text.Replace("'", "''") +
                            "' AND PWD='" + TXTpWD.Text.Replace("'", "''") + "'";
            else
                PassQuery = "SELECT * FROM USERDETAIL WHERE UN='" + txtun.Text.Replace("'", "''") +
                            "' AND PWD='" + TXTpWD.Text.Replace("'", "''") + "' AND STATUS=1";
            dt = f.GetDT(con, ConnStr, PassQuery);
            if (dt.Rows.Count != 0)
            {
                Result = dt.Rows[0][9].ToString();   // column 9 holds the role flag
                if (Result == "1")                   // admin: open the length-modification form
                {
                    Modify_TC AD = new Modify_TC();
                    AD.Show();
                    lbl_invalid.Visible = false;
                }
                else if (Result == "0")              // ordinary user: open the encoder/decoder form
                {
                    N_Code_D_Code un = new N_Code_D_Code(txtun.Text.ToString());
                    un.Show();
                    lbl_invalid.Visible = false;
                }
                else
                {
                    lbl_invalid.Visible = false;
                }
            }
            else
            {
                // Added fix: show the "Invalid Username Or Password" label
                // when no matching row is found.
                lbl_invalid.Visible = true;
            }
        }

        private void TXTpWD_TextChanged(object sender, EventArgs e)
        {
        }

        private void txtun_TextChanged(object sender, EventArgs e)
        {
        }

        private void linkLabel1_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            Registration r = new Registration();
            r.Show();
        }
    }
}

// Shared database helper class.
namespace Text_Categorization
{
    class Functions
    {
        SqlConnection con = new SqlConnection();
        SqlCommand cmd;
        SqlDataAdapter da;
        SqlDataReader dr;
        DataSet ds;
        DataTable dt = new DataTable();
        int ResultStatus, Counter;
        string ResultStr = "";
        object ReturnObj = "";
        string errorText;
        bool bStatus;

        // Opens (or re-opens) a connection with the given connection string.
        public SqlConnection DBconnect(SqlConnection conn, string Constr)
        {
            if (conn.State != ConnectionState.Open)
            {
                conn = new SqlConnection(Constr);
                conn.Open();
            }
            return conn;
        }

        // Executes a non-query statement and returns the affected row count.
        public int ExecQry(SqlConnection conn, string ConnectStr, string Qry)
        {
            try
            {
                cmd = new SqlCommand(Qry, DBconnect(conn, ConnectStr));
                return ResultStatus = cmd.ExecuteNonQuery();
            }
            catch
            {
                return ResultStatus = 0;
            }
            finally
            {
                conn.Close();
                conn.Dispose();
            }
        }

        // Executes a query and returns the first column of the first row.
        public object GetSingle(SqlConnection conn, string ConnectStr, string Qry)
        {
            try
            {
                cmd = new SqlCommand(Qry, DBconnect(conn, ConnectStr));

                return ReturnObj = cmd.ExecuteScalar();
            }
            catch
            {
                return ReturnObj;
            }
            finally
            {
                conn.Close();
                conn.Dispose();
            }
        }

        // (The Functions class continues after the designer excerpts below.)

<!-- Windows Forms resource file (.resx) excerpt -->
<xsd:schema id="root" xmlns="" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xsd:import namespace="http://www.w3.org/XML/1998/namespace" />
  <xsd:element name="root" msdata:IsDataSet="true">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element name="metadata">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="value" type="xsd:string" minOccurs="0" />
            </xsd:sequence>
            <xsd:attribute name="name" use="required" type="xsd:string" />
            <xsd:attribute name="type" type="xsd:string" />
            <xsd:attribute name="mimetype" type="xsd:string" />
            <xsd:attribute ref="xml:space" />
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="assembly">
          <xsd:complexType>
            <xsd:attribute name="alias" type="xsd:string" />
            <xsd:attribute name="name" type="xsd:string" />
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="data">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="value" type="xsd:string" minOccurs="0" msdata:Ordinal="1" />
              <xsd:element name="comment" type="xsd:string" minOccurs="0" msdata:Ordinal="2" />
            </xsd:sequence>
            <xsd:attribute name="name" type="xsd:string" use="required" msdata:Ordinal="1" />
            <xsd:attribute name="type" type="xsd:string" msdata:Ordinal="3" />
            <xsd:attribute name="mimetype" type="xsd:string" msdata:Ordinal="4" />
            <xsd:attribute ref="xml:space" />
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="resheader">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="value" type="xsd:string" minOccurs="0" msdata:Ordinal="1" />
            </xsd:sequence>
            <xsd:attribute name="name" type="xsd:string" use="required" />
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<resheader name="resmimetype">
  <value>text/microsoft-resx</value>
</resheader>
<resheader name="version">
  <value>2.0</value>
</resheader>
<resheader name="reader">
  <value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<resheader name="writer">
  <value>System.Resources.ResXResourceWriter, System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<metadata name="errorProvider1.TrayLocation" type="System.Drawing.Point, System.Drawing, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a">
  <value>17, 17</value>
</metadata>
</root>

// Designer-generated layout code for the Login form.
private void InitializeComponent()
{
    this.label1 = new System.Windows.Forms.Label();
    this.label10 = new System.Windows.Forms.Label();
    this.label2 = new System.Windows.Forms.Label();
    this.txtun = new System.Windows.Forms.TextBox();
    this.TXTpWD = new System.Windows.Forms.TextBox();
    this.checkBox1 = new System.Windows.Forms.CheckBox();
    this.btnLogin = new System.Windows.Forms.Button();
    this.lbl_invalid = new System.Windows.Forms.Label();
    this.linkLabel1 = new System.Windows.Forms.LinkLabel();
    this.SuspendLayout();
    //
    // label1
    //
    this.label1.AutoSize = true;
    this.label1.Font = new System.Drawing.Font("Calibri", 12F, System.Drawing.FontStyle.Bold, System.Drawing.GraphicsUnit.Point, ((byte)(0)));
    this.label1.ForeColor = System.Drawing.SystemColors.ActiveCaptionText;
    this.label1.Location = new System.Drawing.Point(51, 86);
    this.label1.Name = "label1";
    this.label1.Size = new System.Drawing.Size(77, 19);

this.label1.TabIndex = 1; this.label1.Text = "Username"; // // label10 // this.label10.AutoSize = true; this.label10.Font = new System.Drawing.Font("Calibri", 18F, System.Drawing.FontStyle.Bold, System.Drawing.GraphicsUnit.Point, ((byte)(0))); this.label10.ForeColor = System.Drawing.SystemColors.ActiveCaptionText; this.label10.Location = new System.Drawing.Point(80, 33); this.label10.Name = "label10"; this.label10.Size = new System.Drawing.Size(135, 29); this.label10.TabIndex = 10; this.label10.Text = "Registration"; // // label2 // this.label2.AutoSize = true; this.label2.Font = new System.Drawing.Font("Calibri", 12F, System.Drawing.FontStyle.Bold, System.Drawing.GraphicsUnit.Point, ((byte)(0))); this.label2.ForeColor = System.Drawing.SystemColors.ActiveCaptionText; this.label2.Location = new System.Drawing.Point(51, 126); this.label2.Name = "label2"; this.label2.Size = new System.Drawing.Size(74, 19);

    this.label2.TabIndex = 11;
    this.label2.Text = "Password";
    //
    // txtun
    //
    this.txtun.Location = new System.Drawing.Point(135, 87);
    this.txtun.Name = "txtun";
    this.txtun.Size = new System.Drawing.Size(100, 20);
    this.txtun.TabIndex = 13;
    this.txtun.TextChanged += new System.EventHandler(this.txtun_TextChanged);
    //
    // TXTpWD
    //
    this.TXTpWD.Location = new System.Drawing.Point(135, 127);
    this.TXTpWD.Name = "TXTpWD";
    this.TXTpWD.PasswordChar = '*';
    this.TXTpWD.Size = new System.Drawing.Size(100, 20);
    this.TXTpWD.TabIndex = 14;
    this.TXTpWD.TextChanged += new System.EventHandler(this.TXTpWD_TextChanged);
    //
    // checkBox1
    //
    this.checkBox1.AutoSize = true;

    this.checkBox1.Font = new System.Drawing.Font("Calibri", 9.75F, System.Drawing.FontStyle.Bold, System.Drawing.GraphicsUnit.Point, ((byte)(0)));
    this.checkBox1.ForeColor = System.Drawing.SystemColors.ActiveCaptionText;
    this.checkBox1.Location = new System.Drawing.Point(113, 207);
    this.checkBox1.Name = "checkBox1";
    this.checkBox1.Size = new System.Drawing.Size(62, 19);
    this.checkBox1.TabIndex = 15;
    this.checkBox1.Text = "Admin";
    this.checkBox1.UseVisualStyleBackColor = true;
    //
    // btnLogin
    //
    this.btnLogin.Location = new System.Drawing.Point(100, 166);
    this.btnLogin.Name = "btnLogin";
    this.btnLogin.Size = new System.Drawing.Size(75, 23);
    this.btnLogin.TabIndex = 16;
    this.btnLogin.Text = "Login";
    this.btnLogin.UseVisualStyleBackColor = true;
    this.btnLogin.Click += new System.EventHandler(this.btnLogin_Click);
    //
    // lbl_invalid
    //
    this.lbl_invalid.AutoSize = true;

    this.lbl_invalid.Font = new System.Drawing.Font("Calibri", 9.75F, System.Drawing.FontStyle.Regular, System.Drawing.GraphicsUnit.Point, ((byte)(0)));
    this.lbl_invalid.ForeColor = System.Drawing.Color.Red;
    this.lbl_invalid.Location = new System.Drawing.Point(58, 238);
    this.lbl_invalid.Name = "lbl_invalid";
    this.lbl_invalid.Size = new System.Drawing.Size(178, 15);
    this.lbl_invalid.TabIndex = 17;
    this.lbl_invalid.Text = "Invalid Username Or Password";
    this.lbl_invalid.Visible = false;
    //
    // linkLabel1
    //
    this.linkLabel1.AutoSize = true;
    this.linkLabel1.Location = new System.Drawing.Point(2, 9);
    this.linkLabel1.Name = "linkLabel1";
    this.linkLabel1.Size = new System.Drawing.Size(88, 13);
    this.linkLabel1.TabIndex = 18;
    this.linkLabel1.TabStop = true;
    this.linkLabel1.Text = "New Registration";
    this.linkLabel1.LinkClicked += new System.Windows.Forms.LinkLabelLinkClickedEventHandler(this.linkLabel1_LinkClicked);
    //
    // Login
    //

    this.AutoScaleDimensions = new System.Drawing.SizeF(6F, 13F);
    this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
    this.BackColor = System.Drawing.SystemColors.ControlDark;
    this.ClientSize = new System.Drawing.Size(292, 266);
    this.Controls.Add(this.linkLabel1);
    this.Controls.Add(this.lbl_invalid);
    this.Controls.Add(this.btnLogin);
    this.Controls.Add(this.checkBox1);
    this.Controls.Add(this.TXTpWD);
    this.Controls.Add(this.txtun);
    this.Controls.Add(this.label2);
    this.Controls.Add(this.label10);
    this.Controls.Add(this.label1);
    this.Name = "Login";
    this.Text = "Text Categorization - Login";
    this.ResumeLayout(false);
    this.PerformLayout();
}

// Shared helpers from the project's Functions class (instantiated as "f" by the
// forms below). DBconnect is evidently a helper defined elsewhere that supplies
// the open connection; "da" and "dt" are class-level SqlDataAdapter/DataTable members.
public DataTable GetDT(SqlConnection conn, string ConnectStr, string Qry)
{
    try
    {
        da = new SqlDataAdapter(Qry, DBconnect(conn, ConnectStr));

        da.Fill(dt);
        return dt;
    }
    catch
    {
        // On any failure, return an empty table instead of propagating the error.
        dt.Clear();
        return dt;
    }
    finally
    {
        conn.Close();
        conn.Dispose();
    }
}
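A typical call of this helper, sketched here on the assumption that ConnStr is read from the same application setting the forms below use: because GetDT swallows errors and hands back the cleared table, callers test Rows.Count rather than catching exceptions.

// Usage sketch (assumes the ConnectionString app setting shown later in this listing).
Functions f = new Functions();
SqlConnection con = new SqlConnection();
string ConnStr = ConfigurationSettings.AppSettings["ConnectionString"].ToString();
DataTable users = f.GetDT(con, ConnStr, "SELECT * FROM userdetail");
if (users.Rows.Count == 0)
{
    // Either the table is empty or the query failed.
}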

// Flags a ComboBox via the ErrorProvider when its first (presumably placeholder) item is selected.
public bool ErrProvider_Combo(ComboBox ComBx, ErrorProvider errorProvider1, string msg)
{
    bStatus = true;
    if (ComBx.SelectedIndex == 0)
    {
        errorProvider1.SetError(ComBx, msg);
        bStatus = false;
    }
    else
        errorProvider1.SetError(ComBx, "");
    return bStatus;
}

-- Schema scripts for the [TC] database.
USE [TC]
GO

/****** Object: Table [dbo].[history] Script Date: 01/10/2010 18:35:04 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
-- Each user's encode/decode history: original text, encoded text, and the files they were saved to.
CREATE TABLE [dbo].[history](
    [FID] [int] IDENTITY(1,1) NOT NULL,
    [uid] [varchar](10) NOT NULL,
    [SOURCE] [varchar](5000) NOT NULL,
    [sourcepath] [varchar](50) NULL,
    [NCODE] [varchar](5000) NULL,
    [ncodepath] [varchar](50) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO

/****** Object: Table [dbo].[userdetail] Script Date: 01/10/2010 18:35:16 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
-- Registered users; "STATUS" presumably marks the account type (e.g., admin vs. normal).
CREATE TABLE [dbo].[userdetail](
    [uid] [varchar](10) NOT NULL,
    [un] [varchar](50) NULL,
    [pwd] [varchar](50) NULL,
    [nickname] [varchar](50) NULL,
    [dob] [datetime] NULL,
    [gender] [varchar](10) NULL,
    [addr] [varchar](250) NULL,
    [mob] [varchar](15) NULL,
    [mail] [varchar](50) NULL,
    [STATUS] [varchar](50) NULL,
    CONSTRAINT [PK_userdetail] PRIMARY KEY CLUSTERED
    (
        [uid] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO

/****** Object: Table [dbo].[TC_length] Script Date: 01/10/2010 18:35:07 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- Word-count thresholds separating short (s), medium (m), and long (l) texts.
CREATE TABLE [dbo].[TC_length](
    [s] [int] NOT NULL,
    [m] [int] NOT NULL,
    [l] [int] NOT NULL
) ON [PRIMARY]
GO
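N_Code_D_Code_Load below reads TC_length with a plain SELECT, and Modify_TC only ever UPDATEs it, so the table needs one pre-existing row. A one-time seed, sketched with the project's ExecQry helper; the values 10/20/50 are purely illustrative (they match the maxima Modify_TC offers):

// Hypothetical seed: up to 10 words = short, up to 20 = medium, longer = long.
Functions f = new Functions();
SqlConnection con = new SqlConnection();
string ConnStr = ConfigurationSettings.AppSettings["ConnectionString"].ToString();
f.ExecQry(con, ConnStr, "INSERT INTO TC_LENGTH (s, m, l) VALUES (10, 20, 50)");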

// Flags a TextBox as invalid when it is empty.
public bool ErrProvider_TextBox(TextBox TxtBX, ErrorProvider errorProvider1, string msg)
{
    bool bStatus = true;
    if (TxtBX.Text == "")
    {
        errorProvider1.SetError(TxtBX, msg);
        bStatus = false;
    }
    else
        errorProvider1.SetError(TxtBX, "");
    return bStatus;
}

// Marks every control on the form whose Text is empty with the error icon.
public bool ErrProvider_AllFields(ErrorProvider errorProvider1, Form ctr, string ErrMsg)
{
    Counter = 0;
    bStatus = true;
    foreach (Control ctrl in ctr.Controls)
    {
        if (ctrl.Text == "")
        {
            errorProvider1.SetError(ctrl, ErrMsg);
            Counter++;
        }
    }
    if (Counter != 0)
        return bStatus = false;
    else
        return bStatus = true;
}

// Clears the Text of every control on the form (could be restricted to TextBoxes only).
public void ResetFormValues(Form fom)
{
    foreach (Control ctr in fom.Controls)
    {
        ctr.Text = "";
    }
}

// Base64-encodes a string (UTF-8 bytes -> Base64).
public string base64Encode(string data)
{
    try
    {
        byte[] encData_byte = System.Text.Encoding.UTF8.GetBytes(data);
        string encodedData = Convert.ToBase64String(encData_byte);
        return encodedData;
    }
    catch (Exception e)
    {
        throw new Exception("Error in base64Encode: " + e.Message);
    }
}

// Decodes a Base64 string back to its original UTF-8 text.
public string base64Decode(string data)
{
    try
    {
        System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
        System.Text.Decoder utf8Decode = encoder.GetDecoder();
        byte[] todecode_byte = Convert.FromBase64String(data);
        int charCount = utf8Decode.GetCharCount(todecode_byte, 0, todecode_byte.Length);
        char[] decoded_char = new char[charCount];
        utf8Decode.GetChars(todecode_byte, 0, todecode_byte.Length, decoded_char, 0);
        string result = new String(decoded_char);
        return result;
    }
    catch (Exception e)
    {
        throw new Exception("Error in base64Decode: " + e.Message);
    }
}
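A round trip through the two helpers above recovers the original string. (The same work can be done in one line each with Convert.ToBase64String(Encoding.UTF8.GetBytes(s)) and Encoding.UTF8.GetString(Convert.FromBase64String(e)).) A usage sketch:

Functions f = new Functions();
string encoded = f.base64Encode("text categorization");   // Base64 form of the UTF-8 bytes
string decoded = f.base64Decode(encoded);                 // back to "text categorization"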

// Writes Content to "<Fpath><Fname>.txt".
public void GenerateTXTfile(string Fpath, string Content, string Fname)
{
    try
    {
        StreamWriter SW = File.CreateText(Fpath + Fname + ".txt");
        SW.WriteLine(Content);
        SW.Close();
    }
    catch (Exception e)
    {
        throw new Exception("Error in File Generation: " + e.Message);
    }
}

// Fills a ComboBox with the logical drive names offered as save locations.
public void GetDirectory(ComboBox comboBox1)
{
    DriveInfo[] drives = DriveInfo.GetDrives();
    for (int i = 0; i < drives.Length; i++)
    {
        comboBox1.Items.Add(drives[i].ToString());
    }
}
}
}

// Modify_TC form: edits the word-count thresholds stored in TC_length.
// (ConfigurationSettings is the legacy .NET 1.x API; ConfigurationManager replaces it in later versions.)
string ConnStr = ConfigurationSettings.AppSettings["ConnectionString"].ToString();
Functions f = new Functions();
SqlConnection con = new SqlConnection();
string PassQuery, Result;
int ResultInt;

public Modify_TC()
{
    InitializeComponent();
}

// Offers 1-10 for the short limit, 1-20 for medium, and 1-50 for long.
private void Modify_TC_Load(object sender, EventArgs e)
{
    ArrayList ARR_LOW = new ArrayList();
    ArrayList ARR_MED = new ArrayList();
    ArrayList ARR_LONG = new ArrayList();
    for (int i = 1; i <= 10; i++)
    {
        ARR_LOW.Add(i.ToString());
    }
    for (int i = 1; i <= 20; i++)
    {
        ARR_MED.Add(i.ToString());
    }
    for (int i = 1; i <= 50; i++)
    {
        ARR_LONG.Add(i.ToString());
    }
    COM_LONG.DataSource = ARR_LONG;
    COM_MEDIUM.DataSource = ARR_MED;
    COM_SHORT.DataSource = ARR_LOW;
}

{ PassQuery="UPDATE TC_LENGTH SET S="+ COM_SHORT.SelectedItem.ToString() +", M="+ COM_MEDIUM.SelectedItem.ToString() +", L=" + COM_LONG.SelectedItem.ToString() ; ResultInt = f.ExecQry(con, ConnStr, PassQuery); if (ResultInt == 1) MessageBox.Show("Details Updated"); else MessageBox.Show("Details Not Updated"); } string ConnStr = ConfigurationSettings.AppSettings["ConnectionString"].ToString(); Functions f = new Functions(); SqlConnection con; SqlCommand com; SqlDataReader DR; string PassQuery, Result, TotalContent, Ncoded, Dcoded; DataTable DT = new DataTable(); String LoginName; public MyHistory(string lgnName) { InitializeComponent(); LoginName = lgnName; }

// MyHistory form: lists the logged-in user's previously encoded texts.
string ConnStr = ConfigurationSettings.AppSettings["ConnectionString"].ToString();
Functions f = new Functions();
SqlConnection con;
SqlCommand com;
SqlDataReader DR;
string PassQuery, Result, TotalContent, Ncoded, Dcoded;
DataTable DT = new DataTable();
String LoginName;

public MyHistory(string lgnName)
{
    InitializeComponent();
    LoginName = lgnName;
}

private void MyHistory_Load(object sender, EventArgs e)
{
    LBL_NAME.Text = LoginName;
    con = new SqlConnection(ConnStr);
    PassQuery = "SELECT * FROM HISTORY WHERE UID='" + LBL_NAME.Text + "'";
    DT = f.GetDT(con, ConnStr, PassQuery);
    if (DT.Rows.Count != 0)
    {
        lbl_status.Visible = false;
        dataGridView1.Visible = true;
        dataGridView1.DataSource = DT;
    }
    else
    {
        dataGridView1.Visible = false;
        lbl_status.Visible = true;
        lbl_status.Text = "No History Available";
    }
}

// Opens the given file in Notepad.
private void OpenNotePad(string F1)
{
    Process notePad = new Process();
    notePad.StartInfo.FileName = "notepad.exe";
    notePad.StartInfo.Arguments = F1;
    notePad.Start();
}

// Opens the stored source and encoded files for the clicked history row.
private void dataGridView1_CellContentClick(object sender, DataGridViewCellEventArgs e)
{
    string source = dataGridView1.CurrentRow.Cells[3].Value.ToString();
    string Ncode = dataGridView1.CurrentRow.Cells[5].Value.ToString();
    if (File.Exists(source))
    {
        OpenNotePad(source);
    }
    else
    {
        MessageBox.Show("Source file doesn't exist.");
    }
    if (File.Exists(Ncode))
    {
        OpenNotePad(Ncode);
    }
    else
    {
        MessageBox.Show("Encoded file doesn't exist.");
    }
}

}

// N_Code_D_Code form: encodes/decodes text and reports its length category.
// (The field declarations -- Shrt, Medim, Lng, StrArray, ArrayCnt, etc. -- are not part of this excerpt.)
public N_Code_D_Code(string lgnName)
{
    LoginName = lgnName;
    InitializeComponent();
}

private void btn_EnCode_Click(object sender, EventArgs e)
{
    Ncoded = f.base64Encode(txt_Code.Text);
    txt_encoded.Text = Ncoded;
}

// Classifies the typed text as short / medium / long by word count,
// using the thresholds loaded from TC_length.
private void txt_Code_TextChanged(object sender, EventArgs e)
{
    TotalContent = txt_Code.Text;
    StrArray = TotalContent.Split(" ".ToCharArray());
    ArrayCnt = StrArray.Length;
    if (ArrayCnt <= Shrt)
        lbl_Status.Text = "This Data is considered as Short-Text Categorization";
    else if (ArrayCnt <= Medim)
        lbl_Status.Text = "This Data is considered as Medium-Text Categorization";
    else
        lbl_Status.Text = "This Data is considered as Long-Text Categorization";
    lbl_Status.Visible = true;
}
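Split(" ".ToCharArray()) treats every space as a boundary, so consecutive spaces or line breaks inflate ArrayCnt and can push a text into a longer category. A whitespace-tolerant count, sketched as a helper:

// Sketch: counts words for the Shrt/Medim thresholds above, ignoring blank entries.
private int WordCount(string text)
{
    char[] separators = new char[] { ' ', '\t', '\r', '\n' };
    return text.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
}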

lbl_Status.Text = "This Data is considered as Long-Text Categorization"; lbl_Status.Visible = true; } private void N_Code_D_Code_Load(object sender, EventArgs e) { try { Login l= new Login(); //lbl_Status.Text = l.txtun.Text; LBL_NAME.Text = LoginName.ToString(); f.GetDirectory(comboBox1); f.GetDirectory(comboBox2); comboBox1.SelectedIndex = 0; comboBox2.SelectedIndex = 0; con = new SqlConnection(ConnStr); com = new SqlCommand("SELECT * FROM TC_LENGTH", con); con.Open(); DR = com.ExecuteReader(); DR.Read(); Shrt = Convert.ToInt32(DR[0]); Medim = Convert.ToInt32(DR[1]); Lng = Convert.ToInt32(DR[2]); }

catch { MessageBox.Show("Error Occured..."); } finally { DR.Close(); con.Close(); con.Dispose(); } } public void GetDirectory() { DirectoryInfo di = new DirectoryInfo(Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Loc ation)); if (di != null) { FileInfo[] subFiles = di.GetFiles(); if (subFiles.Length > 0) { Console.WriteLine("Files:"); foreach (FileInfo subFile in subFiles) {

Console.WriteLine(" " + subFile.Name + " (" + subFile.Length + " bytes)"); } } Console.ReadKey(); } } private void btn_NcodeSave_Click(object sender, EventArgs e) { try { if (f.ErrProvider_TextBox(txt_SfileName, errorProvider1, "Give File Name") == true && f.ErrProvider_TextBox(txt_SfileName, errorProvider1, "Give File Name") == true ) { if (comboBox1.SelectedIndex != comboBox2.SelectedIndex) { PassQuery = "INSERT INTO HISTORY VALUES('" + LBL_NAME.Text + "','" + txt_Code.Text.Replace("'", "''") + "', '" + comboBox1.SelectedItem.ToString() + txt_SfileName.Text + ".txt' , '" + txt_encoded.Text.Replace("'", "''") + "', '" + comboBox2.SelectedItem.ToString() + txt_SfileName.Text + ".txt')"; f.ExecQry(con, ConnStr, PassQuery); f.GenerateTXTfile(comboBox1.SelectedItem.ToString(), txt_Code.Text, txt_SfileName.Text); f.GenerateTXTfile(comboBox2.SelectedItem.ToString(), txt_encoded.Text, txt_SfileName.Text); MessageBox.Show("File Generated."); }

else { MessageBox.Show("Give Different File Location"); } } } catch { MessageBox.Show("Error Occured."); } } private void btn_ViewHistory_Click(object sender, EventArgs e) { MyHistory his = new MyHistory(LBL_NAME.Text); his.Show(); } }
